ETAPS Foreword


Welcome to the proceedings of ETAPS 2018! After a somewhat coldish ETAPS 2017 
in Uppsala in the north, ETAPS this year took place in Thessaloniki, Greece. I am 
happy to announce that this is the first ETAPS with gold open access proceedings. This 
means that all papers are accessible by anyone for free. 

ETAPS 2018 was the 21st instance of the European Joint Conferences on Theory 
and Practice of Software. ETAPS is an annual federated conference established in 
1998, and consists of five conferences: ESOP, FASE, FoSSaCS, TACAS, and POST. 
Each conference has its own Program Committee (PC) and its own Steering Com- 
mittee. The conferences cover various aspects of software systems, ranging from 
theoretical computer science to foundations to programming language developments, 
analysis tools, formal approaches to software engineering, and security. Organizing 
these conferences in a coherent, highly synchronized conference program facilitates 
participation in an exciting event, offering attendees the possibility to meet many 
researchers working in different directions in the field, and to easily attend talks of 
different conferences. Before and after the main conference, numerous satellite work- 
shops take place and attract many researchers from all over the globe. 

ETAPS 2018 received 479 submissions in total, 144 of which were accepted, 
yielding an overall acceptance rate of 30%. I thank all the authors for their interest in 
ETAPS, all the reviewers for their peer reviewing efforts, the PC members for their 
contributions, and in particular the PC (co-)chairs for their hard work in running this 
entire intensive process. Last but not least, my congratulations to all authors of the 
accepted papers! 

ETAPS 2018 was enriched by the unifying invited speaker Martin Abadi (Google 
Brain, USA) and the conference-specific invited speakers (FASE) Pamela Zave (AT&T
Labs, USA), (POST) Benjamin C. Pierce (University of Pennsylvania, USA), and
(ESOP) Derek Dreyer (Max Planck Institute for Software Systems, Germany). Invited 
tutorials were provided by Armin Biere (Johannes Kepler University, Linz, Austria) on 
modern SAT solving and Fabio Somenzi (University of Colorado, Boulder, USA) on 
hardware verification. My sincere thanks to all these speakers for their inspiring and 
interesting talks! 

ETAPS 2018 took place in Thessaloniki, Greece, and was organised by the 
Department of Informatics of the Aristotle University of Thessaloniki. The university 
was founded in 1925 and currently has around 75,000 students; it is the largest uni- 
versity in Greece. ETAPS 2018 was further supported by the following associations 
and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer 
Science), EAPLS (European Association for Programming Languages and Systems), 
and EASST (European Association of Software Science and Technology). The local 
organization team consisted of Panagiotis Katsaros (general chair), Ioannis Stamelos,
Lefteris Angelis, George Rahonis, Nick Bassiliades, Alexander Chatzigeorgiou, Ezio
Bartocci, Simon Bliudze, Emmanouela Stachtiari, Kyriakos Georgiadis, and Petros 
Stratis (EasyConferences). 

The overall planning for ETAPS is the main responsibility of the Steering Com- 
mittee, and in particular of its Executive Board. The ETAPS Steering Committee 
consists of an Executive Board and representatives of the individual ETAPS confer- 
ences, as well as representatives of EATCS, EAPLS, and EASST. The Executive 
Board consists of Gilles Barthe (Madrid), Holger Hermanns (Saarbrücken), Joost-Pieter
Katoen (chair, Aachen and Twente), Gerald Lüttgen (Bamberg), Vladimiro Sassone
(Southampton), Tarmo Uustalu (Tallinn), and Lenore Zuck (Chicago). Other members
of the Steering Committee are: Wil van der Aalst (Aachen), Parosh Abdulla (Uppsala),
Amal Ahmed (Boston), Christel Baier (Dresden), Lujo Bauer (Pittsburgh), Dirk Beyer
(Munich), Mikołaj Bojańczyk (Warsaw), Luís Caires (Lisbon), Jurriaan Hage
(Utrecht), Rainer Hähnle (Darmstadt), Reiko Heckel (Leicester), Marieke Huisman
(Twente), Panagiotis Katsaros (Thessaloniki), Ralf Küsters (Stuttgart), Ugo Dal Lago
(Bologna), Kim G. Larsen (Aalborg), Matteo Maffei (Vienna), Tiziana Margaria
(Limerick), Flemming Nielson (Copenhagen), Catuscia Palamidessi (Palaiseau),
Andrew M. Pitts (Cambridge), Alessandra Russo (London), Dave Sands (Göteborg),
Don Sannella (Edinburgh), Andy Schürr (Darmstadt), Alex Simpson (Ljubljana),
Gabriele Taentzer (Marburg), Peter Thiemann (Freiburg), Jan Vitek (Prague), Tomáš
Vojnar (Brno), and Lijun Zhang (Beijing).

I would like to take this opportunity to thank all speakers, attendees, organizers 
of the satellite workshops, and Springer for their support. I hope you all enjoy the 
proceedings of ETAPS 2018. Finally, a big thanks to Panagiotis and his local orga- 
nization team for all their enormous efforts that led to a fantastic ETAPS in 
Thessaloniki! 


February 2018 Joost-Pieter Katoen 


Preface 


This volume contains the papers presented at the 27th European Symposium on Pro- 
gramming (ESOP 2018) held April 16-19, 2018, in Thessaloniki, Greece. ESOP is one 
of the European Joint Conferences on Theory and Practice of Software (ETAPS). It is 
devoted to fundamental issues in the specification, design, analysis, and implementa- 
tion of programming languages and systems. 

The 36 papers in this volume were selected from 114 submissions based on origi- 
nality and quality. Each submission was reviewed by three to six Program Committee 
(PC) members and external reviewers, with an average of 3.3 reviews per paper. 
Authors were given a chance to respond to these reviews during the rebuttal period 
from December 6 to 8, 2017. All submissions, reviews, and author responses were 
considered during the online discussion, which identified 74 submissions to be dis- 
cussed further at the physical PC meeting held at Inria Paris, December 13-14, 2017. 
Each paper was assigned a guardian, who was responsible for making sure that external 
reviews were solicited if there was not enough non-conflicted expertise among the PC, 
and for presenting a summary of the reviews and author responses at the PC meeting. 
All non-conflicted PC members participated in the discussion of a paper’s merits. PC 
members wrote reactions to author responses, including summaries of online discus- 
sions and discussions during the physical PC meeting, so as to help the authors 
understand decisions. Papers co-authored by members of the PC were held to a higher 
standard and discussed toward the end of the physical PC meeting. There were ten such 
submissions and five were accepted. Papers for which the program chair had a conflict 
of interest were kindly handled by Fritz Henglein. 

My sincere thanks to all who contributed to the success of the conference. This 
includes the authors who submitted papers for consideration; the external reviewers, 
who provided timely expert reviews, sometimes on short notice; and the PC, who 
worked hard to provide extensive reviews, engaged in high-quality discussions about 
the submissions, and added detailed comments to help authors understand the PC 
discussion and decisions. I am grateful to the past ESOP PC chairs, particularly Jan 
Vitek and Hongseok Yang, and to the ESOP SC chairs, Giuseppe Castagna and Peter 
Thiemann, who helped with numerous procedural matters. I would like to thank the 
ETAPS SC chair, Joost-Pieter Katoen, for his amazing work and his responsiveness. 
HotCRP was used to handle submissions and online discussion, and helped smoothly 
run the physical PC meeting. Finally, I would like to thank Catalin Hritcu for spon- 
soring the physical PC meeting through ERC grant SECOMP, Mathieu Mourey and the 
Inria Paris staff for their help organizing the meeting, and William Bowman for 
assisting with the PC meeting. 
RustBelt: Logical Foundations for the Future
of Safe Systems Programming 


Derek Dreyer 


Max Planck Institute for Software Systems (MPI-SWS), Germany 
dreyer@mpi-sws.org 


Abstract. Rust is a new systems programming language, developed at Mozilla, 
that promises to overcome the seemingly fundamental tradeoff in language 
design between high-level safety guarantees and low-level control over resource 
management. Unfortunately, none of Rust’s safety claims have been formally 
proven, and there is good reason to question whether they actually hold. 
Specifically, Rust employs a strong, ownership-based type system, but then 
extends the expressive power of this core type system through libraries that 
internally use unsafe features. 

In this talk, I will present RustBelt (http://plv.mpi-sws.org/rustbelt), the first 
formal (and machine-checked) safety proof for a language representing a real- 
istic subset of Rust. Our proof is extensible in the sense that, for each new Rust 
library that uses unsafe features, we can say what verification condition it must 
satisfy in order for it to be deemed a safe extension to the language. We have 
carried out this verification for some of the most important libraries that are used 
throughout the Rust ecosystem. 

After reviewing some essential features of the Rust language, I will describe 
the high-level structure of the RustBelt verification and then delve into detail 
about the secret weapon that makes RustBelt possible: the Iris framework for 
higher-order concurrent separation logic in Coq (http://iris-project.org). I will 
explain by example how Iris generalizes the expressive power of O’Hearn’s 
original concurrent separation logic in ways that are essential for verifying the 
safety of Rust libraries. I will not assume any prior familiarity with concurrent 
separation logic or Rust. 

This is joint work with Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers, 
and the rest of the Iris team. 
Ningning Xie, Xuan Bi, and Bruno C. d. S. Oliveira
{nnxie,xbi,bruno}@cs.hku.hk


Abstract. Consistent subtyping is employed in some gradual type sys- 
tems to validate type conversions. The original definition by Siek and 
Taha serves as a guideline for designing gradual type systems with 
subtyping. Polymorphic types à la System F also induce a subtyping
relation that relates polymorphic types to their instantiations. However,
Siek and Taha’s definition is not adequate for polymorphic subtyping.
The first goal of this paper is to propose a generalization of consistent 
subtyping that is adequate for polymorphic subtyping, and subsumes 
the original definition by Siek and Taha. The new definition of consis- 
tent subtyping provides novel insights with respect to previous polymor- 
phic gradual type systems, which did not employ consistent subtyping. 
The second goal of this paper is to present a gradually typed calcu- 
lus for implicit (higher-rank) polymorphism that uses our new notion 
of consistent subtyping. We develop both declarative and (bidirectional) 
algorithmic versions for the type system. We prove that the new calculus 
satisfies all static aspects of the refined criteria for gradual typing, which 
are mechanically formalized using the Coq proof assistant. 


1 Introduction 


Gradual typing [21] is an increasingly popular topic in both programming 
language practice and theory. On the practical side there is a growing num- 
ber of programming languages adopting gradual typing. Those languages include 
Clojure [6], Python [27], TypeScript [5], Hack [26], and the addition of Dynamic to 
C# [4], to cite a few. On the theoretical side, recent years have seen a large body of
research that defines the foundations of gradual typing [8,9,13], explores its use
for both functional and object-oriented programming [21,22], and studies its
applications to many other areas [3,24].

A key concept in gradual type systems is consistency [21]. Consistency weak- 
ens type equality to allow for the presence of unknown types. In some gradual 
type systems with subtyping, consistency is combined with subtyping to give 
rise to the notion of consistent subtyping [22]. Consistent subtyping is employed 
by gradual type systems to validate type conversions arising from conventional 
subtyping. One nice feature of consistent subtyping is that it is derivable from 
the more primitive notions of consistency and subtyping. As Siek and Taha [22]
put it, this shows that “gradual typing and subtyping are orthogonal and can be
combined in a principled fashion”. Thus consistent subtyping is often used as a 
guideline for designing gradual type systems with subtyping. 
© The Author(s) 2018 
Unfortunately, as noted by Garcia et al. [13], notions of consistency and/or
consistent subtyping “become more difficult to adapt as type systems get more
complex”. In particular, for the case of type systems with subtyping, certain
kinds of subtyping do not fit well with the original definition of consistent
subtyping by Siek and Taha [22]. One important case where such a mismatch happens
is in type systems supporting implicit (higher-rank) polymorphism [11,18]. It is
well known that polymorphic types à la System F induce a subtyping relation
that relates polymorphic types to their instantiations [16,17]. However, Siek and
Taha’s [22] definition is not adequate for this kind of subtyping. Moreover, the
current framework for Abstracting Gradual Typing (AGT) [13] also does not
account for polymorphism, with the authors acknowledging that this is one of
the interesting avenues for future work.

Existing work on gradual type systems with polymorphism does not use
consistent subtyping. The Polymorphic Blame Calculus (λB) [1] is an explicitly
polymorphic calculus with explicit casts, which is often used as a target
language for gradual type systems with polymorphism. In λB a notion of
compatibility is employed to validate conversions allowed by casts. Interestingly, λB
allows conversions from polymorphic types to their instantiations. For example,
it is possible to cast a value with type ∀a. a → a into Int → Int. Thus
an important remark here is that while λB is explicitly polymorphic, casting
and conversions are closer to implicit polymorphism. That is, in a conventional
explicitly polymorphic calculus (such as System F), the primary notion is type
equality, where instantiation is not taken into account. Thus the types ∀a. a → a
and Int → Int are deemed incompatible. However, in implicitly polymorphic
calculi [11,18] ∀a. a → a and Int → Int are deemed compatible, since the latter type
is an instantiation of the former. Therefore λB is in a sense a hybrid between
implicit and explicit polymorphism, utilizing type equality (à la System F) for
validating applications, and compatibility for validating casts.

An alternative approach to polymorphism has recently been proposed by
Igarashi et al. [14]. Like λB, their calculus is explicitly polymorphic. However,
in that work they employ type consistency to validate cast conversions, and
forbid conversions from ∀a. a → a to Int → Int. This makes their casts closer
to explicit polymorphism, in contrast to λB. Nonetheless, there is still some
flavour of implicit polymorphism in their calculus when it comes to interactions
between dynamically typed and polymorphically typed code. For example, in
their calculus type consistency allows types such as ∀a. a → Int to be related to
⋆ → Int, where some sort of (implicit) polymorphic subtyping is involved.

The first goal of this paper is to study the gradually typed subtyping and con- 
sistent subtyping relations for predicative implicit polymorphism. To accomplish 
this, we first show how to reconcile consistent subtyping with polymorphism 
by generalizing the original consistent subtyping definition by Siek and Taha 
[22]. The new definition of consistent subtyping can deal with polymorphism, 
and preserves the orthogonality between consistency and subtyping. To slightly
rephrase Siek and Taha [22], the motto of our paper is that: 


Gradual typing and polymorphism are orthogonal and can be combined 
in a principled fashion.¹


With the insights gained from our work, we argue that, for implicit polymor- 
phism, Ahmed et al.’s [1] notion of compatibility is too permissive (i.e. too many 
programs are allowed to type-check), and that Igarashi et al.’s [14] notion of type 
consistency is too conservative. As a step towards an algorithmic version of con- 
sistent subtyping, we present a syntax-directed version of consistent subtyping 
that is sound and complete with respect to our formal definition of consistent 
subtyping. The syntax-directed version of consistent subtyping is remarkably 
simple and well-behaved, without the ad-hoc restriction operator [22]. More- 
over, to further illustrate the generality of our consistent subtyping definition, 
we show that it can also account for top types, which cannot be dealt with by 
Siek and Taha’s [22] definition either. 

The second goal of this paper is to present a (source-level) gradually typed 
calculus for (predicative) implicit higher-rank polymorphism that uses our new 
notion of consistent subtyping. As far as we are aware, there is no work on 
bridging the gap between implicit higher-rank polymorphism and gradual typing, 
which is interesting for two reasons. On one hand, modern functional languages 
(such as Haskell) employ sophisticated type-inference algorithms that, aided by 
type annotations, can deal with implicit higher-rank polymorphism. So a natural 
question is how gradual typing can be integrated in such languages. On the other 
hand, there are several existing works on integrating explicit polymorphism into
gradual typing [1,14]. Yet no work investigates how to move such expressive
power into a source language with implicit polymorphism. Therefore, as a step
towards gradualizing such type systems, this paper develops both declarative
and algorithmic versions of a gradual type system with implicit higher-rank
polymorphism. The new calculus brings the expressive power of full implicit
higher-rank polymorphism into a gradually typed source language. We prove that
our calculus satisfies all of the static aspects of the refined criteria for gradual
typing [25], while discussing some issues related to the dynamic guarantee.

In summary, the contributions of this paper are: 


— We define a framework for consistent subtyping with:
   • a new definition of consistent subtyping that subsumes and generalizes that
     of Siek and Taha [22], and can deal with polymorphism and top types.
   • a syntax-directed version of consistent subtyping that is sound and
     complete with respect to our definition of consistent subtyping, but still
     guesses polymorphic instantiations.


¹ Note here that we borrow Siek and Taha’s [22] motto mostly to talk about the
static semantics. As Ahmed et al. [1] show, there are several non-trivial interactions
between polymorphism and casts at the level of the dynamic semantics.
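Subtyping, $A <: B$:

\[
\mathsf{Int} <: \mathsf{Int} \qquad \mathsf{Bool} <: \mathsf{Bool} \qquad \mathsf{Float} <: \mathsf{Float} \qquad \mathsf{Int} <: \mathsf{Float}
\]
\[
\frac{B_1 <: A_1 \qquad A_2 <: B_2}{A_1 \to A_2 <: B_1 \to B_2}
\qquad
[l_i : A_i^{\,i \in 1..n+m}] <: [l_i : A_i^{\,i \in 1..n}]
\qquad
\star <: \star
\]

Consistency, $A \sim B$:

\[
A \sim A \qquad A \sim \star \qquad \star \sim A
\qquad
\frac{A_1 \sim B_1 \qquad A_2 \sim B_2}{A_1 \to A_2 \sim B_1 \to B_2}
\qquad
\frac{A_i \sim B_i}{[l_i : A_i] \sim [l_i : B_i]}
\]

Fig. 1. Subtyping and type consistency in $\mathrm{FOb}^{?}_{<:}$.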


— Based on consistent subtyping, we present a declarative gradual type system
with predicative implicit higher-rank polymorphism. We prove that our
calculus satisfies the static aspects of the refined criteria for gradual typing [25],
and is type-safe by a type-directed translation to λB, and thus hereditarily
preserves parametricity [2].

— We present a complete and sound bidirectional algorithm for implementing 
the declarative system based on the design principle of Garcia and Cimini 
[12] and the approach of Dunfield and Krishnaswami [11]. 

— All of the metatheory of this paper, except some manual proofs for the
algorithmic type system, has been mechanically formalized in Coq².


2 Background and Motivation 


In this section we review a simple gradually typed language with objects [22],
to introduce the concept of consistent subtyping. We also briefly talk about
the Odersky-Läufer type system for higher-rank types [17], which serves as the
original language on which our gradually typed calculus with implicit
higher-rank polymorphism is based.


2.1 Gradual Subtyping 


Siek and Taha [22] developed a gradually typed system for object-oriented
languages that they call $\mathrm{FOb}^{?}_{<:}$. Central to gradual typing is the concept of
consistency (written ∼) between gradual types, which are types that may involve
the unknown type ⋆. The intuition is that consistency relaxes the structure of a
type system to tolerate unknown positions in a gradual type. They also defined
the subtyping relation in a way that static type safety is preserved. Their key

² All supplementary materials are available at
https://bitbucket.org/xieningning/consistent-subtyping.
insight is that the unknown type ⋆ is neutral to subtyping, with only ⋆ <: ⋆.
Both relations are found in Fig. 1.
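The two relations of Fig. 1 can be read as recursive functions over types. The following is a minimal Haskell sketch under stated assumptions: the names `Ty`, `subtype`, and `consistent` are hypothetical, and records are encoded as ordered association lists, which is a simplification chosen here rather than anything prescribed by Siek and Taha.

```haskell
import Data.List (isPrefixOf)

-- Simple gradual types, following Fig. 1. Records are encoded as ordered
-- association lists; that encoding is an assumption of this sketch.
data Ty
  = TInt
  | TBool
  | TFloat
  | TFun Ty Ty
  | TRec [(String, Ty)]
  | TUnknown
  deriving (Eq, Show)

-- Subtyping A <: B.
subtype :: Ty -> Ty -> Bool
subtype TInt TInt = True
subtype TBool TBool = True
subtype TFloat TFloat = True
subtype TInt TFloat = True
-- Functions: contravariant in the domain, covariant in the codomain.
subtype (TFun a1 a2) (TFun b1 b2) = subtype b1 a1 && subtype a2 b2
-- Records: width subtyping only (the supertype's fields form a prefix).
subtype (TRec fs) (TRec gs) = gs `isPrefixOf` fs
-- The unknown type is a subtype of itself only.
subtype TUnknown TUnknown = True
subtype _ _ = False

-- Consistency A ~ B: structural, except that the unknown type matches anything.
consistent :: Ty -> Ty -> Bool
consistent TUnknown _ = True
consistent _ TUnknown = True
consistent (TFun a1 a2) (TFun b1 b2) = consistent a1 b1 && consistent a2 b2
consistent (TRec fs) (TRec gs) =
  length fs == length gs
    && and (zipWith (\(l, a) (m, b) -> l == m && consistent a b) fs gs)
-- Remaining cases are base types (or mismatched constructors).
consistent a b = a == b

main :: IO ()
main = do
  print (subtype TInt TFloat)                               -- True
  print (subtype TUnknown TInt)                             -- False
  print (consistent (TFun TInt TUnknown) (TFun TInt TBool)) -- True
  print (consistent TInt TBool)                             -- False
```

Note how the unknown type behaves differently in the two functions: it is consistent with every type, yet it is a subtype only of itself.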

A primary contribution of their work is to show that consistency and
subtyping are orthogonal. To compose subtyping and consistency, Siek and Taha [22]
defined consistent subtyping (written ≲) in two equivalent ways:


Definition 1 (Consistent Subtyping à la Siek and Taha [22]) 


- A ≲ B if and only if A ∼ C and C <: B for some C.
- A ≲ B if and only if A <: C and C ∼ B for some C.


Both definitions are non-deterministic because of the intermediate type C. To
remove non-determinism, they proposed a so-called restriction operator, written
A|_B, that masks off the parts of a type A that are unknown in a type B.


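\[
A|_B = \mathbf{case}\ (A, B)\ \mathbf{of}
\begin{array}{lll}
\mid\ (-,\ \star) & & \Rightarrow \star \\
\mid\ (A_1 \to A_2,\ B_1 \to B_2) & & \Rightarrow A_1|_{B_1} \to A_2|_{B_2} \\
\mid\ ([l_1 : A_1, \ldots, l_n : A_n],\ [l_1 : B_1, \ldots, l_m : B_m]) & \text{if } n \le m & \Rightarrow [l_1 : A_1|_{B_1}, \ldots, l_n : A_n|_{B_n}] \\
\mid\ ([l_1 : A_1, \ldots, l_n : A_n],\ [l_1 : B_1, \ldots, l_m : B_m]) & \text{if } n > m & \Rightarrow [l_1 : A_1|_{B_1}, \ldots, l_m : A_m|_{B_m}, \ldots, l_n : A_n] \\
\mid\ \text{otherwise} & & \Rightarrow A
\end{array}
\]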


With the restriction operator, consistent subtyping is simply defined as A ≲ B ≡
A|_B <: B|_A. Then they proved that this definition is equivalent to Definition 1.
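To make the restriction-based definition concrete, here is a small Haskell sketch that masks each type by the other and then checks ordinary subtyping. The names `restrict` and `consistentSubtype` are hypothetical. Records are again assumed to be ordered association lists whose labels line up positionally, and the first record case is read as n ≤ m so that the two record cases are exhaustive.

```haskell
import Data.List (isPrefixOf)

-- Simple gradual types (records as ordered association lists: an assumption).
data Ty
  = TInt
  | TBool
  | TFloat
  | TFun Ty Ty
  | TRec [(String, Ty)]
  | TUnknown
  deriving (Eq, Show)

-- Subtyping A <: B for simple gradual types.
subtype :: Ty -> Ty -> Bool
subtype TInt TInt = True
subtype TBool TBool = True
subtype TFloat TFloat = True
subtype TInt TFloat = True
subtype (TFun a1 a2) (TFun b1 b2) = subtype b1 a1 && subtype a2 b2
subtype (TRec fs) (TRec gs) = gs `isPrefixOf` fs
subtype TUnknown TUnknown = True
subtype _ _ = False

-- restrict a b computes A|_B: the parts of A that are unknown in B are masked.
restrict :: Ty -> Ty -> Ty
restrict _ TUnknown = TUnknown
restrict (TFun a1 a2) (TFun b1 b2) = TFun (restrict a1 b1) (restrict a2 b2)
restrict (TRec fs) (TRec gs) =
  -- Shared positions are restricted pointwise; any extra fields of A
  -- (the n > m case) are kept unchanged.
  TRec (zipWith (\(l, a) (_, b) -> (l, restrict a b)) fs gs
          ++ drop (length gs) fs)
restrict a _ = a

-- Consistent subtyping: A|_B <: B|_A.
consistentSubtype :: Ty -> Ty -> Bool
consistentSubtype a b = subtype (restrict a b) (restrict b a)

main :: IO ()
main = do
  -- Float -> unknown against Int -> Bool: accepted, since Int <: Float.
  print (consistentSubtype (TFun TFloat TUnknown) (TFun TInt TBool))
  -- Float against Int: rejected, no unknown type is involved.
  print (consistentSubtype TFloat TInt)
  -- The unknown type is a consistent subtype of any type.
  print (consistentSubtype TUnknown TInt)
```

Because both arguments are masked before the check, the function is deterministic and never has to guess the intermediate type C of Definition 1.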


2.2 The Odersky-Läufer Type System


The calculus we are combining gradual typing with is the well-established
predicative type system for higher-rank types proposed by Odersky and Läufer [17].
One difference is that, for simplicity, we do not account for a let expression, 
as there is already existing work about gradual type systems with let expres- 
sions and let generalization (for example, see Garcia and Cimini [12]). Similar 
techniques can be applied to our calculus to enable let generalization. 

The syntax of the type system, along with the typing and subtyping
judgments, is given in Fig. 2. An implicit assumption throughout the paper is that
variables in contexts are distinct. We defer the explanation of the static
semantics to Sect. 4, where we present our gradually typed version of the calculus.


2.3 Motivation: Gradually Typed Higher-Rank Polymorphism 


Our work combines implicit (higher-rank) polymorphism with gradual typing. 
As is well known, a gradually typed language supports both fully static and fully 
dynamic checking of program properties, as well as the continuum between these 
two extremes. It also offers programmers fine-grained control over the static-to- 
dynamic spectrum, i.e., a program can be evolved by introducing more or less 
precise types as needed [13]. 
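\[
\begin{array}{lrcl}
\text{Expressions} & e & ::= & x \mid n \mid \lambda x : A.\, e \mid \lambda x.\, e \mid e\; e \\
\text{Types} & A, B & ::= & \mathsf{Int} \mid a \mid A \to B \mid \forall a.\, A \\
\text{Monotypes} & \tau, \sigma & ::= & \mathsf{Int} \mid a \mid \tau \to \sigma \\
\text{Contexts} & \Psi & ::= & \varnothing \mid \Psi, x : A \mid \Psi, a
\end{array}
\]

Typing, $\Psi \vdash^{OL} e : A$:

\[
\frac{x : A \in \Psi}{\Psi \vdash^{OL} x : A}\ \textsc{Var}
\qquad
\frac{}{\Psi \vdash^{OL} n : \mathsf{Int}}\ \textsc{Nat}
\qquad
\frac{\Psi, x : A \vdash^{OL} e : B}{\Psi \vdash^{OL} \lambda x : A.\, e : A \to B}\ \textsc{LamAnn}
\]
\[
\frac{\Psi \vdash^{OL} e_1 : A_1 \to A_2 \qquad \Psi \vdash^{OL} e_2 : A_1}{\Psi \vdash^{OL} e_1\; e_2 : A_2}\ \textsc{App}
\qquad
\frac{\Psi \vdash^{OL} e : A_1 \qquad \Psi \vdash A_1 <: A_2}{\Psi \vdash^{OL} e : A_2}\ \textsc{Sub}
\]
\[
\frac{\Psi, x : \tau \vdash^{OL} e : B}{\Psi \vdash^{OL} \lambda x.\, e : \tau \to B}\ \textsc{Lam}
\qquad
\frac{\Psi, a \vdash^{OL} e : A}{\Psi \vdash^{OL} e : \forall a.\, A}\ \textsc{Gen}
\]

Subtyping, $\Psi \vdash A <: B$:

\[
\frac{a \in \Psi}{\Psi \vdash a <: a}\ \textsc{CS-TVar}
\qquad
\frac{}{\Psi \vdash \mathsf{Int} <: \mathsf{Int}}\ \textsc{CS-Int}
\qquad
\frac{\Psi \vdash \tau \qquad \Psi \vdash A[a \mapsto \tau] <: B}{\Psi \vdash \forall a.\, A <: B}\ \textsc{ForallL}
\]
\[
\frac{\Psi, a \vdash A <: B}{\Psi \vdash A <: \forall a.\, B}\ \textsc{ForallR}
\qquad
\frac{\Psi \vdash B_1 <: A_1 \qquad \Psi \vdash A_2 <: B_2}{\Psi \vdash A_1 \to A_2 <: B_1 \to B_2}\ \textsc{CS-Fun}
\]

Fig. 2. Syntax and static semantics of the Odersky-Läufer type system.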


Haskell is a language that supports implicit higher-rank polymorphism, but 
no gradual typing. Therefore some programs that are safe at run-time may be 
rejected due to the conservativity of the type system. For example, consider the 
following Haskell program adapted from Jones et al. [18]: 


foo :: ([Int], [Char])
foo = let f x = (x [1, 2], x ['a', 'b']) in f reverse
This program is rejected by Haskell’s type checker because Haskell implements
the Damas-Milner rule that a lambda-bound argument (such as x) can only
have a monotype, i.e., the type checker can only assign x the type [Int] → [Int],
or [Char] → [Char], but not ∀a. [a] → [a]. Finding such manual polymorphic
annotations can be non-trivial. Instead of rejecting the program outright, due to
missing type annotations, gradual typing provides a simple alternative by giving
x the unknown type (denoted ⋆). With such typing the same program type-checks
and produces ([2, 1], ['b', 'a']). By running the program, programmers can gain
some additional insight about the run-time behaviour. Then, with such insight,
they can also give x a more precise type (∀a. [a] → [a]) a posteriori so that
the program continues to type-check via implicit polymorphism and also grants
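\[
\begin{array}{lrcl}
\text{Types} & A, B & ::= & \mathsf{Int} \mid a \mid A \to B \mid \forall a.\, A \mid \star \\
\text{Monotypes} & \tau, \sigma & ::= & \mathsf{Int} \mid a \mid \tau \to \sigma \\
\text{Contexts} & \Psi & ::= & \varnothing \mid \Psi, x : A \mid \Psi, a
\end{array}
\]

Consistency, $A \sim B$:

\[
A \sim A \qquad A \sim \star \qquad \star \sim A
\qquad
\frac{A_1 \sim B_1 \qquad A_2 \sim B_2}{A_1 \to A_2 \sim B_1 \to B_2}
\qquad
\frac{A \sim B}{\forall a.\, A \sim \forall a.\, B}
\]

Subtyping, $\Psi \vdash A <: B$:

\[
\frac{\Psi, a \vdash A <: B}{\Psi \vdash A <: \forall a.\, B}\ \textsc{S-ForallR}
\qquad
\frac{\Psi \vdash \tau \qquad \Psi \vdash A[a \mapsto \tau] <: B}{\Psi \vdash \forall a.\, A <: B}\ \textsc{S-ForallL}
\qquad
\frac{a \in \Psi}{\Psi \vdash a <: a}\ \textsc{S-TVar}
\]
\[
\frac{}{\Psi \vdash \mathsf{Int} <: \mathsf{Int}}\ \textsc{S-Int}
\qquad
\frac{\Psi \vdash B_1 <: A_1 \qquad \Psi \vdash A_2 <: B_2}{\Psi \vdash A_1 \to A_2 <: B_1 \to B_2}\ \textsc{S-Fun}
\qquad
\frac{}{\Psi \vdash \star <: \star}\ \textsc{S-Unknown}
\]

Fig. 3. Syntax of types, consistency, and subtyping in the declarative system.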


more static safety. In this paper, we envision such a language that combines the 
benefits of both implicit higher-rank polymorphism and gradual typing. 
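For comparison, this is roughly how the fully annotated program looks in today's Haskell. It is a sketch that assumes GHC with the RankNTypes extension enabled; the explicit signature on `f` is the manual polymorphic annotation that the gradual approach would let a programmer postpone.

```haskell
{-# LANGUAGE RankNTypes #-}

-- With an explicit rank-2 annotation on f, the argument x may be used at
-- both [Int] and [Char] inside the body.
foo :: ([Int], [Char])
foo =
  let f :: (forall a. [a] -> [a]) -> ([Int], [Char])
      f x = (x [1, 2], x ['a', 'b'])
  in f reverse

main :: IO ()
main = print foo
```

Removing the signature on `f` brings back the type error discussed in the text, because `x` is then inferred at a single monotype.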


3 Revisiting Consistent Subtyping 


In this section we explore the design space of consistent subtyping. We start 
with the definitions of consistency and subtyping for polymorphic types, and 
compare with some relevant work. We then discuss the design decisions involved 
towards our new definition of consistent subtyping, and justify the new definition 
by demonstrating its equivalence with that of Siek and Taha [22] and the AGT 
approach [13] on simple types. 

The syntax of types is given at the top of Fig. 3. We write A, B for types.
Types are either the integer type Int, type variables a, function types A → B,
universal quantification ∀a. A, or the unknown type ⋆. Though we only have one
base type Int, we also use Bool for the purpose of illustration. Note that
monotypes τ contain all types other than the universal quantifier and the unknown
type ⋆. We will discuss this restriction when we present the subtyping rules.
Contexts Ψ are ordered lists of type variable declarations and term variables.


3.1 Consistency and Subtyping 


We start by giving the definitions of consistency and subtyping for polymorphic 
types, and comparing our definitions with the compatibility relation by Ahmed 
et al. [1] and type consistency by Igarashi et al. [14]. 
Consistency. The key observation here is that consistency is mostly a structural
relation, except that the unknown type ⋆ can be regarded as any type. Following
this observation, we naturally extend the definition from Fig. 1 with polymorphic
types, as shown in the middle of Fig. 3. In particular, a polymorphic type ∀a. A
is consistent with another polymorphic type ∀a. B if A is consistent with B.
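The structural reading of consistency carries over directly to an implementation. The sketch below is illustrative only: the names are hypothetical, and type variables are represented with de Bruijn indices (an assumption made here to sidestep renaming of bound variables, not something the paper prescribes).

```haskell
-- Types of the declarative system, with de Bruijn indices for type
-- variables (TVar 0 refers to the nearest enclosing TForall).
data Ty
  = TInt
  | TBool
  | TVar Int
  | TFun Ty Ty
  | TForall Ty
  | TUnknown
  deriving (Eq, Show)

-- Consistency A ~ B: purely structural, except that the unknown type is
-- consistent with every type. No instantiation of quantifiers happens here.
consistent :: Ty -> Ty -> Bool
consistent TUnknown _ = True
consistent _ TUnknown = True
consistent (TFun a1 a2) (TFun b1 b2) = consistent a1 b1 && consistent a2 b2
consistent (TForall a) (TForall b) = consistent a b
consistent a b = a == b

main :: IO ()
main = do
  -- forall a. a -> Int  ~  forall a. unknown -> Int
  print (consistent (TForall (TFun (TVar 0) TInt)) (TForall (TFun TUnknown TInt)))
  -- forall a. a -> a  is not consistent with  Int -> Int:
  -- relating these two types is the job of subtyping, not of consistency.
  print (consistent (TForall (TFun (TVar 0) (TVar 0))) (TFun TInt TInt))
```

Keeping instantiation out of `consistent` is what lets consistency and subtyping remain separate relations.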


Subtyping. We express the fact that one type is a polymorphic generalization
of another by means of the subtyping judgment Ψ ⊢ A <: B. Compared with
the subtyping rules of Odersky and Läufer [17] in Fig. 2, the only addition is
the neutral subtyping of ⋆. Notice that, in the rule S-ForallL, the universal
quantifier is only allowed to be instantiated with a monotype. The judgment
Ψ ⊢ τ checks that all the type variables in τ are bound in the context Ψ. For space
reasons, we omit the definition. According to the syntax in Fig. 3, monotypes
do not include the unknown type ⋆. This is because if we were to allow the
unknown type to be used for instantiation, we could have ∀a. a → a <: ⋆ → ⋆
by instantiating a with ⋆. Since ⋆ → ⋆ is consistent with any function type A → B,
for instance, Int → Bool, this means that we could provide an expression of
type ∀a. a → a to a function where the input type is supposed to be Int →
Bool. However, as we might expect, ∀a. a → a is definitely not compatible with
Int → Bool. This does not hold in any polymorphic type system without gradual
typing, so the gradual type system should not accept it either. (This is the
so-called conservative extension property that will be made precise in Sect. 4.3.)
Importantly, there is a subtle but crucial distinction between a type variable
and the unknown type, although both represent a kind of “arbitrary” type.
The unknown type stands for the absence of type information: it could be any
type at any instance. Therefore, the unknown type is consistent with any type,
and additional type-checks have to be performed at runtime. On the other hand,
a type variable indicates parametricity. In other words, a type variable can only
be instantiated to a single type. For example, in the type ∀a. a → a, the two
occurrences of a represent an arbitrary but single type (e.g., Int → Int, Bool →
Bool), while ⋆ → ⋆ could be an arbitrary function (e.g., Int → Bool) at runtime.
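As a worked illustration of the monotype restriction, the derivation below uses the rules S-ForallL, S-Fun, and S-Int of Fig. 3, writing $\star$ for the unknown type. It shows the instantiation $\tau = \mathsf{Int}$; the derivation tree is a sketch added for illustration and is not taken from the paper.

```latex
\[
\dfrac{\Psi \vdash \mathsf{Int}
       \qquad
       \dfrac{\Psi \vdash \mathsf{Int} <: \mathsf{Int}
              \qquad
              \Psi \vdash \mathsf{Int} <: \mathsf{Int}}
             {\Psi \vdash \mathsf{Int} \to \mathsf{Int} <: \mathsf{Int} \to \mathsf{Int}}
       \ \textsc{S-Fun}}
      {\Psi \vdash \forall a.\, a \to a <: \mathsf{Int} \to \mathsf{Int}}
\ \textsc{S-ForallL}\ (\tau = \mathsf{Int})
\]
```

By contrast, $\Psi \vdash \forall a.\, a \to a <: \star \to \star$ has no derivation. Rule S-ForallL would require some monotype $\tau$ with $\Psi \vdash \star <: \tau$ and $\Psi \vdash \tau <: \star$, but the only subtyping fact about $\star$ is $\star <: \star$, and $\star$ is not a monotype.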


Comparison with Other Relations. In other polymorphic gradual calculi,
consistency and subtyping are often mixed up to some extent. In λB [1], the
compatibility relation for polymorphic types is defined as follows:


\[
\frac{A \prec B}{A \prec \forall X.\, B}\ \textsc{Comp-AllR}
\qquad
\frac{A[X \mapsto \star] \prec B}{\forall X.\, A \prec B}\ \textsc{Comp-AllL}
\]


Notice that, in rule Comp-AllL, the universal quantifier is always instantiated
to ⋆. However, this way, λB allows ∀a. a → a ≺ Int → Bool, which as we discussed
before might not be what we expect. Indeed λB relies on sophisticated runtime
checks to rule out such instances of the compatibility relation a posteriori.
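The following sketch spells out how that judgment could be derived, writing $\star$ for the unknown type. It uses Comp-AllL together with two assumptions that are not shown in the excerpt above: that $\star$ is compatible with every type in both directions, and that function compatibility is contravariant in the domain and covariant in the codomain.

```latex
\[
\dfrac{\dfrac{\mathsf{Int} \prec \star \qquad \star \prec \mathsf{Bool}}
             {\star \to \star \prec \mathsf{Int} \to \mathsf{Bool}}
       \ \text{(function rule, assumed)}}
      {\forall a.\, a \to a \prec \mathsf{Int} \to \mathsf{Bool}}
\ \textsc{Comp-AllL}
\]
```

The premise of Comp-AllL is $(a \to a)[a \mapsto \star] = \star \to \star$. Once both occurrences of $a$ have been replaced by $\star$, nothing forces the domain and codomain to agree, so the relation accepts $\mathsf{Int} \to \mathsf{Bool}$.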
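(a) $(\forall a.\, a \to \mathsf{Int}) \to \mathsf{Int} \;\sim\; (\forall a.\, \star \to \mathsf{Int}) \to \mathsf{Int} \;<:\; (\star \to \mathsf{Int}) \to \mathsf{Int}$, whereas $(\forall a.\, a \to \mathsf{Int}) \to \mathsf{Int} \;<:\; \bot \;\sim\; (\star \to \mathsf{Int}) \to \mathsf{Int}$.

(b) $\forall a.\, a \;<:\; \mathsf{Int} \to \mathsf{Int} \;\sim\; \mathsf{Int} \to \star$, whereas $\forall a.\, a \;\sim\; \bot \;<:\; \mathsf{Int} \to \star$.

(c) $(((\forall a.\, a \to \mathsf{Int}) \to \mathsf{Int}) \to \mathsf{Bool}) \to (\forall a.\, a) \;<:\; \bot \;\sim\; (((\star \to \mathsf{Int}) \to \mathsf{Int}) \to \mathsf{Bool}) \to (\mathsf{Int} \to \star)$, and likewise $(((\forall a.\, a \to \mathsf{Int}) \to \mathsf{Int}) \to \mathsf{Bool}) \to (\forall a.\, a) \;\sim\; \bot \;<:\; (((\star \to \mathsf{Int}) \to \mathsf{Int}) \to \mathsf{Bool}) \to (\mathsf{Int} \to \star)$.

Fig. 4. Examples that break the original definition of consistent subtyping. ($\bot$ marks an intermediate type that cannot be found.)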


Igarashi et al. [14] introduced the so-called quasi-polymorphic types for types
that may be used where a ∀-type is expected, which is important for their
purpose of conservativity over System F. Their type consistency relation, involving
polymorphism, is defined as follows³:


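\[
\frac{A \sim B}{\forall a.\, A \sim \forall a.\, B}
\qquad
\frac{A \sim B \qquad B \neq \forall a.\, B' \qquad \star \in \mathrm{Types}(B)}{\forall a.\, A \sim B}
\]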


Compared with our consistency definition in Fig. 3, their first rule is the same
as ours. The second rule says that a non-∀-type can be consistent with a ∀-type
only if it contains ⋆. In this way, their type system is able to reject ∀a. a → a ∼
Int → Bool. However, in order to keep conservativity, they also reject ∀a. a → a ∼
Int → Int, which is perfectly sensible in their setting (i.e., explicit polymorphism).
However, with implicit polymorphism, we would expect ∀a. a → a to be related
with Int → Int, since a can be instantiated to Int.

Nonetheless, when it comes to interactions between dynamically typed and
polymorphically typed terms, both relations allow ∀a. a → Int to be related with
⋆ → Int, for example. In our view, this is some sort of (implicit) polymorphic
subtyping combined with type consistency, and it should be derivable from the
more primitive notions in the type system (instead of inventing new relations).
One of our design principles is that subtyping and consistency are orthogonal
and can be naturally superimposed, echoing the opinion of Siek and Taha [22].


3.2 Towards Consistent Subtyping 


With the definitions of consistency and subtyping, the question now is how to 
compose these two relations so that two types can be compared in a way that 
takes these two relations into account. 


3 This is a simplified version. 




Unfortunately, the original definition of Siek and Taha [22] (Definition 1) does
not work well with our definitions of consistency and subtyping for polymorphic
types. Consider two types: (∀a. a → Int) → Int, and (⋆ → Int) → Int. The first
type can only reach the second type in one way (first by applying consistency,
then subtyping), but not the other way, as shown in Fig. 4a. We use ⊥ to mean
that we cannot find such a type. Similarly, there are situations where the first
type can only reach the second type by the other way (first applying subtyping,
and then consistency), as shown in Fig. 4b.

What is worse, if those two examples are composed in a way that those types
all appear covariantly, then the resulting types cannot reach each other in
either way. For example, Fig. 4c shows two such types, built by putting a Bool
type in the middle, for which neither order of composition works.


Observations on Consistent Subtyping Based on Information Propagation. In 
order to develop the correct definition of consistent subtyping for polymorphic 
types, we need to understand how consistent subtyping works. We first review 
two important properties of subtyping: (1) subtyping induces the subsumption 
rule: if A <: B, then an expression of type A can be used where B is expected; 
(2) subtyping is transitive: if A <: B, and B <: C, then A <: C. Though
consistent subtyping takes the unknown type into consideration, the subsumption
rule should also apply: if A ≲ B, then an expression of type A can also be used
where B is expected, given that there might be some information lost by
consistency. A crucial difference from subtyping is that consistent subtyping is
not transitive, because information can only be lost once (otherwise, any two
types would be consistent subtypes of each other). Now consider a situation where
we have both A <: B and B ≲ C. This means that A can be used where B is expected,
and B can be used where C is expected, with possibly some loss of information.
In other words, we should expect that A can be used where C is expected, since
information is lost at most once.


Observation 1. If A <: B, and B ≲ C, then A ≲ C.
This is reflected in Fig. 5a. A symmetrical observation is given in Fig. 5b:
Observation 2. If C ≲ B, and B <: A, then C ≲ A.


From the above observations, we see what the problem is with the original
definition. In Fig. 5a, if B can reach C by T₁, then by subtyping transitivity, A
can reach C by T₁. However, if B can only reach C by T₂, then A cannot reach
C through the original definition. A similar problem is shown in Fig. 5b.

However, it turns out that those two problems can be fixed using the same
strategy: instead of taking one-step subtyping and one-step consistency, our
definition of consistent subtyping allows types to take one-step subtyping,
one-step consistency, and one more step of subtyping. Specifically,
A <: B ~ T₂ <: C (in Fig. 5a) and C <: T₁ ~ B <: A (in Fig. 5b) have the same
relation chain: subtyping, consistency, and subtyping.
[Figure 5: diagrams (a) and (b) relating the types A, B, C and the intermediate
types T₁ and T₂ by subtyping (<:) and consistency (~).]


[Figure 6 diagram: the chain A₁ <: A₂ ~ A₃ <: A₄, where]

A₁ = (((∀a. a → Int) → Int) → Bool) → (∀a. a)
A₂ = (((∀a. a → Int) → Int) → Bool) → (Int → Int)
A₃ = (((∀a. ⋆ → Int) → Int) → Bool) → (Int → ⋆)
A₄ = (((⋆ → Int) → Int) → Bool) → (Int → ⋆)


Fig. 6. Example that is fixed by the new definition of consistent subtyping. 


Definition of Consistent Subtyping. From the above discussion, we are ready to 
modify Definition 1, and adapt it to our notation: 


Definition 2 (Consistent Subtyping) 


$$
\frac{\Psi \vdash A <: C \qquad C \sim D \qquad \Psi \vdash D <: B}{\Psi \vdash A \lesssim B}
$$


With Definition 2, Fig. 6 illustrates the correct relation chain for the broken
example shown in Fig. 4c. At first sight, Definition 2 seems worse than the
original: we need to guess two types! It turns out that Definition 2 is a
generalization of Definition 1, and that the two are equivalent in the system of
Siek and Taha [22]. More generally, however, Definition 2 is also compatible
with polymorphic types.


Proposition 1 (Generalization of Consistent Subtyping) 


— Definition 2 subsumes Definition 1. 
— Definition 1 is equivalent to Definition 2 in the system of Siek and Taha [22]. 


3.3 Abstracting Gradual Typing 


Garcia et al. [13] presented a new foundation for gradual typing that they 
call the Abstracting Gradual Typing (AGT) approach. In the AGT approach, 
gradual types are interpreted as sets of static types, where static types refer 
to types containing no unknown types. In this interpretation, predicates and 
functions on static types can then be lifted to apply to gradual types. Central
to their approach is the so-called concretization function. For simple types, a
concretization γ from gradual types to a set of static types⁴ is defined as follows:


Definition 3 (Concretization) 
$$
\gamma(\mathsf{Int}) = \{\mathsf{Int}\}
\qquad
\gamma(A \to B) = \gamma(A) \to \gamma(B)
\qquad
\gamma(\star) = \{\text{all static types}\}
$$


Based on the concretization function, subtyping between static types can be 
lifted to gradual types, resulting in the consistent subtyping relation: 


Definition 4 (Consistent Subtyping in AGT). $A \mathrel{\widetilde{<:}} B$ if and
only if A₁ <: B₁ for some A₁ ∈ γ(A), B₁ ∈ γ(B).
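As a small worked instance of Definition 4 (our own illustration, not taken from the original text), consider the gradual types Int → ⋆ and ⋆ → Int. Choosing the same static witness Int → Int on both sides is enough, whereas Int and Int → Int have singleton concretizations that are unrelated by subtyping.

```latex
\[
\begin{array}{l}
\mathsf{Int} \to \mathsf{Int} \in \gamma(\mathsf{Int} \to \star), \qquad
\mathsf{Int} \to \mathsf{Int} \in \gamma(\star \to \mathsf{Int}), \qquad
\mathsf{Int} \to \mathsf{Int} <: \mathsf{Int} \to \mathsf{Int}
\\[1ex]
\Longrightarrow\quad (\mathsf{Int} \to \star) \mathrel{\widetilde{<:}} (\star \to \mathsf{Int})
\\[2ex]
\gamma(\mathsf{Int}) = \{\mathsf{Int}\}, \qquad
\gamma(\mathsf{Int} \to \mathsf{Int}) = \{\mathsf{Int} \to \mathsf{Int}\}, \qquad
\mathsf{Int} \not<: \mathsf{Int} \to \mathsf{Int}
\\[1ex]
\Longrightarrow\quad \mathsf{Int} \not\mathrel{\widetilde{<:}} (\mathsf{Int} \to \mathsf{Int})
\end{array}
\]
```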


Later they proved that this definition of consistent subtyping coincides with 
that of Siek and Taha [22] (Definition 1). By Proposition 1, we can directly con- 
clude that our definition coincides with AGT: 


Proposition 2 (Equivalence to AGT on Simple Types). A ≲ B iff
$A \mathrel{\widetilde{<:}} B$.

However, AGT does not show how to deal with polymorphism (e.g. the inter- 
pretation of type variables) yet. Still, as noted by Garcia et al. [13], it is a promis- 
ing line of future work for AGT, and the question remains whether our definition 
would coincide with it. 

Another note related to AGT is that the definition was later adopted by
Castagna and Lanvin [7], where the static types A₁, B₁ in Definition 4 can be
algorithmically computed by also accounting for top and bottom types.


3.4 Directed Consistency 
Directed consistency [15] is defined in terms of precision and static subtyping: 
$$
\frac{A' \sqsubseteq A \qquad A <: B \qquad B' \sqsubseteq B}{A' \lesssim B'}
$$


The judgment A ⊑ B is read “A is less precise than B”. In their setting, precision
is defined for type constructors and subtyping for static types. If we interpret
this definition from AGT’s point of view, finding a more precise static type⁵
has the same effect as concretization. Namely, A′ ⊑ A implies A ∈ γ(A′) and
B′ ⊑ B implies B ∈ γ(B′). Therefore we consider this definition as AGT-style.
From this perspective, this definition naturally coincides with Definition 2.

The value of their definition is that consistent subtyping is derived
compositionally from static subtyping and precision. These are two more atomic
relations. At first sight, their definition looks very similar to Definition 2
(replacing ⊑ by <: and <: by ~). Then a question arises as to which one is more
fundamental. To answer this, we need to discuss the relation between consistency
and precision.


4 For simplification, we directly regard the type constructor → as a set-level operator.
5 The definition of precision of types is given in the appendix.
Relating Consistency and Precision. Precision is a partial order (anti-symmetric
and transitive), while consistency is symmetric but not transitive. Nonetheless,
precision and consistency are related by the following proposition:


Proposition 3 (Consistency and Precision) 


- If A ~ B, then there exists a (static) C such that A ⊑ C and B ⊑ C.
- If for some (static) C we have A ⊑ C and B ⊑ C, then we have A ~ B.
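Proposition 3 can be exercised on simple types with a short sketch. The tuple encoding of types, the function names, and the choice of Int as the filler for positions that are unknown on both sides are all assumptions of this illustration; `refines(gradual, static)` plays the role of the precision relation between a gradual type and a static type C in the proposition.

```python
# Simple gradual types (hypothetical encoding):
#   "Int", "Bool", "?" (the unknown type), or ("->", domain, codomain).
UNKNOWN = "?"

def arrow(a, b):
    return ("->", a, b)

def is_arrow(t):
    return isinstance(t, tuple) and t[0] == "->"

def is_static(t):
    """A static type contains no unknown type."""
    if t == UNKNOWN:
        return False
    if is_arrow(t):
        return is_static(t[1]) and is_static(t[2])
    return True

def consistent(a, b):
    """A ~ B: the two types agree wherever both are known."""
    if a == UNKNOWN or b == UNKNOWN:
        return True
    if is_arrow(a) and is_arrow(b):
        return consistent(a[1], b[1]) and consistent(a[2], b[2])
    return a == b

def refines(gradual, static):
    """True if `static` is `gradual` with every '?' filled by a static type."""
    if gradual == UNKNOWN:
        return is_static(static)
    if is_arrow(gradual) and is_arrow(static):
        return refines(gradual[1], static[1]) and refines(gradual[2], static[2])
    return gradual == static

def fill(t, filler):
    """Replace every '?' in t by the static type `filler`."""
    if t == UNKNOWN:
        return filler
    if is_arrow(t):
        return arrow(fill(t[1], filler), fill(t[2], filler))
    return t

def common_static(a, b, filler="Int"):
    """Return a static type refining both a and b, or None if none exists."""
    if a == UNKNOWN:
        return fill(b, filler)
    if b == UNKNOWN:
        return fill(a, filler)
    if is_arrow(a) and is_arrow(b):
        dom = common_static(a[1], b[1], filler)
        cod = common_static(a[2], b[2], filler)
        return None if dom is None or cod is None else arrow(dom, cod)
    return a if a == b else None

left = arrow("Int", UNKNOWN)      # Int -> ?
right = arrow(UNKNOWN, "Bool")    # ?   -> Bool
witness = common_static(left, right)
```

Here `witness` is the static type Int → Bool, which refines both inputs; for inconsistent inputs such as Int and Bool, `common_static` returns `None`, matching both directions of the proposition on this fragment.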


It may seem that precision is a more atomic relation, since consistency can be 
derived from precision. However, recall that consistency is in fact an equivalence 
relation lifted from static types to gradual types. Therefore defining consistency 
independently is straightforward, and it is theoretically viable to validate the 
definition of consistency directly. On the other hand, precision is usually con- 
nected with the gradual criteria [25], and finding a correct partial order that 
adheres to the criteria is not always an easy task. For example, Igarashi et al.
[14] argued that term precision for System F_G is actually nontrivial, leaving
the gradual guarantee of the semantics as a conjecture. Thus precision can be
difficult to extend to more sophisticated type systems, e.g., dependent types.

Still, it is interesting that those two definitions illustrate the correspondence 
of different foundations (on simple types): one is defined directly on gradual 
types, and the other stems from AGT, which is based on static subtyping. 


3.5 Consistent Subtyping Without Existentials 


Definition 2 serves as a fine specification of how consistent subtyping should 
behave in general. But it is inherently non-deterministic because of the two 
intermediate types C and D. As with Definition 1, we need a combined relation to 
directly compare two types. A natural attempt is to try to extend the restriction 
operator for polymorphic types. Unfortunately, as we show below, this does not 
work. However it is possible to devise an equivalent inductive definition instead. 


Attempt to Extend the Restriction Operator. Suppose that we try to extend the 
restriction operator to account for polymorphic types. The original restriction 
operator is structural, meaning that it works for types of similar structures. 
But for polymorphic types, two input types could have different structures due
to universal quantifiers, e.g., ∀a. a → Int and (Int → ⋆) → Int. If we try to
mask the first type using the second, it seems hard to maintain the information
that a should be instantiated to a function while ensuring that the return type is
masked. There seems to be no satisfactory way to extend the restriction operator 
in order to support this kind of non-structural masking. 


Interpretation of the Restriction Operator and Consistent Subtyping. If the 
restriction operator cannot be extended naturally, it is useful to take a step 
back and revisit what the restriction operator actually does. For consistent sub- 
typing, two input types could have unknown types in different positions, but we 
only care about the known parts. What the restriction operator does is (1) erase 
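$$
\boxed{\Psi \vdash A \lesssim B}
$$

$$
\begin{gathered}
\frac{\Psi, a \vdash A \lesssim B}{\Psi \vdash A \lesssim \forall a.\, B}\ \textsc{CS-ForallR}
\qquad
\frac{\Psi \vdash \tau \qquad \Psi \vdash A[a \mapsto \tau] \lesssim B}{\Psi \vdash \forall a.\, A \lesssim B}\ \textsc{CS-ForallL}
\\[2ex]
\frac{\Psi \vdash B_1 \lesssim A_1 \qquad \Psi \vdash A_2 \lesssim B_2}{\Psi \vdash A_1 \to A_2 \lesssim B_1 \to B_2}\ \textsc{CS-Fun}
\qquad
\frac{a \in \Psi}{\Psi \vdash a \lesssim a}\ \textsc{CS-TVar}
\qquad
\frac{}{\Psi \vdash \mathsf{Int} \lesssim \mathsf{Int}}\ \textsc{CS-Int}
\\[2ex]
\frac{}{\Psi \vdash \star \lesssim A}\ \textsc{CS-UnknownL}
\qquad
\frac{}{\Psi \vdash A \lesssim \star}\ \textsc{CS-UnknownR}
\end{gathered}
$$
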
Fig. 7. Consistent Subtyping for implicit polymorphism. 


the type information in one type if the corresponding position in the other type is
the unknown type; and (2) compare the resulting types using the normal subtyping
relation. The example below shows the masking-off procedure for the types
Int → ⋆ → Bool and Int → Int → ⋆. Since the known parts have the relation that
Int → ⋆ → ⋆ <: Int → ⋆ → ⋆, we conclude that Int → ⋆ → Bool ≲ Int → Int → ⋆.


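$$
\begin{array}{rcl}
(\mathsf{Int} \to \boxed{\star} \to \boxed{\mathsf{Bool}})\,|_{\mathsf{Int} \to \mathsf{Int} \to \star}
  &=& \mathsf{Int} \to \star \to \star \\
  && \quad <: \\
(\mathsf{Int} \to \boxed{\mathsf{Int}} \to \boxed{\star})\,|_{\mathsf{Int} \to \star \to \mathsf{Bool}}
  &=& \mathsf{Int} \to \star \to \star
\end{array}
$$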


Here differences of the types in boxes are erased because of the restriction
operator. Now if we compare the types in boxes directly instead of through the
lens of the restriction operator, we can observe that the consistent subtyping
relation always holds between the unknown type and an arbitrary type. We can
interpret this observation directly from Definition 2: the unknown type is
neutral to subtyping (⋆ <: ⋆), the unknown type is consistent with any type
(⋆ ~ A), and subtyping is reflexive (A <: A). Therefore, the unknown type is a
consistent subtype of any type (⋆ ≲ A), and vice versa (A ≲ ⋆). Note that this
interpretation provides a general recipe on how to lift a (static) subtyping
relation to a (gradual) consistent subtyping relation, as discussed below.
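The masking procedure for simple types can be sketched in a few lines. The encoding and function names below are assumptions of this illustration, and since the fragment has no records or top type, the "normal" subtyping on masked types reduces to structural equality with the usual variance on arrows.

```python
# Simple gradual types (hypothetical encoding):
#   "Int", "Bool", "?" (the unknown type), or ("->", domain, codomain).
UNKNOWN = "?"

def arrow(a, b):
    return ("->", a, b)

def is_arrow(t):
    return isinstance(t, tuple) and t[0] == "->"

def restrict(a, b):
    """A|_B: erase the parts of `a` whose position in `b` is the unknown type."""
    if b == UNKNOWN:
        return UNKNOWN
    if is_arrow(a) and is_arrow(b):
        return arrow(restrict(a[1], b[1]), restrict(a[2], b[2]))
    return a

def subtype(a, b):
    """Subtyping on masked types; '?' is only related to itself."""
    if is_arrow(a) and is_arrow(b):
        return subtype(b[1], a[1]) and subtype(a[2], b[2])
    return a == b

def consistent_subtype(a, b):
    """Mask each type by the other, then compare with ordinary subtyping."""
    return subtype(restrict(a, b), restrict(b, a))

# The example from the text: Int -> ? -> Bool  versus  Int -> Int -> ?
left = arrow("Int", arrow(UNKNOWN, "Bool"))
right = arrow("Int", arrow("Int", UNKNOWN))
masked_left = restrict(left, right)
masked_right = restrict(right, left)
```

Both `masked_left` and `masked_right` evaluate to Int → ⋆ → ⋆, so `consistent_subtype(left, right)` holds, as in the example above.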


Defining Consistent Subtyping Directly. From the above discussion, we can define
the consistent subtyping relation directly, without resorting to subtyping or
consistency at all. The key idea is that we replace <: with ≲ in Fig. 3, get rid
of rule S-UNKNOWN and add two extra rules concerning ⋆, resulting in the
rules of consistent subtyping in Fig. 7. Of particular interest are the rules
CS-UNKNOWNL and CS-UNKNOWNR, both of which correspond to what we just
said: the unknown type is a consistent subtype of any type, and vice versa.
From now on, we use the symbol ≲ to refer to the consistent subtyping relation
in Fig. 7. What is more, we can prove that those two are equivalent⁶:


Theorem 1. Ψ ⊢ A ≲ B ⇔ Ψ ⊢ A <: C, C ~ D, Ψ ⊢ D <: B for some C, D.
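The rules of Fig. 7 can be turned into a small executable sketch. Because CS-ForallL guesses a monotype, the sketch below searches a finite pool of candidates (base types, variables in scope, and one-level arrows over them); this bound, the extra base type Bool, the tuple encoding, and the assumption that source binder names never start with an underscore are all choices of this illustration, so a negative answer only means that no derivation was found within the bound.

```python
from itertools import product

INT, BOOL, UNK = ("int",), ("bool",), ("unk",)

def var(name):
    return ("var", name)

def fun(a, b):
    return ("fun", a, b)

def forall(name, body):
    return ("forall", name, body)

def subst(ty, name, mono):
    """Replace free occurrences of type variable `name` in `ty` by `mono`."""
    tag = ty[0]
    if tag == "var":
        return mono if ty[1] == name else ty
    if tag == "fun":
        return fun(subst(ty[1], name, mono), subst(ty[2], name, mono))
    if tag == "forall":
        # An inner binder with the same name shadows `name`.
        return ty if ty[1] == name else forall(ty[1], subst(ty[2], name, mono))
    return ty  # int, bool, unk

def candidates(ctx):
    """Finite pool of monotypes tried by CS-ForallL (a bound of this sketch)."""
    base = [INT, BOOL] + [var(a) for a in ctx]
    return base + [fun(a, b) for a, b in product(base, base)]

def cs(ctx, a, b, fresh=0):
    """Bounded search for a derivation of ctx |- a <~ b."""
    if a == UNK or b == UNK:                 # CS-UnknownL, CS-UnknownR
        return True
    if b[0] == "forall":                     # CS-ForallR, with a fresh variable
        name = f"_{fresh}"
        return cs(ctx + [name], a, subst(b[2], b[1], var(name)), fresh + 1)
    if a[0] == "forall":                     # CS-ForallL, guessing a monotype
        return any(cs(ctx, subst(a[2], a[1], t), b, fresh)
                   for t in candidates(ctx))
    if a[0] == "fun" and b[0] == "fun":      # CS-Fun
        return cs(ctx, b[1], a[1], fresh) and cs(ctx, a[2], b[2], fresh)
    if a[0] == "var" and b[0] == "var":      # CS-TVar
        return a == b and a[1] in ctx
    return a == b and a in (INT, BOOL)       # CS-Int (and its Bool analogue)

identity = forall("a", fun(var("a"), var("a")))

# The two types of Fig. 4c.
fig4c_left = fun(fun(fun(forall("a", fun(var("a"), INT)), INT), BOOL),
                 forall("a", var("a")))
fig4c_right = fun(fun(fun(fun(UNK, INT), INT), BOOL), fun(INT, UNK))
```

With these definitions, `cs([], identity, fun(INT, INT))` succeeds by guessing Int, `cs([], identity, fun(INT, BOOL))` fails for every candidate, and the Fig. 4c pair is accepted directly, without constructing the intermediate types of Definition 2.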


6 Theorems marked with 𝒯 are those proved in Coq. The same applies to lemmas.
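$$
\boxed{\Psi \vdash e : A \rightsquigarrow s}
$$

$$
\begin{gathered}
\frac{x : A \in \Psi}{\Psi \vdash x : A \rightsquigarrow x}\ \textsc{Var}
\qquad
\frac{}{\Psi \vdash n : \mathsf{Int} \rightsquigarrow n}\ \textsc{Nat}
\qquad
\frac{\Psi, a \vdash e : A \rightsquigarrow s}{\Psi \vdash e : \forall a.\, A \rightsquigarrow \Lambda a.\, s}\ \textsc{Gen}
\\[2ex]
\frac{\Psi, x : A \vdash e : B \rightsquigarrow s}{\Psi \vdash \lambda x : A.\ e : A \to B \rightsquigarrow \lambda x : A.\ s}\ \textsc{LamAnn}
\qquad
\frac{\Psi, x : \tau \vdash e : B \rightsquigarrow s}{\Psi \vdash \lambda x.\ e : \tau \to B \rightsquigarrow \lambda x : \tau.\ s}\ \textsc{Lam}
\\[2ex]
\frac{\Psi \vdash e_1 : A \rightsquigarrow s_1 \qquad \Psi \vdash A \triangleright A_1 \to A_2 \qquad \Psi \vdash e_2 : A_3 \rightsquigarrow s_2 \qquad \Psi \vdash A_3 \lesssim A_1}
     {\Psi \vdash e_1\ e_2 : A_2 \rightsquigarrow (\langle A \hookrightarrow A_1 \to A_2 \rangle\, s_1)\ (\langle A_3 \hookrightarrow A_1 \rangle\, s_2)}\ \textsc{App}
\end{gathered}
$$

$$
\boxed{\Psi \vdash A \triangleright A_1 \to A_2}
$$

$$
\begin{gathered}
\frac{\Psi \vdash \tau \qquad \Psi \vdash A[a \mapsto \tau] \triangleright A_1 \to A_2}{\Psi \vdash \forall a.\, A \triangleright A_1 \to A_2}\ \textsc{M-Forall}
\\[2ex]
\frac{}{\Psi \vdash (A_1 \to A_2) \triangleright (A_1 \to A_2)}\ \textsc{M-Arr}
\qquad
\frac{}{\Psi \vdash \star \triangleright \star \to \star}\ \textsc{M-Unknown}
\end{gathered}
$$

(The gray-shaded parts of the original figure are the translation outputs written after ⇝.)
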


Fig. 8. Declarative typing 


4 Gradually Typed Implicit Polymorphism 


In Sect. 3 we introduced the consistent subtyping relation that accommodates
polymorphic types. In this section we continue with the development by giving a 
declarative type system for predicative implicit polymorphism that employs the 
consistent subtyping relation. The declarative system itself is already quite inter- 
esting as it is equipped with both higher-rank polymorphism and the unknown 
type. The syntax of expressions in the declarative system is given below: 


Expressions e ::= x | n | λx:A. e | λx. e | e e


4.1 Typing in Detail 


Figure 8 gives the typing rules for our declarative system (the reader is advised to
ignore the gray-shaded parts for now). Rule VAR extracts the type of the variable
from the typing context. Rule NAT always infers integer types. Rule LAMANN
puts x with type annotation A into the context, and continues type checking the
body e. Rule LAM assigns a monotype τ to x, and continues type checking the
body e. Gradual types and polymorphic types are introduced via annotations
explicitly. Rule GEN puts a fresh type variable a into the type context and
generalizes the typing result A to ∀a. A. Rule APP first infers the type of e₁,
then the matching judgment Ψ ⊢ A ▹ A₁ → A₂ extracts the domain type A₁ and
the codomain type A₂ from type A. The type A₃ of the argument e₂ is then
compared with A₁ using the consistent subtyping judgment.
Matching. The matching judgment of Siek et al. [25] can be extended to
polymorphic types naturally, resulting in Ψ ⊢ A ▹ A₁ → A₂. In M-FORALL, a monotype
τ is guessed to instantiate the universal quantifier a. This rule is inspired by the
application judgment Φ ⊢ A • e ⇒ C [11], which says that if we apply a term of
type A to an argument e, we get something of type C. If A is a polymorphic type,
the judgment works by guessing instantiations until it reaches an arrow type.
Matching further simplifies the application judgment, since it is independent of
typing. Rules M-ARR and M-UNKNOWN are the same as in Siek et al. [25]. M-ARR
returns the domain type A₁ and range type A₂ as expected. If the input is ⋆,
then M-UNKNOWN returns ⋆ as both the type for the domain and the range.
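The three matching rules can be sketched as a single function. Since M-FORALL guesses its monotype, this illustration takes the guesses as an explicit list supplied by the caller, one per outer quantifier; that oracle-style parameter, the tuple encoding, and the function names are assumptions of the sketch.

```python
INT, UNK = ("int",), ("unk",)

def var(name):
    return ("var", name)

def fun(a, b):
    return ("fun", a, b)

def forall(name, body):
    return ("forall", name, body)

def subst(ty, name, mono):
    """Replace free occurrences of type variable `name` in `ty` by `mono`."""
    tag = ty[0]
    if tag == "var":
        return mono if ty[1] == name else ty
    if tag == "fun":
        return fun(subst(ty[1], name, mono), subst(ty[2], name, mono))
    if tag == "forall":
        return ty if ty[1] == name else forall(ty[1], subst(ty[2], name, mono))
    return ty

def match(ty, guesses=()):
    """Return (domain, codomain) for `ty`, following M-Forall, M-Arr, M-Unknown."""
    pending = list(guesses)
    while ty[0] == "forall":                  # M-Forall: instantiate with a guess
        if not pending:
            raise ValueError("no monotype supplied for a quantifier")
        ty = subst(ty[2], ty[1], pending.pop(0))
    if ty[0] == "fun":                        # M-Arr
        return ty[1], ty[2]
    if ty == UNK:                             # M-Unknown
        return UNK, UNK
    raise TypeError("type cannot be matched against an arrow")

identity = forall("a", fun(var("a"), var("a")))
dom, cod = match(identity, [INT])             # instantiate a to Int
```

For example, `match(UNK)` yields the pair (⋆, ⋆), while `match(identity, [INT])` yields (Int, Int); a different guess would yield a different arrow, which is the source of the ambiguity discussed in Sect. 4.2.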

Note that matching saves us from having a subsumption rule (SUB in Fig. 2).
The subsumption rule is incompatible with consistent subtyping, since the latter
is not transitive. A discussion of a subsumption rule based on normal subtyping
can be found in the appendix.


4.2 Type-Directed Translation 


We give the dynamic semantics of our language by translating it to λB. Below
we show a subset of the terms in λB that are used in the translation:


Terms s ::= x | n | λx:A. s | Λa. s | s₁ s₂ | ⟨A ↪ B⟩ s


A cast ⟨A ↪ B⟩ s converts the value of term s from type A to type B. A cast
from A to B is permitted only if the types are compatible, written A ≺ B, as
briefly mentioned in Sect. 3.1. The syntax of types in λB is the same as ours.

The translation is given in the gray-shaded parts in Fig. 8. The only
interesting case here is to insert explicit casts in the application rule. Note
that there is no need to translate matching or consistent subtyping; instead, we
insert the source and target types of a cast directly in the translated
expressions, thanks to the following two lemmas:


Lemma 1 (▹ to ≺). If Ψ ⊢ A ▹ A₁ → A₂, then A ≺ A₁ → A₂.
Lemma 2 (≲ to ≺). If Ψ ⊢ A ≲ B, then A ≺ B.


In order to show the correctness of the translation, we prove that our
translation always produces well-typed expressions in λB. By Lemmas 1 and 2, we
have the following theorem:


Theorem 2 (Type Safety). If Ψ ⊢ e : A ⇝ s, then Ψ ⊢^B s : A.


Parametricity. An important semantic property of polymorphic types is
relational parametricity [19]. The parametricity property says that all instances
of a polymorphic function should behave uniformly. A classic example is a
function with the type ∀a. a → a. The parametricity property guarantees that a
value of this type must be either the identity function (i.e., λx. x) or the
undefined function (one which never returns a value). However, with the addition
of the unknown type ⋆, careful measures are to be taken to ensure parametricity.
This is exactly the circumstance that λB was designed to address. Ahmed et al.
[2] proved that λB satisfies relational parametricity. Based on their result, and
by Theorem 2, parametricity is preserved in our system.
Ambiguity from Casts. The translation does not always produce a unique target
expression. This is because when we guess a monotype τ in rules M-FORALL and
CS-FORALLL, we could have different choices, which inevitably leads to
different types. Unlike (non-gradual) polymorphic type systems [11,18], the choice
of monotypes could affect the runtime behaviour of the translated programs, since
they could appear inside the explicit casts. For example, the following shows two
possible translations for the same source expression λx:⋆. f x, where the type
of f is instantiated to Int → Int and Bool → Bool, respectively:


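$$
\begin{array}{l}
f : \forall a.\, a \to a \vdash (\lambda x : \star.\ f\ x) : \star \to \mathsf{Int} \\
\quad \rightsquigarrow\ (\lambda x : \star.\ (\langle \forall a.\, a \to a \hookrightarrow \mathsf{Int} \to \mathsf{Int} \rangle\, f)\ (\langle \star \hookrightarrow \mathsf{Int} \rangle\, x))
\\[2ex]
f : \forall a.\, a \to a \vdash (\lambda x : \star.\ f\ x) : \star \to \mathsf{Bool} \\
\quad \rightsquigarrow\ (\lambda x : \star.\ (\langle \forall a.\, a \to a \hookrightarrow \mathsf{Bool} \to \mathsf{Bool} \rangle\, f)\ (\langle \star \hookrightarrow \mathsf{Bool} \rangle\, x))
\end{array}
$$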


If we apply λx:⋆. f x to 3, which is fine since the function can take any input,
the first translation runs smoothly in λB, while the second one will raise a cast
error (Int cannot be cast to Bool). Similarly, if we apply it to true, then the second
succeeds while the first fails. The culprit lies in the highlighted parts where any
instantiation of a would be put inside the explicit cast. More generally, any
choice introduces an explicit cast to that type in the translation, which causes
a runtime cast error if the function is applied to a value whose type does not
match the guessed type. Note that this does not compromise the type safety of
the translated expressions, since cast errors are part of the type safety guarantees.
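The runtime difference between the two translations can be imitated with a small simulation. This is only a loose model: Python values stand in for λB values, the cast from ⋆ is modelled as a runtime tag check, the cast applied to f itself is not modelled, and all names (`CastError`, `cast_from_unknown`, `run`) are hypothetical.

```python
class CastError(Exception):
    """Raised when a runtime cast from the unknown type fails."""

def cast_from_unknown(value, target):
    """Model the cast from the unknown type to `target` as a tag check."""
    # Use an exact type test: in Python, bool is a subclass of int,
    # so isinstance(True, int) would wrongly accept a boolean as an Int.
    if type(value) is not target:
        raise CastError(f"cannot cast {value!r} to {target.__name__}")
    return value

def f(x):
    """A value of type forall a. a -> a: the identity function."""
    return x

def translation_int(x):
    """First translation: f instantiated at Int, argument cast to Int."""
    return f(cast_from_unknown(x, int))

def translation_bool(x):
    """Second translation: f instantiated at Bool, argument cast to Bool."""
    return f(cast_from_unknown(x, bool))

def run(program, argument):
    """Apply a translated program, reporting cast errors as a string."""
    try:
        return program(argument)
    except CastError:
        return "cast error"

results = {
    "int translation on 3": run(translation_int, 3),
    "bool translation on 3": run(translation_bool, 3),
    "int translation on True": run(translation_int, True),
    "bool translation on True": run(translation_bool, True),
}
```

Each translation either returns its argument unchanged or stops with a cast error, depending only on whether the guessed instantiation agrees with the runtime value.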


Coherence. The ambiguity of translation seems to imply that the declarative
system is incoherent. A semantics is coherent if distinct typing derivations of
the same typing judgment possess the same meaning [20]. We argue that the
declarative system is “coherent up to cast errors” in the sense that a well-typed
program produces a unique value, or results in a cast error. In the above example,
whatever the translation might be, applying λx:⋆. f x to 3 either results in a
cast error, or produces 3, nothing else.

This discrepancy is due to the guessing nature of the declarative system. As
far as the declarative system is concerned, both Int → Int and Bool → Bool
are equally acceptable. But this is not the case at runtime. The astute reader
may have found that the only appropriate choice is to instantiate f to ⋆ → ⋆.
However, as specified by rule M-FORALL in Fig. 8, we can only instantiate type
variables to monotypes, but ⋆ is not a monotype! We will get back to this issue
in Sect. 6.2 after we present the corresponding algorithmic system in Sect. 5.


4.3 Correctness Criteria 


Siek et al. [25] present a set of properties that a well-designed gradual typing 
calculus must have, which they call the refined criteria. Among all the crite- 
ria, those related to the static aspects of gradual typing are well summarized 
by Cimini and Siek [8]. Here we review those criteria and adapt them to our
notation. We have proved in Coq that our type system satisfies all these criteria.


Lemma 3 (Correctness Criteria) 


- Conservative extension: for all static Ψ, e, and A,
  • if Ψ ⊢^OL e : A, then there exists B such that Ψ ⊢ e : B and Ψ ⊢ B <: A;
  • if Ψ ⊢ e : A, then Ψ ⊢^OL e : A.
- Monotonicity w.r.t. precision: for all Ψ, e, e′, A, if Ψ ⊢ e : A and e′ ⊑ e,
  then Ψ ⊢ e′ : B and B ⊑ A for some B.
- Type Preservation of cast insertion: for all Ψ, e, A, if Ψ ⊢ e : A, then
  Ψ ⊢ e : A ⇝ s and Ψ ⊢^B s : A for some s.
- Monotonicity of cast insertion: for all Ψ, e₁, e₂, e₁′, e₂′, A, if
  Ψ ⊢ e₁ : A ⇝ e₁′ and Ψ ⊢ e₂ : A ⇝ e₂′ and e₁ ⊑ e₂, then Ψ ▷ Ψ ⊢ e₁′ ⊑^B e₂′.


The first criterion states that the gradual type system should be a
conservative extension of the original system. In other words, a static program
is typeable in the Odersky-Läufer type system if and only if it is typeable in
the gradual type system. A static program is one that does not contain any
type ⋆⁷. However, since our gradual type system does not have the subsumption
rule, it produces more general types.

The second criterion states that if a typeable expression loses some type
information, it remains typeable. This criterion depends on the definition of the
precision relation, written A ⊑ B, which is given in the appendix. The relation
intuitively captures a notion of types containing more or less unknown types (⋆).
The precision relation over types lifts to programs, i.e., e₁ ⊑ e₂ means that e₁
and e₂ are the same program except that e₂ has more unknown types.

The first two criteria are fundamental to gradual typing. They explain, for
example, why the two programs (λx:Int. x + 1) and (λx:⋆. x + 1) are
typeable, as the former is typeable in the Odersky-Läufer type system and the
latter is a less-precise version of it.

The last two criteria relate the compilation to the cast calculus. The third
criterion is essentially the same as Theorem 2, given that a target expression
should always exist, which can be easily seen from Fig. 8. The last criterion
ensures that the translation must be monotonic over the precision relation ⊑.

As for the dynamic guarantee, things become a bit murky for two reasons: (1)
as we discussed before, our declarative system is incoherent in that the runtime
behaviour of the same source program can vary depending on the particular
translation; (2) it is still unknown whether the dynamic guarantee holds in λB.
We will have more discussion on the dynamic guarantee in Sect. 6.3.


5 Algorithmic Type System 


In this section we give a bidirectional account of the algorithmic type system that 
implements the declarative specification. The algorithm is largely inspired by the 


7 Note that the term static has appeared several times with different meanings.
Expressions        e ::= x | n | λx:A. e | λx. e | e e | e : A
Types              A, B ::= Int | a | â | A → B | ∀a. A | ⋆
Monotypes          τ, σ ::= Int | a | â | τ → σ

Contexts           Γ, Δ, Θ ::= ∅ | Γ, x : A | Γ, a | Γ, â | Γ, â = τ
Complete Contexts  Ω ::= ∅ | Ω, x : A | Ω, a | Ω, â = τ


Fig. 9. Syntax of the algorithmic system 


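$$
\boxed{\Gamma \vdash A \lesssim B \dashv \Delta}
$$

$$
\begin{gathered}
\frac{}{\Gamma[a] \vdash a \lesssim a \dashv \Gamma[a]}\ \textsc{ACS-TVar}
\qquad
\frac{}{\Gamma[\hat{a}] \vdash \hat{a} \lesssim \hat{a} \dashv \Gamma[\hat{a}]}\ \textsc{ACS-ExVar}
\\[2ex]
\frac{}{\Gamma \vdash \mathsf{Int} \lesssim \mathsf{Int} \dashv \Gamma}\ \textsc{ACS-Int}
\qquad
\frac{}{\Gamma \vdash \star \lesssim A \dashv \Gamma}\ \textsc{ACS-UnknownL}
\qquad
\frac{}{\Gamma \vdash A \lesssim \star \dashv \Gamma}\ \textsc{ACS-UnknownR}
\\[2ex]
\frac{\Gamma \vdash B_1 \lesssim A_1 \dashv \Theta \qquad \Theta \vdash [\Theta]A_2 \lesssim [\Theta]B_2 \dashv \Delta}
     {\Gamma \vdash A_1 \to A_2 \lesssim B_1 \to B_2 \dashv \Delta}\ \textsc{ACS-Fun}
\\[2ex]
\frac{\Gamma, a \vdash A \lesssim B \dashv \Delta, a, \Theta}{\Gamma \vdash A \lesssim \forall a.\, B \dashv \Delta}\ \textsc{ACS-ForallR}
\qquad
\frac{\Gamma, \hat{a} \vdash A[a \mapsto \hat{a}] \lesssim B \dashv \Delta}{\Gamma \vdash \forall a.\, A \lesssim B \dashv \Delta}\ \textsc{ACS-ForallL}
\\[2ex]
\frac{\hat{a} \notin \mathit{fv}(A) \qquad \Gamma[\hat{a}] \vdash \hat{a} \lessapprox A \dashv \Delta}{\Gamma[\hat{a}] \vdash \hat{a} \lesssim A \dashv \Delta}\ \textsc{ACS-InstL}
\qquad
\frac{\hat{a} \notin \mathit{fv}(A) \qquad \Gamma[\hat{a}] \vdash A \lessapprox \hat{a} \dashv \Delta}{\Gamma[\hat{a}] \vdash A \lesssim \hat{a} \dashv \Delta}\ \textsc{ACS-InstR}
\end{gathered}
$$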


Fig. 10. Algorithmic consistent subtyping 


algorithmic bidirectional system of Dunfield and Krishnaswami [11] (henceforth
DK system). However, our algorithmic system differs from theirs in three aspects:
(1) the addition of the unknown type ⋆; (2) the use of the matching judgment;
and (3) the approach of gradual inference only producing static types [12]. We
then prove that our algorithm is both sound and complete with respect to the
declarative type system. Full proofs can be found in the appendix.


Algorithmic Contexts. The algorithmic context Γ is an ordered list containing
declarations of type variables a and term variables x : A. Unlike declarative
contexts, algorithmic contexts also contain declarations of existential type
variables â, which can be either unsolved (written â) or solved to some monotype
(written â = τ). Complete contexts Ω are those that contain no unsolved
existential type variables. Figure 9 shows the syntax of the algorithmic system.
Apart from expressions in the declarative system, we have annotated expressions
e : A.


5.1 Algorithmic Consistent Subtyping and Instantiation 


Figure 10 shows the algorithmic consistent subtyping rules. The first five rules 
do not manipulate contexts. Rule ACS-FUN is a natural extension of its declar- 
ative counterpart. The output context of the first premise is used by the second 
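$$
\boxed{\Gamma \vdash \hat{a} \lessapprox A \dashv \Delta}
$$

$$
\begin{gathered}
\frac{}{\Gamma[\hat{a}] \vdash \hat{a} \lessapprox \star \dashv \Gamma[\hat{a}]}\ \textsc{InstLSolveU}
\\[2ex]
\frac{\Gamma[\hat{a}_2, \hat{a}_1, \hat{a} = \hat{a}_1 \to \hat{a}_2] \vdash A_1 \lessapprox \hat{a}_1 \dashv \Theta \qquad \Theta \vdash \hat{a}_2 \lessapprox [\Theta]A_2 \dashv \Delta}
     {\Gamma[\hat{a}] \vdash \hat{a} \lessapprox A_1 \to A_2 \dashv \Delta}\ \textsc{InstLArr}
\end{gathered}
$$

[The figure also contains rules INSTLSOLVE, INSTLREACH and INSTLALLR, which are described in the text.]
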
Fig. 11. Algorithmic instantiation 


premise, and the output context of the second premise is the output context
of the conclusion. Note that we do not simply check A₂ ≲ B₂, but apply Θ to
both types (e.g., [Θ]A₂). This is to maintain an important invariant that types
are fully applied under input context Γ (they contain no existential variables
already solved in Γ). The same invariant applies to every algorithmic judgment.
Rule ACS-FORALLR looks similar to its declarative counterpart, except that
we need to drop the trailing context a, Θ from the concluding output context
since they become out of scope. Rule ACS-FORALLL generates a fresh existential
variable â, and replaces a with â in the body A. The new existential variable
â is then added to the premise’s input context. As a side note, when both types
are quantifiers, then either ACS-FORALLR or ACS-FORALLL could be tried.
In practice, one can apply ACS-FORALLR eagerly. The last two rules together
check consistent subtyping with an unsolved existential variable on one side and
an arbitrary type on the other side with the help of the instantiation judgment.

The judgment Γ ⊢ â ⪅ A ⊣ Δ defined in Fig. 11 instantiates unsolved
existential variables. Judgment â ⪅ A reads “instantiate â to a consistent subtype
of A”. For space reasons, we omit its symmetric judgment Γ ⊢ A ⪅ â ⊣ Δ.
Rule INSTLSOLVE and rule INSTLREACH set â to τ and b̂ in the output context,
respectively. Rule INSTLSOLVEU is similar to ACS-UNKNOWNR in that we put
no constraint on â when it meets the unknown type ⋆. This design decision
reflects the point that type inference only produces static types [12]. We will get
back to this point in Sect. 6.2. Rule INSTLALLR is the instantiation version of
rule ACS-FORALLR. The last rule INSTLARR applies when â meets a function
type. It follows that the solution must also be a function type. That is why, in
the first premise, we generate two fresh existential variables â₁ and â₂, and insert
them just before â in the input context, so that the solution of â can mention
them. Note that A₁ ⪅ â₁ switches to the other instantiation judgment.
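The context manipulations behind these rules can be sketched with ordered lists. The encoding of context entries, the naive scheme for naming fresh variables, and the function names are assumptions of this illustration; it covers only solving a variable in place, the context rewrite in the first premise of INSTLARR, and applying a context as a substitution.

```python
INT, UNK = ("int",), ("unk",)

def fun(a, b):
    return ("fun", a, b)

def ex(name):
    """An existential variable; as a context entry it denotes 'unsolved'."""
    return ("ex", name)

# A context is an ordered list of entries:
#   ("tvar", a)        a universal type variable
#   ("ex", name)       an unsolved existential variable
#   ("solved", name, mono)  a solved existential variable

def apply_ctx(ctx, ty):
    """[ctx]ty: replace solved existential variables by their solutions."""
    tag = ty[0]
    if tag == "ex":
        for entry in ctx:
            if entry[0] == "solved" and entry[1] == ty[1]:
                return apply_ctx(ctx, entry[2])
        return ty                      # still unsolved
    if tag == "fun":
        return fun(apply_ctx(ctx, ty[1]), apply_ctx(ctx, ty[2]))
    if tag == "forall":
        return ("forall", ty[1], apply_ctx(ctx, ty[2]))
    return ty

def solve(ctx, name, mono):
    """Replace the unsolved entry for `name` by a solved one, in place."""
    i = ctx.index(ex(name))
    return ctx[:i] + [("solved", name, mono)] + ctx[i + 1:]

def split_arrow(ctx, name):
    """Context rewrite of InstLArr: insert two fresh variables before `name`
    and solve `name` to an arrow between them."""
    i = ctx.index(ex(name))
    a1, a2 = name + "1", name + "2"    # naive fresh names, assumed unused
    inserted = [ex(a2), ex(a1), ("solved", name, fun(ex(a1), ex(a2)))]
    return ctx[:i] + inserted + ctx[i + 1:], a1, a2

ctx0 = [("tvar", "b"), ex("a")]
ctx1, a1, a2 = split_arrow(ctx0, "a")
ctx2 = solve(ctx1, a1, INT)            # a1 meets Int: solved
partial = apply_ctx(ctx2, ex("a"))     # a2 left unconstrained, e.g. after meeting ?
ctx3 = solve(ctx2, a2, INT)
full = apply_ctx(ctx3, ex("a"))
```

After the steps above, `partial` is Int → â₂ with â₂ still unsolved (the situation left by INSTLSOLVEU), and `full` is Int → Int once â₂ is solved as well.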


5.2 Algorithmic Typing 


We now turn to the algorithmic typing rules in Fig. 12. The algorithmic sys- 
tem uses bidirectional type checking to accommodate polymorphism. Most of 




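$$
\boxed{\Gamma \vdash e \Rightarrow A \dashv \Delta}
$$

$$
\begin{gathered}
\frac{(x : A) \in \Gamma}{\Gamma \vdash x \Rightarrow A \dashv \Gamma}\ \textsc{AVar}
\qquad
\frac{}{\Gamma \vdash n \Rightarrow \mathsf{Int} \dashv \Gamma}\ \textsc{ANat}
\\[2ex]
\frac{\Gamma, \hat{a}, \hat{b}, x : \hat{a} \vdash e \Leftarrow \hat{b} \dashv \Delta, x : \hat{a}, \Theta}{\Gamma \vdash \lambda x.\ e \Rightarrow \hat{a} \to \hat{b} \dashv \Delta}\ \textsc{ALamU}
\qquad
\frac{\Gamma, x : A \vdash e \Rightarrow B \dashv \Delta, x : A, \Theta}{\Gamma \vdash \lambda x : A.\ e \Rightarrow A \to B \dashv \Delta}\ \textsc{ALamAnnA}
\\[2ex]
\frac{\Gamma \vdash A \qquad \Gamma \vdash e \Leftarrow A \dashv \Delta}{\Gamma \vdash e : A \Rightarrow A \dashv \Delta}\ \textsc{AAnno}
\\[2ex]
\frac{\Gamma \vdash e_1 \Rightarrow A \dashv \Theta_1 \qquad \Theta_1 \vdash [\Theta_1]A \triangleright A_1 \to A_2 \dashv \Theta_2 \qquad \Theta_2 \vdash e_2 \Leftarrow [\Theta_2]A_1 \dashv \Delta}
     {\Gamma \vdash e_1\ e_2 \Rightarrow A_2 \dashv \Delta}\ \textsc{AApp}
\end{gathered}
$$

$$
\boxed{\Gamma \vdash e \Leftarrow A \dashv \Delta}
$$

$$
\begin{gathered}
\frac{\Gamma, x : A \vdash e \Leftarrow B \dashv \Delta, x : A, \Theta}{\Gamma \vdash \lambda x.\ e \Leftarrow A \to B \dashv \Delta}\ \textsc{ALam}
\qquad
\frac{\Gamma, a \vdash e \Leftarrow A \dashv \Delta, a, \Theta}{\Gamma \vdash e \Leftarrow \forall a.\, A \dashv \Delta}\ \textsc{AGen}
\\[2ex]
\frac{\Gamma \vdash e \Rightarrow A \dashv \Theta \qquad \Theta \vdash [\Theta]A \lesssim [\Theta]B \dashv \Delta}{\Gamma \vdash e \Leftarrow B \dashv \Delta}\ \textsc{ASub}
\end{gathered}
$$

$$
\boxed{\Gamma \vdash A \triangleright A_1 \to A_2 \dashv \Delta}
$$

$$
\begin{gathered}
\frac{\Gamma, \hat{a} \vdash A[a \mapsto \hat{a}] \triangleright A_1 \to A_2 \dashv \Delta}{\Gamma \vdash \forall a.\, A \triangleright A_1 \to A_2 \dashv \Delta}\ \textsc{AM-Forall}
\qquad
\frac{}{\Gamma \vdash (A_1 \to A_2) \triangleright (A_1 \to A_2) \dashv \Gamma}\ \textsc{AM-Arr}
\\[2ex]
\frac{}{\Gamma \vdash \star \triangleright \star \to \star \dashv \Gamma}\ \textsc{AM-Unknown}
\qquad
\frac{}{\Gamma[\hat{c}] \vdash \hat{c} \triangleright \hat{a} \to \hat{b} \dashv \Gamma[\hat{a}, \hat{b}, \hat{c} = \hat{a} \to \hat{b}]}\ \textsc{AM-Var}
\end{gathered}
$$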


Fig. 12. Algorithmic typing 


them are quite standard. Perhaps rule AAPP (which differs significantly from
that in the DK system) deserves attention. It relies on the algorithmic matching
judgment Γ ⊢ A ▹ A₁ → A₂ ⊣ Δ. Rule AM-FORALL replaces a with
a fresh existential variable â, thus eliminating guessing. Rules AM-ARR and
AM-UNKNOWN correspond directly to the declarative rules. Rule AM-VAR,
which has no corresponding declarative version, is similar to
INSTRARR/INSTLARR: we create â and b̂ and add ĉ = â → b̂ to the context.


5.3 Completeness and Soundness


We prove that the algorithmic rules are sound and complete with respect to the
declarative specifications. We need an auxiliary judgment Γ ⟶ Δ that captures
a notion of information increase from input contexts Γ to output contexts Δ [11].
Soundness. Roughly speaking, soundness of the algorithmic system says that
given an expression e that type checks in the algorithmic system, there exists a
corresponding expression e′ that type checks in the declarative system. However
there is one complication: e does not necessarily have more annotations than e′.
For example, by ALAM we have λx. x ⇐ (∀a. a) → (∀a. a), but λx. x itself cannot
have type (∀a. a) → (∀a. a) in the declarative system. To circumvent that, we add
an annotation to the lambda abstraction, resulting in λx:(∀a. a). x, which is
typeable in the declarative system with the same type. To relate λx. x and
λx:(∀a. a). x, we erase all annotations on both expressions. The definition of
erasure ⌊·⌋ is standard and thus omitted.
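Since the definition is omitted, the following is only a plausible sketch of what erasure might look like on the expression syntax of Fig. 9: annotations on lambda binders and on annotated expressions are dropped, and everything else is traversed structurally. The tuple encoding and the name `erase` are assumptions of this illustration.

```python
# Expressions (hypothetical encoding):
#   ("var", x)                     variable
#   ("nat", n)                     integer literal
#   ("lam", x, body)               unannotated lambda
#   ("lam_ann", x, type, body)     annotated lambda
#   ("app", e1, e2)                application
#   ("ann", e, type)               annotated expression  e : A

def erase(e):
    """Remove every type annotation from an expression."""
    tag = e[0]
    if tag == "lam_ann":
        return ("lam", e[1], erase(e[3]))
    if tag == "lam":
        return ("lam", e[1], erase(e[2]))
    if tag == "ann":
        return erase(e[1])
    if tag == "app":
        return ("app", erase(e[1]), erase(e[2]))
    return e  # variables and literals carry no annotations

bottom_like = ("forall", "a", ("var", "a"))            # the type forall a. a
plain = ("lam", "x", ("var", "x"))                      # \x. x
annotated = ("lam_ann", "x", bottom_like, ("var", "x")) # \x : (forall a. a). x
```

With this definition, `erase(plain)` and `erase(annotated)` coincide, which is the sense in which the two expressions from the example are related.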


Theorem 1 (Soundness of Algorithmic Typing). Given Δ ⟶ Ω,

1. If Γ ⊢ e ⇒ A ⊣ Δ then ∃e′ such that [Ω]Δ ⊢ e′ : [Ω]A and ⌊e⌋ = ⌊e′⌋.
2. If Γ ⊢ e ⇐ A ⊣ Δ then ∃e′ such that [Ω]Δ ⊢ e′ : [Ω]A and ⌊e⌋ = ⌊e′⌋.


Completeness. Completeness of the algorithmic system is the reverse of
soundness: given a declarative judgment of the form [Ω]Γ ⊢ [Ω]…, we want to get
an algorithmic derivation of Γ ⊢ … ⊣ Δ. It turns out that completeness is a bit
trickier to state in that the algorithmic rules generate existential variables on
the fly, so Δ could contain unsolved existential variables that are not found in
Γ, nor in Ω. Therefore the completeness proof must produce another complete
context Ω′ that extends both the output context Δ, and the given complete
context Ω. As with soundness, we need erasure to relate both expressions.


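Theorem 2 (Completeness of Algorithmic Typing). Given $\Gamma \longrightarrow \Omega$ and
$\Gamma \vdash A$, if $[\Omega]\Gamma \vdash e : A$ then there exist $\Delta$, $\Omega'$, $A'$ and $e'$ such that $\Delta \longrightarrow \Omega'$ and
$\Omega \longrightarrow \Omega'$ and $\Gamma \vdash e' \Rightarrow A' \dashv \Delta$ and $A = [\Omega']A'$ and $\lfloor e \rfloor = \lfloor e' \rfloor$.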


6 Discussion 


6.1 Top Types 


To demonstrate that our definition of consistent subtyping (Definition 2) is
applicable to other features, we show how to extend our approach to Top types with
all the desired properties preserved.

In order to preserve the orthogonality between subtyping and consistency,
we require $\top$ to be a common supertype of all static types, as shown in rule
S-Top. This rule might seem strange at first glance, since even if we remove the
requirement $A$ static, the rule seems reasonable. However, an important point
is that because of the orthogonality between subtyping and consistency,
subtyping itself should not contain a potential information loss! Therefore, subtyping
instances such as $\star <: \top$ are not allowed. For consistency, we add the rule that
$\top$ is consistent with $\top$, which is actually included in the original reflexive rule
$A \sim A$. For consistent subtyping, every type is a consistent subtype of $\top$, for
example, $\mathsf{Int} \to \star \lesssim \top$.


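$$\frac{A\ \text{static}}{\Psi \vdash A <: \top}\ \text{S-Top} \qquad\qquad \top \sim \top \qquad\qquad \frac{}{\Psi \vdash A \lesssim \top}\ \text{CS-Top}$$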


It is easy to verify that Definition 2 is still equivalent to that in Fig. 7 extended 
with rule CS-Top. That is, Theorem 1 holds: 


Proposition 4 (Extension with $\top$). $\Psi \vdash A \lesssim B \Leftrightarrow \Psi \vdash A <: C$, $C \sim D$,
$\Psi \vdash D <: B$, for some $C$, $D$.
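To make the role of the static side condition in S-Top concrete, the following is a minimal Haskell sketch of subtyping and consistent subtyping for a simply typed fragment only (integers, functions, the unknown type and Top). The datatype `Ty` and the functions `static`, `sub` and `csub` are hypothetical names introduced for illustration; the polymorphic rules of Fig. 7 are deliberately left out, so this is not the paper's full system.

```haskell
-- Simple types only: Int, functions, the unknown type, and Top.
-- (Assumption: universal quantifiers and type variables are omitted.)
data Ty = TInt | TArr Ty Ty | TUnknown | TTop
  deriving (Eq, Show)

-- A type is static when it does not mention the unknown type.
static :: Ty -> Bool
static TUnknown   = False
static (TArr a b) = static a && static b
static _          = True

-- Subtyping: rule S-Top applies to static types only, so that
-- subtyping alone never hides the unknown type.
sub :: Ty -> Ty -> Bool
sub a TTop | static a         = True
sub TInt TInt                 = True
sub TUnknown TUnknown         = True
sub (TArr a1 a2) (TArr b1 b2) = sub b1 a1 && sub a2 b2
sub _ _                       = False

-- Consistent subtyping: every type is a consistent subtype of Top
-- (CS-Top), and the unknown type is related to everything.
csub :: Ty -> Ty -> Bool
csub _ TTop                    = True
csub TUnknown _                = True
csub _ TUnknown                = True
csub TInt TInt                 = True
csub (TArr a1 a2) (TArr b1 b2) = csub b1 a1 && csub a2 b2
csub _ _                       = False
```

In this sketch `csub (TArr TInt TUnknown) TTop` holds while `sub (TArr TInt TUnknown) TTop` and `sub TUnknown TTop` do not, mirroring the examples in the text.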


We extend the definition of concretization (Definition 3) with $\top$ by adding
another equation $\gamma(\top) = \{\top\}$. Note that Castagna and Lanvin [7] also have this
equation in their calculus. It is easy to verify that Proposition 2 still holds:


Proposition 5 (Equivalent to AGT on $\top$). $A \lesssim B$ if and only if $A \mathrel{\widetilde{<:}} B$.


Siek and Taha's [22] Definition of Consistent Subtyping Does Not Work for $\top$. As
in the analysis in Sect. 3.2, $\mathsf{Int} \to \star \lesssim \top$ only holds when we first apply consistency,
then subtyping. However, we cannot find a type $A$ such that $\mathsf{Int} \to \star <: A$ and
$A \sim \top$. We also have a similar problem in extending the restriction operator:
non-structural masking between $\mathsf{Int} \to \star$ and $\top$ cannot be easily achieved.


6.2 Interpretation of the Dynamic Semantics 


In Sect. 4.2 we have seen an example where a source expression could produce two 
different target expressions with different runtime behaviour. As we explained, 
this is due to the guessing nature of the declarative system, and from the typing 
point of view, no type is particularly better than others. However, in practice, 
this is not desirable. Let us revisit the same example, now from the algorithmic 
point of view (we omit the translation for space reasons): 


$$f : \forall a.\, a \to a \vdash (\lambda x : \star.\ f\ x) \Rightarrow \star \to \hat{a} \dashv f : \forall a.\, a \to a,\ \hat{a}$$


Compared with declarative typing, which produces many types ($\star \to \mathsf{Int}$,
$\star \to \mathsf{Bool}$, and so on), the algorithm computes the type $\star \to \hat{a}$ with $\hat{a}$ unsolved in the
output context. What can we know from the output context? The only thing we
know is that $\hat{a}$ is not constrained at all! However, it is possible to make a more
refined distinction between different kinds of existential variables. The first kind 
of existential variables are those that indeed have no constraints at all, as they 
do not affect the dynamic semantics. The second kind of existential variables 
(as in this example) are those where the only constraint is that the variable was 
once compared with an unknown type [12]. 

To emphasize the difference and have better support for dynamic semantics,
we could have gradual variables in addition to existential variables, with the
difference that only unsolved gradual variables are allowed to be unified with the
unknown type. An irreversible transition from existential variables to gradual
variables occurs when an existential variable is compared with $\star$. After the
algorithm terminates, we can set all unsolved existential variables to be any (static)
type (or more precisely, as in Garcia and Cimini [12], static type parameters),
and all unsolved gradual variables to be $\star$ (or gradual type parameters).
However, this approach requires a more sophisticated declarative/algorithmic type
system than the ones presented in this paper, where we only produce static
monotypes in type inference. We believe this is a typical trade-off in existing
gradual type systems with inference [12,23]. Here we suppress the complexity of
dynamic semantics in favour of the conciseness of static typing.


6.3 The Dynamic Guarantee 


In Sect. 4.3 we mentioned that the dynamic guarantee is closely related to the 
coherence issue. To aid discussion, we first give the definition of dynamic guar- 
antee as follows: 


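Definition 5 (Dynamic guarantee). Suppose $e' \sqsubseteq e$, $\emptyset \vdash e : A \rightsquigarrow s$ and
$\emptyset \vdash e' : A' \rightsquigarrow s'$. If $s \Downarrow v$, then $s' \Downarrow v'$ and $v' \sqsubseteq v$.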


The dynamic guarantee says that if a gradually typed program evaluates to a 
value, then removing type annotations always produces a program that evaluates 
to an equivalent value (modulo type annotations). Now apparently the coherence 
issue of the declarative system breaks the dynamic guarantee. For instance: 


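$$(\lambda f : \forall a.\, a \to a.\ \lambda x : \mathsf{Int}.\ f\ x)\ (\lambda x.\, x)\ 3 \qquad\qquad (\lambda f : \forall a.\, a \to a.\ \lambda x : \star.\ f\ x)\ (\lambda x.\, x)\ 3$$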


The left one evaluates to 3, whereas its less precise version (right) will give a 
cast error if a is instantiated to Bool for example. 

As discussed in Sect. 6.2, we could design a more sophisticated
declarative/algorithmic type system where coherence is retained. However, even with a
coherent source language, the dynamic guarantee is still a question. Currently, the
dynamic guarantee for our target language $\lambda B$ is still an open question.
According to Igarashi et al. [14], the difficulty lies in the definition of term precision
that preserves the semantics.


7 Related Work 


Along the way we discussed some of the most relevant work to motivate, compare 
and promote our gradual typing design. In what follows, we briefly discuss related 
work on gradual typing and polymorphism. 


Gradual Typing. The seminal paper by Siek and Taha [21] is the first to
propose gradual typing. The original proposal extends the simply typed lambda
calculus by introducing the unknown type $\star$ and replacing type equality with
type consistency. Later Siek and Taha [22] incorporated gradual typing into a
simple object oriented language, and showed that subtyping and consistency are
orthogonal, an insight that partly inspired our work. We show that subtyping
and consistency are orthogonal in a much richer type system with higher-rank
polymorphism. Siek et al. [25] proposed a set of criteria that provides
important guidelines for designers of gradually typed languages. Cimini and Siek [8]
introduced the Gradualizer, a general methodology for generating gradual type
systems from static type systems. Later they also developed an algorithm to
generate dynamic semantics [9]. Garcia et al. [13] introduced the AGT approach
based on abstract interpretation.


Gradual Type Systems with Explicit Polymorphism. Ahmed et al. [1] proposed
$\lambda B$ that extends the blame calculus [29] to incorporate polymorphism. The key
novelty of their work is to use dynamic sealing to enforce parametricity. Devriese
et al. [10] proved that embedding of System F terms into $\lambda B$ is not fully abstract.
Igarashi et al. [14] also studied integrating gradual typing with parametric
polymorphism. They proposed System $F_G$, a gradually typed extension of System F,
and System $F_C$, a new polymorphic blame calculus. As has been discussed
extensively, their definition of type consistency does not apply to our setting (implicit
polymorphism). All of these approaches mix consistency with subtyping to some
extent, which we argue should be orthogonal.


Gradual Type Inference. Siek and Vachharajani [23] studied unification-based 
type inference for gradual typing, where they show why three straightforward 
approaches fail to meet their design goals. Their type system infers gradual types, 
which results in a complicated type system and inference algorithm. Garcia 
and Cimini [12] presented a new approach where gradual type inference only 
produces static types, which is adopted in our type system. They also deal with 
let-polymorphism (rank 1 types). However none of these works deals with higher- 
ranked implicit polymorphism. 


Higher-Rank Implicit Polymorphism. Odersky and Laufer [17] introduced a type
system for higher-rank types. Based on that, Peyton Jones et al. [18] developed
an approach for type checking higher-rank predicative polymorphism. Dunfield
and Krishnaswami [11] proposed a bidirectional account of higher-rank
polymorphism, and an algorithm for implementing the declarative system, which serves
as the sole inspiration for our algorithmic system. The key difference, however, is
the integration of gradual typing. Vytiniotis et al. [28] defer static type errors to
runtime, which is fundamentally different from gradual typing, where
programmers can control static or runtime checks through the precision of the annotations.


8 Conclusion


In this paper, we present a generalized definition of consistent subtyping, which
is proved to be applicable to both polymorphic and top types. Based on the
new definition of consistent subtyping, we have developed a gradually typed
calculus with predicative implicit higher-rank polymorphism, and an algorithm
to implement it. As future work, we are interested in investigating whether our results
can scale to real world languages and other programming language features.


Abstract. We propose HOBiT, a higher-order bidirectional programming
language, in which users can write bidirectional programs in the
familiar style of conventional functional programming, while enjoying the 
full expressiveness of lenses. A bidirectional transformation, or a lens, is 
a pair of mappings between source and view data objects, one in each 
direction. When the view is modified, the source is updated accordingly 
with respect to some laws—a pattern that is found in databases, model- 
driven development, compiler construction, and so on. The most common 
way of programming lenses is with lens combinators, which are lens-to- 
lens functions that compose simpler lenses to form more complex ones. 
Lens combinators preserve the bidirectionality of lenses and are expres- 
sive; but they compel programmers to a specialised point-free style—i.e., 
no naming of intermediate computation results—limiting the scalability 
of bidirectional programming. To address this issue, we propose a new 
bidirectional programming language HOBiT, in which lenses are repre- 
sented as standard functions, and combinators are mapped to language 
constructs with binders. This design transforms bidirectional program- 
ming, enabling programmers to write bidirectional programs in a flexible 
functional style and at the same time access the full expressiveness of 
lenses. We formally define the syntax, type system, and the semantics 
of the language, and then show that programs in HOBiT satisfy bidirec- 
tionality. Additionally, we demonstrate HOBiT’s programmability with 
examples. 


1 Introduction 


Transforming data from one format to another is a common task of program- 
ming: compilers transform program texts into syntax trees, manipulate the trees 
and then generate low-level code; database queries transform base relations into 
views; model transformations generate lower-level implementations from higher- 
level models; and so on. Very often, such transformations will benefit from being 
bidirectional, allowing changes to the targets to be mapped back to the sources 
too. For example, if one can run a compiler front-end (preprocessing, parsing,
desugaring, etc.) backwards, then all sorts of program analysis tools will be
able to focus on a much smaller core language, without sacrificing usability, as
their outputs in terms of the core language will be transformed backwards to the
source language. In the same way, such needs arise in databases (the view-update
problem [1,6,12]) and model-driven engineering (bidirectional model
transformation) [28,33,35].

As a response to this challenge, programming language researchers have
started to design languages that execute deterministically in both directions, and
the lens framework is the most prominent among them. In the lens framework, a
bidirectional transformation (or a lens) $\ell \in \mathit{Lens}\ S\ V$ consists of $\mathit{get}\ \ell \in S \to V$
and $\mathit{put}\ \ell \in S \to V \to S$ [3,7,8]. (When clear from the context, or
unimportant, we sometimes omit the lens name and write simply get/put.) Function get
extracts a view from a source, and put takes both an updated view and the
original source as inputs to produce an updated source. The additional parameter
of put makes it possible to recover some of the source data that is not present
in the view. In other words, get need not be injective to have a put. Not all
pairs of get/put are considered correct lenses. The following round-tripping laws
of a lens $\ell$ are generally required to establish bidirectionality:


$$\mathit{put}\ \ell\ s\ v = s \quad \text{if} \quad \mathit{get}\ \ell\ s = v \qquad \text{(Acceptability)}$$
$$\mathit{get}\ \ell\ s' = v \quad \text{if} \quad \mathit{put}\ \ell\ s\ v = s' \qquad \text{(Consistency)}$$


for all s, s’ and v. (In this paper we write e = e’ with the assumption that 
neither e nor e’ is undefined. Stronger variants of the laws enforcing totality 
exist elsewhere, for example in [7].) Here consistency ensures that all updates on 
a view are captured by the updated source, and acceptability prohibits changes 
to the source if no update has been made on the view. Collectively, the two laws
define well-behavedness [1,7,12].
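As an illustration of the two laws, here is a minimal Haskell sketch under the assumption that get and put are total functions. The record type `Lens`, the example lens `fstL`, and the checkers `acceptability` and `consistency` are names introduced here for illustration; they are not taken from any particular lens library.

```haskell
-- A lens between a source type s and a view type v.
data Lens s v = Lens
  { get :: s -> v        -- extract the view from a source
  , put :: s -> v -> s   -- original source and updated view give a new source
  }

-- Hypothetical example lens: the view is the first component of a pair.
-- The second component is source data that is not present in the view.
fstL :: Lens (a, b) a
fstL = Lens { get = fst, put = \(_, b) v -> (v, b) }

-- Acceptability: putting back an unchanged view leaves the source unchanged.
acceptability :: Eq s => Lens s v -> s -> Bool
acceptability l s = put l s (get l s) == s

-- Consistency: the updated source reflects the updated view.
consistency :: Eq v => Lens s v -> s -> v -> Bool
consistency l s v = get l (put l s v) == v
```

Both checks succeed for `fstL` on any inputs, even though `fst` is not injective: put recovers the discarded second component from the original source.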

The most common way of programming lenses is with lens combinators [3,7,8],
which are basically a selection of lens-to-lens functions that compose simpler lenses
to form more complex ones. This combinator-based approach follows the long
history of lightweight language development in functional programming. The
distinctive advantage of this approach is that by restricting the lens language to a
few selected combinators, well-behavedness can be more easily preserved in
programming, and therefore given well-behaved lenses as inputs, the combinators are
guaranteed to produce well-behaved lenses. This idea of lens combinators is very
influential academically, and various designs and implementations have been
proposed [2,3,7-9,16,17,27,32] over the years.


1.1 The Challenge of Programmability 


The complexity of a piece of software can be classified as either intrinsic or 
accidental. Intrinsic complexity reflects the inherent difficulty of the problem 
at hand, whereas accidental complexity arises from the particular programming 
language, design or tools used to implement the solution. This work aims at
reducing the accidental complexity of bidirectional programming by
contributing to the design of bidirectional languages. In particular, we identify a
language restriction (namely, no naming of intermediate computation results) that
complicates lens programming, and propose a new design that removes it.

As a teaser to demonstrate the problem, let us consider the list append
function. In standard unidirectional programming, it can be defined simply as
append x y = case x of {[] → y; a : x' → a : append x' y}. Astute readers may
have already noticed that append is defined by structural recursion on x, which
can be made explicit by using foldr as in append x y = foldr (:) y x.
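The two unidirectional definitions just mentioned can be written out in ordinary Haskell as follows. The name `append'` for the foldr variant is introduced here only to keep the two versions apart.

```haskell
-- Explicit structural recursion on the first list.
append :: [a] -> [a] -> [a]
append x y = case x of
  []     -> y
  a : x' -> a : append x' y

-- The same function with the recursion pattern captured by foldr.
-- Note that y is bound outside the fold but used inside it.
append' :: [a] -> [a] -> [a]
append' x y = foldr (:) y x
```

The free use of `y` inside the fold is exactly the kind of variable usage that is unavailable in a strictly point-free combinator language.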

But in a lens language based on combinators, things are more difficult. Specif- 
ically, append now requires a more complicated recursion pattern, as below. 


appendL :: Lens ([A], [A]) [A]
appendL =
  cond idL (λ_.True) (λ_.λ_.[]) (consL ∘̂ (idL × appendL)) (not ∘ null) (λ_.λ_.⊥)
  ∘̂ rearr ∘̂ (outListL × idL)
  where outListL :: Lens [A] (Either () (A, [A]))
        rearr    :: Lens (Either () (a, b), c) (Either c (a, (b, c)))
        (∘̂)      :: Lens b c → Lens a b → Lens a c
        cond     :: Lens a c → ... → Lens b c → ... → Lens (Either a b) c


It is beyond the scope of this paper to explain how exactly the definition of 
appendL works, as its obscurity is what this work aims to remove. Instead, we 
informally describe its behaviour and the various components of the code. The 
above code defines a lens: forwards, it behaves as the standard append, and 
backwards, it splits the updated view list, and when the length of the list changes, 
this definition implements (with the grayed part) the bias of keeping the length 
of the first source list whenever possible (to disambiguate multiple candidate 
source changes). Here, cond, (∘̂), etc. are lens combinators and outListL and rearr
are auxiliary lenses, as can be seen from their types. Unlike its unidirectional 
counterpart, appendL can no longer be defined as a structural recursion on list; 
instead it traverses a pair of lists with rather complex rearrangement rearr. 

Intuitively, the additional grayed parts are intrinsic complexity, as they are
needed for directing backwards execution. However, the complicated recursion 
scheme, which is a direct result of the underlying limitation of lens languages, 
is certainly accidental. Recall that in the definition of append, we were able to 
use the variable y, which is bound outside of the recursion pattern, inside the 
body of foldr. But the same is not possible with lens combinators which are 
strictly ‘pointfree’. Moreover, even if one could name such variables (points), 
their usage with lens combinators will be very restricted in order to guarantee 
well-behavedness [21,23]. This problem is specific to opaque non-function objects 
such as lenses, and goes well beyond the traditional issues associated with the 
pointfree programming style. 

In this paper, we design a new bidirectional language HOBiT, which aims
to remove much of the accidental difficulty found in combinator-based lens
programming, and to reduce the gap between bidirectional programming and
standard functional programming. For example, the following definition in HOBiT
implements the same lens as appendL.


appendB :: B[A] → B[A] → B[A]
appendB x y = case x of []     → y                with λ_.True    by (λ_.λ_.[])
                        a : x' → a : appendB x' y with not ∘ null by (λ_.λ_.⊥)


As expected, the above code shares the grayed part with the definition of appendL
as the two implement the same backwards behaviour. The difference is that
appendB uses structural recursion in the same way as the standard
unidirectional append, greatly simplifying programming. This is made possible by
HOBiT's type system and semantics, which allow unrestricted use of free variables.
This difference in approach is also reflected in the types: appendB is a proper
function (instead of the abstract lens type of appendL), which readily lends itself
to conventional functional programming. At the same time, appendB is also a
proper lens, which when executed by the HOBiT interpreter behaves exactly like
appendL. A major technical challenge in the design of HOBiT is to guarantee
this duality, so that functions like appendB are well-behaved by construction 
despite the flexibility in their construction. 


1.2 Contributions 


As we can already see from the very simple example above, the use of HOBiT 
simplifies bidirectional programming by removing much of the accidental com- 
plexity. Specifically, HOBiT stands out from existing bidirectional languages in 
two ways: 


1. It supports the conventional programming style that is used in unidirectional 
programming. As a result, a program in HOBiT can be defined in a way 
similar to how one would define only its get component. For example, appendB 
is defined in the same way as the unidirectional append. 

2. It supports incremental improvement. Given the very often close resemblance 
of a bidirectional-program definition and that of its get component, it becomes 
possible to write an initial version of a bidirectional program almost identical 
to its get component and then to adjust the backwards behaviour gradually, 
without having to significantly restructure the existing definition. 


Thanks to these distinctive advantages, HOBiT for the first time allows us to 
construct realistically-sized bidirectional programs with relative ease. Of course, 
this does not mean a free lunch: the ability to control backwards behaviours will
not magically come without additional code (for example the grayed part above). 
What HOBiT achieves is that programming effort may now focus on the pro- 
ductive part of specifying backwards behaviours, instead of being consumed by 
circumventing language restrictions. 
In summary, we make the following contributions in this paper. 


– We design a higher-order bidirectional programming language HOBiT,
which supports convenient bidirectional programming with control of
backwards behaviours (Sect. 3). We also discuss several extensions to the
language (Sect. 5).

– We present the semantics of HOBiT inspired by the idea of staging [5],
and prove the well-behavedness property using Kripke logical relations [18]
(Sect. 4).

– We demonstrate the programmability of HOBiT with examples such as
desugaring/resugaring [26] (Sect. 6). Additional examples including a bidirectional
evaluator for λ-calculus [21,23], a parser/printer for S-expressions, and
bookmark extraction for Netscape [7] can be found at https://bitbucket.org/kztk/hibx
together with a prototype implementation of HOBiT.


2 Overview: Bidirectional Programming Without 
Combinators 


In this section, we informally introduce the essential constructs of HOBiT and 
demonstrate their use by a few small examples. Recall that, as seen in the 
appendB example, the strength of HOBiT lies in allowing programmers to access 
λ-abstractions without restrictions on the use of λ-bound variables.


2.1 The case Construct 


The most important language construct in HOBiT is case (pronounced as
bidirectional case), which provides pattern matching and easy access to bidirectional
branching, and also importantly, allows unrestricted use of λ-bound variables.
In general, a case expression has the following form.


$$\mathbf{case}\ e\ \mathbf{of}\ \{p_1 \to e_1\ \mathbf{with}\ \phi_1\ \mathbf{by}\ \rho_1;\ \ldots;\ p_n \to e_n\ \mathbf{with}\ \phi_n\ \mathbf{by}\ \rho_n\}$$


(Like Haskell, we shall omit “{”, “}” and “;” if they are clear from the layout.)
In the type system of HOBiT, a case-expression has type $\mathbf{B}B$ if $e$ and $e_i$ have
types $\mathbf{B}A$ and $\mathbf{B}B$, and $\phi_i$ and $\rho_i$ have types $B \to \mathsf{Bool}$ and $A \to B \to A$, where
$A$ and $B$ contain neither $(\to)$ nor $\mathbf{B}$. The type $\mathbf{B}A$ can be understood intuitively
as “updatable A”. Typically, the source and view data are given such $\mathbf{B}$-types,
and a function of type $\mathbf{B}A \to \mathbf{B}B$ is the HOBiT equivalent of Lens A B.

The pattern matching part of case performs two implicit operations: it first
unwraps the $\mathbf{B}$-typed value, exposing its content for normal pattern matching,
and then it wraps the variables bound by the pattern matching, turning them
into ‘updatable’ $\mathbf{B}$-typed values to be used in the bodies. For example, in the
second branch of appendB, a and x' can be seen as having types A and [A] in the
pattern, but $\mathbf{B}A$ and $\mathbf{B}[A]$ types in the body; and the bidirectional constructor
(:) :: BA → B[A] → B[A] combines them to produce a $\mathbf{B}$-typed list.

In addition to the standard conditional branches, a case-expression has two
unique components $\phi_i$ and $\rho_i$, called exit conditions and reconciliation functions
respectively, which are used in backwards executions. Exit condition $\phi_i$ is an
over-approximation of the forwards-execution results of the expression $e_i$. In
other words, if branch $i$ is chosen, then $\phi_i\ e_i$ must evaluate to True. This
assertion is checked dynamically in HOBiT, though it could be checked statically with
a sophisticated type system [7]. In the backwards direction the exit condition is
used for deciding branching: the branch whose exit condition is satisfied by the
updated view will be picked for execution (when more than one matches, the
original branch used in the forwards direction has higher priority). The idea is that
due to the update in the view, the branch taken in the backwards direction may
be different from the one taken in the original forwards execution, a feature
commonly supported by lens languages [7], which we call branch switching.

Branch switching is crucial to put's robustness, i.e., the ability to handle
a wide range of view updates (including those that affect the branching decisions)
without failing. We explain how it works in detail in the following.


Branch Switching. Being able to choose a different branch in the backwards 
direction only solves part of the problem. Let us consider the case where a 
forward execution chooses the $n$-th branch, and the backwards execution, based
on the updated view, chooses the $m$-th ($m \neq n$) branch. In this case, the original
value of the pattern-matched expression $e$, which is the reason for the $n$-th branch
being chosen, is not compatible with the put of the $m$-th branch.

As an example, let us consider a simple function that pattern-matches on an
Either structure and returns a list. Note that we have purposely omitted the
reconciliation functions.


f :: B(Either [A] (A, [A])) → B[A]
f x = case x of Left ys       → ys     with λ_.True    {- no by here -}
                Right (y, ys) → y : ys with not ∘ null


We have said that functions of type $\mathbf{B}A \to \mathbf{B}B$ are also fully functioning lenses of
type Lens A B. In HOBiT, the above code runs as follows, where HOBiT> is the 
prompt of HOBiT’s read-eval-print loop, and :get and :put are meta-language 
operations to perform get and put respectively. 


HOBiT> :get f (Left [1, 2, 3])
[1, 2, 3]
HOBiT> :get f (Right (1, [2, 3]))
[1, 2, 3]
HOBiT> :put f (Left [1, 2, 3]) [4, 5]     -- The view [1, 2, 3] is updated to [4, 5].
Left [4, 5]                               -- Both exit conditions are true with [4, 5],
                                          -- so the original branch (Left) is taken.
HOBiT> :put f (Right (1, [2, 3])) [4, 5]
Right (4, [5])                            -- Similar, but the original branch is Right.
HOBiT> :put f (Right (1, [2, 3])) []
⊥                                         -- Branch switches, but computation fails.


As we have explained above, exit conditions are used to decide which branch 
will be used in the backwards direction. For the first and second evaluations 
of put, the exit conditions corresponding to the original branches were true for 
the updated view. For the last evaluation of put, since the exit condition of
the original branch was false but that of the other branch was true, branch
switching is required here. However, a direct put-execution of f with the inputs
(Right (1, [2, 3])) and [] crashes (represented by ⊥ above), for a good reason, as
the two inputs are in an inconsistent state with respect to f.


Fig. 1. Reconciliation function: assuming exit conditions $\phi_m$ and $\phi_n$ where $\phi_m\ b_n$ =
False but $\phi_n\ b_n$ = True, and reconciliation functions $\rho_m$ and $\rho_n$.

This is where reconciliation functions come into the picture. For the Left
branch above, a sensible reconciliation function will be (λ_.λ_.Left []), which
when applied turns the conflicting source (Right (1, [2, 3])) into Left [], and
consequently the put-execution may succeed with the new inputs and return
Left []. It is not difficult to verify that the “reconciled” put-execution still
satisfies well-behavedness. Note that despite the similarity in types, reconciliation
functions are not put; they merely provide a default source value to allow stuck
put-executions to proceed. We visualise the effect of reconciliation functions in
Fig. 1. The left-hand side is bidirectional execution without successful
branch-switching, and since $\phi_m\ b_n$ is false (indicating that $b_n$ is not in the range of the
$m$-th branch) the execution of put must (rightfully) fail in order to guarantee
well-behavedness. On the right-hand side, reconciliation function $\rho_n$ produces
a suitable source from $a_m$ and $b_n$ (where $\phi_n\ (\mathit{get}\ (\rho_n\ a_m\ b_n))$ is True), and
put executes with $b_n$ and the new source $\rho_n\ a_m\ b_n$. It is worth mentioning that
branch switching with reconciliation functions does not compromise correctness:
though the quality of the user-defined reconciliation functions affects robustness
as they may or may not be able to resolve conflicts, successful put-executions
always guarantee well-behavedness, regardless of the involvement of reconciliation
functions.
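The interplay of exit conditions and reconciliation for the function f above can be modelled in plain Haskell as follows. This is a hand-written sketch of the backwards behaviour described in the text, not HOBiT's actual semantics: the element type is fixed to `Int`, failure is modelled with `Maybe`, and all names (`Src`, `getF`, `putF`, `exitLeft`, `exitRight`, `putLeft`, `putRight`, `reconcileLeft`) are hypothetical.

```haskell
type Src = Either [Int] (Int, [Int])

-- Forward direction of f: both branches produce a list.
getF :: Src -> [Int]
getF (Left ys)       = ys
getF (Right (y, ys)) = y : ys

-- Exit conditions of the Left and Right branches.
exitLeft, exitRight :: [Int] -> Bool
exitLeft  = const True
exitRight = not . null

-- Backward direction within a single branch; Nothing models failure
-- when the source does not fit the branch.
putLeft, putRight :: Src -> [Int] -> Maybe Src
putLeft (Left _) v = Just (Left v)
putLeft _        _ = Nothing
putRight (Right _) (y : ys) = Just (Right (y, ys))
putRight _         _        = Nothing

-- Reconciliation function for the Left branch: a default source.
reconcileLeft :: Src -> [Int] -> Src
reconcileLeft _ _ = Left []

-- Prefer the original branch when its exit condition holds for the
-- updated view; otherwise switch branches after reconciling the source.
putF :: Src -> [Int] -> Maybe Src
putF s v = case s of
  Left _  | exitLeft v  -> putLeft s v
  Right _ | exitRight v -> putRight s v
          | exitLeft v  -> putLeft (reconcileLeft s v) v
  _                     -> Nothing
```

With reconciliation in place, `putF (Right (1, [2, 3])) []` yields `Just (Left [])` instead of failing, and `getF (Left [])` is `[]`, so the updated view is reflected in the new source.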

Revisiting appendB. Recall appendB from Sect. 1.1 (reproduced below). 


appendB :: B[A] → B[A] → B[A]
appendB x y = case x of []     → y                with λ_.True    by (λ_.λ_.[])
                        a : x' → a : appendB x' y with not ∘ null by (λ_.λ_.⊥)

The exit condition for the nil case always returns true as there is no restriction
on the value of y, and for the cons case it requires the returned list to be
non-empty. In the backwards direction, when the updated view is non-empty, both
exit conditions will be true, and then the original branch will be taken. This
means that since appendB is defined as a recursion on x, the backwards execution
will try to unroll the original recursion step by step (i.e., the cons branch will be
taken for a number of times that is the same as the length of x) as long as the
view remains non-empty. If an updated view list is shorter than x, then not ∘ null
will become false before the unrolling finishes, and the nil branch will be taken
(branch-switching) and the reconciliation function will be called.

The definition of appendB is curried; straightforward uncurrying turns it into
the standard form $\mathbf{B}A \to \mathbf{B}B$ that can be interpreted by HOBiT as a lens. The
following HOBiT program is the bidirectional variant of uncurry.


uncurryB :: (BA → BB → BC) → B(A, B) → BC
uncurryB f z = let (x, y) = z in f x y


Here, let p = e in e' is syntactic sugar for case e of {p → e' with (λ_.True) by
(λs.λ_.s)}, in which the reconciliation function is never called as there is only
one branch. Let appendB' = uncurryB appendB, then we can run appendB' as:


HOBiT> :get appendB' ([1, 2], [3, 4, 5])
[1, 2, 3, 4, 5]
HOBiT> :put appendB' ([1, 2], [3, 4, 5]) [6, 7, 8, 9, 10]
([6, 7], [8, 9, 10])    -- No structural change, no branch switching.
HOBiT> :put appendB' ([1, 2], [3, 4, 5]) [6, 7]
([6, 7], [])            -- No branch switching, still.
HOBiT> :put appendB' ([1, 2], [3, 4, 5]) [6]
([6], [])               -- Branch-switching happens and the recursion terminates early.
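The bias shown in this session (keep the length of the first source list whenever possible) can be summarised by a direct Haskell model. The functions `getAppend` and `putAppend` are hypothetical names for a hand-written get/put pair; this is an assumption-laden sketch of the observable behaviour, not how HOBiT derives it from appendB.

```haskell
-- Forward direction: ordinary append on a pair of lists.
getAppend :: ([a], [a]) -> [a]
getAppend (xs, ys) = xs ++ ys

-- Backward direction: split the updated view at the length of the
-- original first list. If the view is shorter, the first list takes
-- everything and the second becomes empty.
putAppend :: ([a], [a]) -> [a] -> ([a], [a])
putAppend (xs, _) v = splitAt (length xs) v
```

Applied to the source `([1, 2], [3, 4, 5])`, this model gives `([6, 7], [8, 9, 10])`, `([6, 7], [])` and `([6], [])` for the three updated views in the session.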


Difference from Lens Combinators. As mentioned above, the idea of branch 
switching can be traced back to lens languages. In particular, the design of case 
is inspired by the combinator cond [7]. Despite the similarities, it is important to 
recognise that case is not only a more convenient syntax for cond, but also
crucially supports the unrestricted use of λ-bound variables. This more fundamental
difference is the reason why we could define appendB in the conventional functional
style as the variables x and y are used freely in the body of case. In other words,
the novelty of HOBiT is its ability to combine the traditional (higher-order)
functional programming and the bidirectional constructs as found in lens combinators,
tional programming and the bidirectional constructs as found in lens combinators, 
effectively establishing a new way of bidirectional programming. 


2.2 A More Elaborate Example: linesB 


In addition to supporting convenient programming and robustness in put exe- 
cution, the case constructs can also be used to express intricate details of 
backwards behaviours. Let us consider the lines function in Haskell as an 
example, which splits a string into a list of strings by newlines, for example, 
lines "AA\nBB\n" = ["AA","BB"], except that the last newline character in its 
input is optional. For example, lines returns ["AA","BB"] for both "AA\nBB\n" 
and "AA\nBB". Suppose that we want the backwards transformation of lines to 
exhibit a behaviour that depends on the original source:


linesB :: BString → B[String]
linesB str =
  let (f, b) = breakNLB str
  in case b of
       '\n' : x : r → f : linesB (x : r)
                        with (> 1) ∘ length by (λb.λ_.'\n' : ' ' : b)
       b            → f : []
                        with (== 1) ∘ length by (λb.λ_.lastNL b)
  where {lastNL [] = []; lastNL ['\n'] = ['\n']; lastNL (a : x) = lastNL x}


breakNLB :: BString → B(String, String)
breakNLB str = case str of
  []       → ([], [])                                  with p1 by (λ_.λ_.[])
  '\n' : s → ([], '\n' : s)                            with p2 by (λ_.λ_."\n")
  c : s    → let (f, r) = breakNLB s in (c : f, r)     with p3 by (λ_.λ_." ")
  where {p1 (x, y) = null y; p2 (x, y) = null x && not (null y); p3 (x, y) = not (null x)}


Fig. 2. linesB and breakNLB


HOBiT> :put linesB "AA\nBB" ["a", "b"]
"a\nb"
HOBiT> :put linesB "AA\nBB" ["a", "b", "c"]
"a\nb\nc"
HOBiT> :put linesB "AA\nBB" ["a"]
"a"
HOBiT> :put linesB "AA\nBB\n" ["a", "b", "c"]
"a\nb\nc\n"
HOBiT> :put linesB "AA\nBB\n" ["a"]
"a\n"


This behaviour is achieved by the definition in Fig. 2, which makes good use of 
reconciliation functions. Note that we do not consider the contrived corner case 
where the string ends with duplicated newlines such as in "A\n\n". The function 
breakNLB splits a string at the first newline; since breakNLB is injective, its exit 
conditions and reconciliation functions are of little interest. The interesting part 
is in the definition of linesB, particularly its use of reconciliation functions to 
track the existence of a last newline character. We firstly explain the branching 
structure of the program. On the top level, when the first line is removed from the 
input, the remaining string b may contain more lines, or be the end (represented 
by either the empty list or the singleton list [’\n’]). If the first branch is taken, 
the returned result will be a list of more than one element. In the second branch 
when it is the end of the text, b could contain a newline or simply be empty. We do 
not explicitly give patterns for the two cases as they have the same body f:[], but 
the reconciliation function distinguishes the two in order to preserve the original 
source structure in the backwards execution. Note that we intentionally use 
the same variable name b in the case analysis and the reconciliation function, to 
signify that the two represent the same source data. The use of argument b in the 
reconciliation functions serves the purpose of remembering the (non)existence of 
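the last newline in the original source, which is then preserved in the new source.


\[
\begin{array}{rcl}
e &::=& x \mid \lambda x.e \mid e_1\;e_2 \mid \mathsf{fix}\,(\lambda f.e) \\
  &\mid& \mathsf{True} \mid \mathsf{False} \mid [\,] \mid e_1 : e_2
         \mid \mathbf{case}\ e\ \mathbf{of}\ \{p_i \to e_i\}_{i=1,2} \\
  &\mid& \underline{\mathsf{True}} \mid \underline{\mathsf{False}} \mid \underline{[\,]}
         \mid e_1 \mathbin{\underline{:}} e_2
         \mid \underline{\mathbf{case}}\ e\ \mathbf{of}\ \{p_i \to e_i\ \mathbf{with}\ e'_i\ \mathbf{by}\ e''_i\}_{i=1,2} \\
p &::=& x \mid \mathsf{True} \mid \mathsf{False} \mid [\,] \mid p_1 : p_2
\end{array}
\]


Fig. 3. Syntax of HOBiT Core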


It is worth noting that, just like the other examples we have seen, this definition
in HOBiT shares a similar structure with a definition of lines in Haskell.¹
The notable difference is that a Haskell definition is likely to have a different
grouping of the three cases of lines into two branches, as there is no need to
keep track of the last newline for backwards execution. Recall that reconciliation
functions are called after branches are chosen by exit conditions; in the case
of linesB, the reconciliation function is used to decide whether the reconciled
value of b is "\n" or "". This, however, means that we cannot split the pattern b
into the two patterns "\n" and "" by copying its branch body and exit condition,
because we would then lose the chance to choose a reconciled value of b based on
its original value.
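
The source-dependent behaviour expected of linesB can be summarised by a hand-written get/put pair in plain Haskell. This is a sketch of the observable behaviour only: getLines and putLines are hypothetical names, the view is assumed to be non-empty with no newline inside any element, and sources ending in duplicated newlines are ignored, as in the text.

```haskell
import Data.List (intercalate, isSuffixOf)

-- Forward direction: Haskell's own lines.
getLines :: String -> [String]
getLines = lines

-- Backward direction: join the updated lines and keep a final
-- newline exactly when the original source had one.
putLines :: String -> [String] -> String
putLines src view = intercalate "\n" view ++ trailing
  where
    trailing = if "\n" `isSuffixOf` src then "\n" else ""
```

For instance, putLines "AA\nBB\n" ["a"] yields "a\n", while putLines "AA\nBB" ["a"] yields "a".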


3 Syntax and Type System of HOBiT Core 


In this section, we describe the syntax and the type system of the core of HOBiT. 


3.1 Syntax 


The syntax of HOBiT Core is given in Fig. 3. For simplicity, we only consider
booleans and lists. The syntax is almost the same as the standard λ-calculus with
the fixed-point combinator (fix), lists and booleans. For data constructors and
case expressions, there are in addition bidirectional versions that are underlined.
We allow the body of fix to be non-λs to make our semantics simple (Sect. 4),
though a definition such as fix(λx.True : x) can diverge.

Although in examples we used unidirectional and bidirectional case-expressions
with an arbitrary number of branches having overlapping patterns under the
first-match principle, we assume for simplicity that in HOBiT Core such
case-expressions must have exactly two branches whose patterns do not overlap;
extensions to support these features are straightforward. As in Haskell, we
sometimes omit the braces and semicolons if they are clear from the layout.
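
As an illustration, the grammar of Fig. 3 can be transcribed into a Haskell datatype. This is a sketch with hypothetical constructor names (the U prefix marks unidirectional constructs and the B prefix the underlined bidirectional ones); it is not taken from the HOBiT implementation.

```haskell
-- Patterns use unidirectional constructors only.
data Pat = PVar String | PTrue | PFalse | PNil | PCons Pat Pat
  deriving (Eq, Show)

-- A branch of a bidirectional case: pattern, body,
-- exit condition (with) and reconciliation function (by).
data Branch = Branch
  { brPat :: Pat, brBody :: Exp, brExit :: Exp, brReconcile :: Exp }
  deriving (Eq, Show)

data Exp
  = Var String | Lam String Exp | App Exp Exp
  | Fix String Exp                        -- fix (\f. e)
  | UTrue | UFalse | UNil | UCons Exp Exp
  | UCase Exp (Pat, Exp) (Pat, Exp)       -- exactly two branches
  | BTrue | BFalse | BNil | BCons Exp Exp
  | BCase Exp Branch Branch               -- exactly two branches
  deriving (Eq, Show)

-- Variables bound by a pattern, left to right.
patVars :: Pat -> [String]
patVars (PVar x)    = [x]
patVars (PCons p q) = patVars p ++ patVars q
patVars _           = []
```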


¹ The behaviour of Haskell's lines is a bit more complicated, as it returns [] if and
only if the input is "". This behaviour can be achieved by calling linesB only when
the input list is nonempty.




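\[
\boxed{\Gamma;\Delta \vdash e : A}
\]
\[
\frac{\Gamma(x) = A}{\Gamma;\Delta \vdash x : A}
\qquad
\frac{\Delta(x) = \sigma}{\Gamma;\Delta \vdash x : \mathbf{B}\sigma}
\qquad
\frac{\Gamma, x : A;\Delta \vdash e : B}{\Gamma;\Delta \vdash \lambda x.e : A \to B}
\qquad
\frac{\Gamma;\Delta \vdash e_1 : A \to B \quad \Gamma;\Delta \vdash e_2 : A}
     {\Gamma;\Delta \vdash e_1\;e_2 : B}
\]
\[
\frac{\Gamma, f : A;\Delta \vdash e : A}{\Gamma;\Delta \vdash \mathsf{fix}(\lambda f.e) : A}
\qquad
\frac{}{\Gamma;\Delta \vdash \mathsf{True} : \mathsf{Bool}}
\qquad
\frac{}{\Gamma;\Delta \vdash \mathsf{False} : \mathsf{Bool}}
\qquad
\frac{}{\Gamma;\Delta \vdash [\,] : [A]}
\]
\[
\frac{\Gamma;\Delta \vdash e_1 : A \quad \Gamma;\Delta \vdash e_2 : [A]}
     {\Gamma;\Delta \vdash e_1 : e_2 : [A]}
\qquad
\frac{}{\Gamma;\Delta \vdash \underline{\mathsf{True}} : \mathbf{B}\mathsf{Bool}}
\qquad
\frac{}{\Gamma;\Delta \vdash \underline{\mathsf{False}} : \mathbf{B}\mathsf{Bool}}
\qquad
\frac{}{\Gamma;\Delta \vdash \underline{[\,]} : \mathbf{B}[\sigma]}
\]
\[
\frac{\Gamma;\Delta \vdash e_1 : \mathbf{B}\sigma \quad \Gamma;\Delta \vdash e_2 : \mathbf{B}[\sigma]}
     {\Gamma;\Delta \vdash e_1 \mathbin{\underline{:}} e_2 : \mathbf{B}[\sigma]}
\qquad
\frac{\Gamma;\Delta \vdash e : A \quad \Gamma_i \vdash p_i : A \quad \Gamma,\Gamma_i;\Delta \vdash e_i : B \quad (i = 1,2)}
     {\Gamma;\Delta \vdash \mathbf{case}\ e\ \mathbf{of}\ \{p_i \to e_i\}_{i=1,2} : B}
\]
\[
\frac{\begin{array}{c}
\Gamma;\Delta \vdash e : \mathbf{B}\sigma \quad \Delta_i \vdash p_i : \sigma \quad \Gamma;\Delta,\Delta_i \vdash e_i : \mathbf{B}\tau \\
\Gamma;\Delta \vdash e'_i : \tau \to \mathsf{Bool} \quad \Gamma;\Delta \vdash e''_i : \sigma \to \tau \to \sigma \quad (i = 1,2)
\end{array}}
     {\Gamma;\Delta \vdash \underline{\mathbf{case}}\ e\ \mathbf{of}\ \{p_i \to e_i\ \mathbf{with}\ e'_i\ \mathbf{by}\ e''_i\}_{i=1,2} : \mathbf{B}\tau}
\]
\[
\boxed{\Gamma \vdash p : A}
\]
\[
\frac{}{x : A \vdash x : A}
\qquad
\frac{}{\emptyset \vdash \mathsf{True} : \mathsf{Bool}}
\qquad
\frac{}{\emptyset \vdash \mathsf{False} : \mathsf{Bool}}
\qquad
\frac{}{\emptyset \vdash [\,] : [A]}
\qquad
\frac{\Gamma_1 \vdash p_1 : A \quad \Gamma_2 \vdash p_2 : [A]}
     {\Gamma_1,\Gamma_2 \vdash p_1 : p_2 : [A]}
\]


Fig. 4. Typing rules: Δ ⊢ p : σ is similar to Γ ⊢ p : A but asserts that the resulting
environment is actually a bidirectional environment.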


3.2 Type System

The types in HOBiT Core are defined as follows.

A, B ::= Bσ | A → B | [A] | Bool


We use the metavariables σ, τ, ... for types that contain neither → nor B. We call
σ-types pure datatypes, which are used for sources and views of lenses. Intuitively,
Bσ represents “updatable σ”: data subject to update in bidirectional transformation.
We keep the type system of HOBiT Core simple, though it is possible
to include polymorphic types or intersection types to unify unidirectional and
bidirectional constructors.
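
The distinction between arbitrary types and pure datatypes can be made concrete with a small Haskell sketch. The names Ty, isPure and wellFormed are hypothetical and serve only to illustrate that B may be applied to pure datatypes only.

```haskell
-- Types of HOBiT Core: B sigma | A -> B | [A] | Bool
data Ty = TB Ty | TFun Ty Ty | TList Ty | TBool
  deriving (Eq, Show)

-- Pure datatypes contain neither function arrows nor B.
isPure :: Ty -> Bool
isPure TBool     = True
isPure (TList a) = isPure a
isPure _         = False

-- B may only wrap a pure datatype.
wellFormed :: Ty -> Bool
wellFormed (TB s)     = isPure s
wellFormed (TFun a b) = wellFormed a && wellFormed b
wellFormed (TList a)  = wellFormed a
wellFormed TBool      = True
```

For example, wellFormed accepts the encoding of B[Bool] → B[Bool] but rejects B(B Bool).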

The typing judgment Γ; Δ ⊢ e : A, which reads that under environments
Γ and Δ, expression e has type A, is defined by the typing rules in Fig. 4. We
use two environments: Δ (the bidirectional type environment) is for variables
introduced by pattern-matching through bidirectional case, and Γ for everything
else. It is interesting to observe that Δ only holds pure datatypes, as the pattern
variables of bidirectional case have pure datatypes, while Γ holds any types. We
assume that the variables in Γ and those in Δ are disjoint, and appropriate
α-renaming has been done to ensure this. This separation of Δ from Γ does not
affect typeability, but is key to our semantics and correctness proof (Sect. 4).
Most of the rules are standard except bidirectional case; recall that we only use
unidirectional constructors in patterns which have pure types, while the variables
bound in the patterns are used as B-typed values in branch bodies.


4 Semantics of HOBiT Core


Recall that the unique strength of HOBiT is its ability to mix higher-order
unidirectional programming with bidirectional programming. A consequence of this
mixture is that we can no longer specify its semantics in the same way as other
first-order bidirectional languages such as [13], where two semantics (one for get
and the other for put) suffice. This is because the category of lenses is believed
to have no exponential objects [27] (and thus does not permit λs).


4.1 Basic Idea: Staging 


Our solution to this problem is staging [5], which separates evaluation into
two stages: the unidirectional parts are evaluated first to make way for a
bidirectional semantics, which only has to deal with the residual first-order
programs. As a simple example, consider the expression (λz.z) (x : ((λw.w) y) : []),
where both conses are bidirectional. The first-stage evaluation, e ⇓U E, eliminates
λs from the expression as in (λz.z) (x : ((λw.w) y) : []) ⇓U x : y : []. Then, our
bidirectional semantics will be able to treat the residual expression as a lens
between value environments and values, following [13,20]. Specifically, we have the
get evaluation relation μ ⊢G E ⇒ v, which computes the value v of E under
environment μ as usual, and the put evaluation relation μ ⊢P v ⇐ E ⊣ μ', which
computes an updated environment μ' for E from the updated view v and the original
environment μ. In pseudo syntax, it can be understood as put E μ v = μ', where μ
represents the original source and μ' the new source.

It is worth mentioning that a complete separation of the stages is not possible 
due to the combination of fix and case, as an attempt to fully evaluate them in 
the first stage will result in divergence. Thus, we delay the unidirectional eval- 
uation inside case to allow fix, and consequently the three evaluation relations 
(uni-directional, get, and put) are mutually dependent. 


4.2 Three Evaluation Relations: Unidirectional, get and put 
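First, we formally define the set of residual expressions:

\[
\begin{array}{rcl}
E &::=& \mathsf{True} \mid \mathsf{False} \mid [\,] \mid E_1 : E_2 \mid \lambda x.e \\
  &\mid& x \mid \underline{\mathsf{True}} \mid \underline{\mathsf{False}} \mid \underline{[\,]}
         \mid E_1 \mathbin{\underline{:}} E_2
         \mid \underline{\mathbf{case}}\ E_0\ \mathbf{of}\ \{p_i \to e_i\ \mathbf{with}\ E'_i\ \mathbf{by}\ E''_i\}_{i=1,2}
\end{array}
\]

They are treated as values in the unidirectional evaluation, and as expressions in
the get and put evaluations. Notice that e or e_i appear under λ or bidirectional
case, meaning that their evaluations are delayed.

The set of (first-order) values is defined as below.

\[
v ::= \mathsf{True} \mid \mathsf{False} \mid [\,] \mid v_1 : v_2
\]

Accordingly, we define a (first-order) value environment μ as a finite mapping
from variables to first-order values.


\[
\frac{e_1 \Downarrow_{\mathsf{U}} \lambda x.e \quad e_2 \Downarrow_{\mathsf{U}} E_2 \quad e[E_2/x] \Downarrow_{\mathsf{U}} E}
     {e_1\;e_2 \Downarrow_{\mathsf{U}} E}
\qquad
\frac{}{x \Downarrow_{\mathsf{U}} x}
\qquad
\frac{}{\lambda x.e \Downarrow_{\mathsf{U}} \lambda x.e}
\qquad
\frac{e[\mathsf{fix}(\lambda f.e)/f] \Downarrow_{\mathsf{U}} E}
     {\mathsf{fix}(\lambda f.e) \Downarrow_{\mathsf{U}} E}
\]
\[
\frac{e_0 \Downarrow_{\mathsf{U}} E_0 \quad e'_i \Downarrow_{\mathsf{U}} E'_i \quad e''_i \Downarrow_{\mathsf{U}} E''_i \quad (i = 1,2)}
     {\underline{\mathbf{case}}\ e_0\ \mathbf{of}\ \{p_i \to e_i\ \mathbf{with}\ e'_i\ \mathbf{by}\ e''_i\}_{i=1,2}
      \Downarrow_{\mathsf{U}}
      \underline{\mathbf{case}}\ E_0\ \mathbf{of}\ \{p_i \to e_i\ \mathbf{with}\ E'_i\ \mathbf{by}\ E''_i\}_{i=1,2}}
\]


Fig. 5. Evaluation rules for unidirectional parts (excerpt)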


Unidirectional Evaluation Relation. The rules for the unidirectional
evaluation relation are rather standard, as excerpted in Fig. 5. The bidirectional
constructs (i.e., bidirectional constructors and bidirectional case) are frozen, i.e.,
behave just like ordinary constructors in this evaluation. Notice that we can
evaluate an expression containing free variables; then the resulting residual
expression may contain the free variables.

Bidirectional (get and put) Evaluation Relations. The get and put
evaluation relations, μ ⊢G E ⇒ v and μ ⊢P v ⇐ E ⊣ μ', are defined so that they
together form a lens.


Weakening of Environment. Before we lay out the semantics, it is worth explaining
a subtlety in environment handling. In conventional evaluation semantics, a
larger than necessary environment does no harm, as long as there are no name
clashes. For example, whether the expression x is evaluated under the environment
{x = 1} or {x = 1, y = 2} does not matter. However, the same is not true
for bidirectional evaluation. Let us consider a residual expression E = x : y : []
(built from bidirectional constructors), and a value environment μ = {x = 1, y = 2}
as the original source. We expect to have μ ⊢G E ⇒ 1 : 2 : [], which may be derived as:

\[
\frac{\mu \vdash_{\mathsf{G}} x \Rightarrow 1 \qquad \mu \vdash_{\mathsf{G}} y \mathbin{\underline{:}} \underline{[\,]} \Rightarrow 2 : [\,]}
     {\mu \vdash_{\mathsf{G}} x \mathbin{\underline{:}} y \mathbin{\underline{:}} \underline{[\,]} \Rightarrow 1 : 2 : [\,]}
\]

In the put direction, for an updated view, say 3 : 4 : [], we expect to have
μ ⊢P 3 : 4 : [] ⇐ E ⊣ {x = 3, y = 4} with the corresponding derivation:

\[
\frac{\mu \vdash_{\mathsf{P}} 3 \Leftarrow x \dashv\ ?_1 \qquad \mu \vdash_{\mathsf{P}} 4 : [\,] \Leftarrow y \mathbin{\underline{:}} \underline{[\,]} \dashv\ ?_2}
     {\mu \vdash_{\mathsf{P}} 3 : 4 : [\,] \Leftarrow x \mathbin{\underline{:}} y \mathbin{\underline{:}} \underline{[\,]} \dashv \{x = 3, y = 4\}}
\]

What shall the environments ?1 and ?2 be? One way is to have μ ⊢P 3 ⇐ x ⊣
{x = 3, y = 2}, and μ ⊢P 4 : [] ⇐ y : [] ⊣ {x = 1, y = 4}, where the variables
that do not appear free in the residual expression take their values from the
original source environment μ. However, the evaluation will get stuck here, as
there is no reasonable way to produce the expected result {x = 3, y = 4} from
?1 = {x = 3, y = 2} and ?2 = {x = 1, y = 4}. In other words, the redundancy in
environment is harmful as it may cause conflicts downstream.

Our solution to this problem, which follows from [21–23, 29], is to allow put
to return value environments containing only bindings that are relevant for the
residual expressions under evaluation. For example, we have μ ⊢P 3 ⇐ x ⊣
{x = 3}, and μ ⊢P 4 : [] ⇐ y : [] ⊣ {y = 4}. Then, we can merge the two value
environments ?1 = {x = 3} and ?2 = {y = 4} to obtain the expected result
{x = 3, y = 4}. As a remark, this seemingly simple solution actually has a
non-trivial effect on the reasoning of well-behavedness. We defer a detailed
discussion on this to Sect. 4.3.

Now we are ready to define get and put evaluation rules for each bidirectional
construct. For variables, we just look up or update environments. Recall that μ
is a mapping (i.e., function) from variables to (first-order) values, while we use
a record-like notation such as {x = v}.


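\[
\frac{}{\mu \vdash_{\mathsf{G}} x \Rightarrow \mu(x)}
\qquad
\frac{}{\mu \vdash_{\mathsf{P}} v \Leftarrow x \dashv \{x = v\}}
\]

For constants c where c = False, True, [] (in their bidirectional versions), the
evaluation rules are straightforward.

\[
\frac{}{\mu \vdash_{\mathsf{G}} \underline{c} \Rightarrow c}
\qquad
\frac{}{\mu \vdash_{\mathsf{P}} c \Leftarrow \underline{c} \dashv \emptyset}
\]

The above-mentioned behaviour of the bidirectional cons expression E1 : E2 is
formally given as:

\[
\frac{\mu \vdash_{\mathsf{G}} E_1 \Rightarrow v_1 \quad \mu \vdash_{\mathsf{G}} E_2 \Rightarrow v_2}
     {\mu \vdash_{\mathsf{G}} E_1 \mathbin{\underline{:}} E_2 \Rightarrow v_1 : v_2}
\qquad
\frac{\mu \vdash_{\mathsf{P}} v_1 \Leftarrow E_1 \dashv \mu'_1 \quad \mu \vdash_{\mathsf{P}} v_2 \Leftarrow E_2 \dashv \mu'_2}
     {\mu \vdash_{\mathsf{P}} v_1 : v_2 \Leftarrow E_1 \mathbin{\underline{:}} E_2 \dashv \mu'_1 \curlyvee \mu'_2}
\]

(Note that the variable rules guarantee that only free variables in the residual
expressions end up in the resulting environments.) Here, ⋎ is the merging operator
defined as: μ ⋎ μ' = μ ∪ μ' if there is no x such that μ(x) ≠ μ'(x). For
example, {x = 3} ⋎ {y = 4} = {x = 3, y = 4}, and {x = 3, y = 4} ⋎ {y = 4} =
{x = 3, y = 4}, but {x = 3, y = 2} ⋎ {y = 4} is undefined.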
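
The rules for variables, [] and cons, together with the merging operator, are small enough to be prototyped directly. The following Haskell sketch makes several assumptions that are not in the paper: residual expressions are restricted to variables, [] and cons; integers are admitted as values so that the running example can be replayed; partiality is modelled with Maybe; and all names (Val, Res, merge, getR, putR) are hypothetical.

```haskell
import qualified Data.Map.Strict as M

data Val = VInt Int | VNil | VCons Val Val deriving (Eq, Show)
data Res = RVar String | RNil | RCons Res Res deriving (Eq, Show)

type Env = M.Map String Val

-- Merging: defined only when the environments agree on shared variables.
merge :: Env -> Env -> Maybe Env
merge m1 m2
  | and (M.elems (M.intersectionWith (==) m1 m2)) = Just (M.union m1 m2)
  | otherwise                                     = Nothing

-- get: ordinary evaluation under the environment.
getR :: Env -> Res -> Maybe Val
getR env (RVar x)    = M.lookup x env
getR _   RNil        = Just VNil
getR env (RCons a b) = VCons <$> getR env a <*> getR env b

-- put: returns only the bindings relevant to the expression.
putR :: Env -> Val -> Res -> Maybe Env
putR _   v             (RVar x)    = Just (M.singleton x v)
putR _   VNil          RNil        = Just M.empty
putR env (VCons v1 v2) (RCons a b) = do
  m1 <- putR env v1 a
  m2 <- putR env v2 b
  merge m1 m2
putR _ _ _ = Nothing
```

With μ = {x = 1, y = 2} and E = x : y : [], putting back the view 3 : 4 : [] yields {x = 3, y = 4}, because each sub-derivation contributes only the binding it actually touches.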

The most interesting rules are for bidirectional case. In the get direction, it is
not different from the ordinary case except that exit conditions are asserted, as
shown in Fig. 6. We use the following predicate for pattern matching.

\[
\mathit{match}(p_k, v_0, \mu_k) = (p_k\mu_k = v_0) \wedge (\mathit{dom}(\mu_k) = \mathit{fv}(p_k))
\]

Here, we abuse the notation to write p_k μ_k for the value obtained from p_k by
replacing the free variables x in p_k with μ_k(x). One might notice that we have
the disjoint union μ ⊎ μ_i in Fig. 6 where μ_i holds the values of the variables
in p_i, as we assume α-renaming of bound variables that is consistent in get and put.
Recall that p_1 and p_2 are assumed not to overlap, and hence the evaluation is
deterministic. Note that the reconciliation functions E''_i are untouched by the
rule.

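The put evaluation rule of bidirectional case shown in Fig. 6 is more involved. In
addition to checking which branch should be chosen by using exit conditions, we
need two rules to handle the cases with and without branch switching. Basically,
the branch to be taken in the backwards direction is decided first, by the
get-evaluation of the case condition E_0 and the checking of the exit condition E'_i
against the updated view v. After that, the body of the chosen branch e_i is firstly
unidirectionally evaluated, and then its residual expression E_i is put-evaluated.
The last step is put-evaluation of the case condition E_0. When branch switching
happens, there is the additional step of applying the reconciliation function E''_j.


\[
\frac{\mu \vdash_{\mathsf{G}} E_0 \Rightarrow v_0 \quad \mathit{match}(p_i, v_0, \mu_i) \quad e_i \Downarrow_{\mathsf{U}} E_i \quad
      \mu \uplus \mu_i \vdash_{\mathsf{G}} E_i \Rightarrow v \quad E'_i\;v \Downarrow_{\mathsf{U}} \mathsf{True}}
     {\mu \vdash_{\mathsf{G}} \underline{\mathbf{case}}\ E_0\ \mathbf{of}\ \{p_i \to e_i\ \mathbf{with}\ E'_i\ \mathbf{by}\ E''_i\}_{i=1,2} \Rightarrow v}
\]
\[
\frac{\begin{array}{c}
\mu \vdash_{\mathsf{G}} E_0 \Rightarrow v_0 \quad \mathit{match}(p_i, v_0, \mu_i) \quad E'_i\;v \Downarrow_{\mathsf{U}} \mathsf{True} \quad e_i \Downarrow_{\mathsf{U}} E_i \\
\mu \uplus \mu_i \vdash_{\mathsf{P}} v \Leftarrow E_i \dashv \mu' \uplus_{\mathit{dom}(\mu),\mathit{dom}(\mu_i)} \mu'_i \quad
v'_0 = p_i(\mu'_i \triangleleft \mu_i) \quad \mu \vdash_{\mathsf{P}} v'_0 \Leftarrow E_0 \dashv \mu'_0
\end{array}}
     {\mu \vdash_{\mathsf{P}} v \Leftarrow \underline{\mathbf{case}}\ E_0\ \mathbf{of}\ \{p_i \to e_i\ \mathbf{with}\ E'_i\ \mathbf{by}\ E''_i\}_{i=1,2} \dashv \mu'_0 \curlyvee \mu'}
\]
\[
\frac{\begin{array}{c}
\mu \vdash_{\mathsf{G}} E_0 \Rightarrow v_0 \quad \mathit{match}(p_i, v_0, \mu_i) \quad E'_i\;v \Downarrow_{\mathsf{U}} \mathsf{False} \quad j = 3 - i \quad E'_j\;v \Downarrow_{\mathsf{U}} \mathsf{True} \quad e_j \Downarrow_{\mathsf{U}} E_j \\
E''_j\;v_0\;v \Downarrow_{\mathsf{U}} u_0 \quad \mathit{match}(p_j, u_0, \mu_j) \\
\mu \uplus \mu_j \vdash_{\mathsf{P}} v \Leftarrow E_j \dashv \mu' \uplus_{\mathit{dom}(\mu),\mathit{dom}(\mu_j)} \mu'_j \quad
v'_0 = p_j(\mu'_j \triangleleft \mu_j) \quad \mu \vdash_{\mathsf{P}} v'_0 \Leftarrow E_0 \dashv \mu'_0
\end{array}}
     {\mu \vdash_{\mathsf{P}} v \Leftarrow \underline{\mathbf{case}}\ E_0\ \mathbf{of}\ \{p_i \to e_i\ \mathbf{with}\ E'_i\ \mathbf{by}\ E''_i\}_{i=1,2} \dashv \mu'_0 \curlyvee \mu'}
\]


Fig. 6. get- and put-Evaluation of case: we write μ ⊎_{X,Y} μ' to ensure that dom(μ) ⊆ X
and dom(μ') ⊆ Y.


Note the use of operator ◁ in computing the updated case condition v'_0.

\[
(\mu' \triangleleft \mu)(x) =
\begin{cases}
\mu'(x) & \text{if } x \in \mathit{dom}(\mu') \\
\mu(x)  & \text{otherwise}
\end{cases}
\]

Recall that in the beginning of this subsection, we discussed our approach of
avoiding conflicts by producing environments with only relevant variables. This
means the μ'_i above contains only variables that appear free in E_i, which may or
may not be all the variables in p_i. Since this is the point where these variables
are introduced, we need to supplement μ'_i with μ_i from the original pattern
matching so that p_i can be properly instantiated.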
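
If environments are represented as finite maps, the defaulting operator coincides with a left-biased union. The sketch below assumes Data.Map from the containers package as the representation; the name defaulting is hypothetical.

```haskell
import qualified Data.Map.Strict as M

-- Bindings of the first (updated) environment win; the second
-- (original) environment only fills in the missing variables.
defaulting :: Ord k => M.Map k v -> M.Map k v -> M.Map k v
defaulting updated original = M.union updated original
```

For instance, if a branch body mentions only a, the updated environment {a = 3} is completed with the original bindings of the remaining pattern variables before the pattern is instantiated.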


Construction of Lens. Let us write L0[E] for a lens between value environments
and values, defined as:

get L0[E] μ = v      if μ ⊢G E ⇒ v
put L0[E] μ v = μ'   if μ ⊢P v ⇐ E ⊣ μ'


Then, we can define the lens L[e] induced from e (a closed function expression),
where e x ⇓U E for some fresh variable x.

get L[e] s = get L0[E] {x = s}
put L[e] s v = (μ' ◁ {x = s})(x)   where μ' = put L0[E] {x = s} v


Actually, :get and :put in Sect. 2 are realised by get L[e] and put L[e].


4.3 Correctness


We establish the correctness of HOBiT Core: L[e] ∈ Lens ⟦σ⟧ ⟦τ⟧ is well-behaved
for closed e of type Bσ → Bτ. Recall that Lens S V is a set of lenses ℓ, where
get ℓ ∈ S → V and put ℓ ∈ S → V → S. We only provide proof sketches in this
subsection due to space limitation.


≼-well-behavedness. Recall that in the previous subsection, we allow environments
to be weakened during put-evaluation. Since not all variables in a source
may appear in the view, during some intermediate evaluation steps (for example
within case-branches) the weakened environment may not be sufficient to fully
construct a new source. Recall that, in μ ⊢P v ⇐ E ⊣ μ', dom(μ') can be smaller
than dom(μ), a gap that is fixed at a later stage of evaluation by merging (⋎)
and defaulting (◁) with other environments. This technique reduces conflicts, but
at the same time complicates the compositional reasoning of correctness.
Specifically, due to the potentially missing information in the intermediate
environments, well-behavedness may be temporarily broken during evaluation. Instead,
we use a variant of well-behavedness that is weakening aware, which will then
be used to establish the standard well-behavedness for the final result.


Definition 1 (≼-well-behavedness). Let (S, ≼) and (V, ≼) be partially-ordered
sets. A lens ℓ ∈ Lens S V is called ≼-well-behaved if it satisfies

\[
\mathit{get}\ \ell\ s = v \implies v \text{ is maximal} \wedge (\forall v'.\ v' \preceq v \implies \mathit{put}\ \ell\ s\ v' \preceq s)
\qquad (\preceq\text{-Acceptability})
\]
\[
\mathit{put}\ \ell\ s\ v = s' \implies (\forall s''.\ s' \preceq s'' \implies v \preceq \mathit{get}\ \ell\ s'')
\qquad (\preceq\text{-Consistency})
\]

for any s, s' ∈ S and v ∈ V, where s is maximal.


We write Lens^{≼wb} S V for the set of lenses in Lens S V that are ≼-well-behaved.
In this section, we only consider the case where S and V are value
environments and first-order values, where value environments are ordered by
weakening (μ ≼ μ' if μ(x) = μ'(x) for all x ∈ dom(μ)), and (≼) = (=) for
first-order values. In Sect. 5.2 we consider a slightly more general situation.

The ≼-well-behavedness is a generalisation of the ordinary well-behavedness,
as it coincides with the ordinary well-behavedness when (≼) = (=).


Theorem 1. For S and V with (≼) = (=), a lens ℓ ∈ Lens S V is ≼-well-behaved
iff it is well-behaved.
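
For the ordinary case (≼) = (=), the two laws can be tested on sample data. The Haskell sketch below simplifies matters by assuming total get and put functions, whereas the lenses in the paper may be partial; Lens, acceptable, consistent, fstL and brokenL are hypothetical names.

```haskell
data Lens s v = Lens { get :: s -> v, put :: s -> v -> s }

-- Acceptability: putting back an unchanged view restores the source.
acceptable :: Eq s => Lens s v -> [s] -> Bool
acceptable l ss = and [ put l s (get l s) == s | s <- ss ]

-- Consistency: the updated source reflects the updated view.
consistent :: Eq v => Lens s v -> [s] -> [v] -> Bool
consistent l ss vs = and [ get l (put l s v) == v | s <- ss, v <- vs ]

-- A well-behaved lens onto the first component of a pair.
fstL :: Lens (a, b) a
fstL = Lens fst (\(_, b) a -> (a, b))

-- A lens whose put ignores the view: acceptable but not consistent.
brokenL :: Lens (Int, Int) Int
brokenL = Lens fst (\s _ -> s)
```

Such sampling can only refute the laws, never prove them, which is why the paper establishes them once and for all by a logical relation.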


Kripke Logical Relation. The key step to prove the correctness of HOBiT
Core is to prove that L0[E] is always ≼-well-behaved if E is an evaluation result
of a well-typed expression e. The basic idea is to prove this by logical relation
that expression e of type Bσ under the context Δ is evaluated to E, assuming
termination, such that L0[E] is a ≼-well-behaved lens between ⟦Δ⟧ and ⟦σ⟧.
Usually a logical relation is defined only by induction on the type. In our
case, as we need to consider Δ in the interpretation of Bσ, the relation should
be indexed by Δ too. However, naive indexing does not work due to substitutions.
For example, we could define a (unary) relation E_Δ(Bσ) as a set of expressions
that evaluate to “good” (i.e., ≼-well-behaved) lenses between (the semantics of)
Δ and σ, and E_Δ(Bσ → Bτ) as a set of expressions that evaluate to “good”
functions that map good lenses between Δ and σ to those between Δ and τ.
This naive relation, however, does not respect substitution, which can substitute
a value obtained from an expression typed under Δ to a variable typed under
Δ' such that Δ ⊆ Δ', where Δ and Δ' need not be the same. With the naive
definition, good functions at Δ need not be good functions at Δ', as a good lens
between Δ' and σ is not always a good lens between Δ and σ.

To remedy the situation, inspired by the denotational semantics in [24], we use
Kripke logical relations [18] where worlds are Δs.
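
Definition 2. We define the set E_Δ⟦A⟧ of expressions, the set R_Δ⟦A⟧ of residual
expressions, the set ⟦σ⟧ of values and the set ⟦Δ⟧ of value environments as below.

\[
\begin{array}{rcl}
\mathcal{E}_\Delta[\![A]\!] &=& \{e \mid \forall E.\ e \Downarrow_{\mathsf{U}} E \text{ implies } E \in \mathcal{R}_\Delta[\![A]\!]\} \\
\mathcal{R}_\Delta[\![\mathsf{Bool}]\!] &=& \{\mathsf{True}, \mathsf{False}\} \\
\mathcal{R}_\Delta[\![[A]]\!] &=& \mathit{List}\ \mathcal{R}_\Delta[\![A]\!] \\
\mathcal{R}_\Delta[\![\mathbf{B}\sigma]\!] &=& \{E \mid \forall \Delta'.\ \Delta \subseteq \Delta' \text{ implies } \mathcal{L}_0[E] \in \mathit{Lens}^{\preceq\mathrm{wb}}\ [\![\Delta']\!]\ [\![\sigma]\!]\} \\
\mathcal{R}_\Delta[\![A \to B]\!] &=& \{F \mid \forall \Delta'.\ \Delta \subseteq \Delta' \text{ implies } (\forall E \in \mathcal{R}_{\Delta'}[\![A]\!].\ F\;E \in \mathcal{E}_{\Delta'}[\![B]\!])\} \\
[\![\mathsf{Bool}]\!] &=& \{\mathsf{True}, \mathsf{False}\} \\
[\![[\sigma]]\!] &=& \mathit{List}\ [\![\sigma]\!] \\
[\![\Delta]\!] &=& \{\mu \mid \mathit{dom}(\mu) \subseteq \mathit{dom}(\Delta) \text{ and } \forall x \in \mathit{dom}(\mu).\ \mu(x) \in [\![\Delta(x)]\!]\}
\end{array}
\]

Here, for a set S, List S is inductively defined as: [] ∈ List S, and s : t ∈ List S
for all s ∈ S and t ∈ List S.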


The notable difference from ordinary logical relations is the definition of
R_Δ⟦A → B⟧ where we consider an arbitrary Δ' such that Δ ⊆ Δ'. This is the
key to state R_Δ⟦A⟧ ⊆ R_Δ'⟦A⟧ if Δ ⊆ Δ'. Notice that ⟦σ⟧ = R_Δ⟦σ⟧ for any Δ.
We have the following lemmas.


Lemma 1. If Δ ⊆ Δ', v ∈ R_Δ⟦A⟧ implies v ∈ R_Δ'⟦A⟧.


Lemma 2. x ∈ R_Δ⟦Bσ⟧ for any Δ such that Δ(x) = σ.


Lemma 3. For any σ and Δ, the bidirectional True and False are in R_Δ⟦BBool⟧ and
the bidirectional [] is in R_Δ⟦B[σ]⟧.


Lemma 4. If E1 ∈ R_Δ⟦Bσ⟧ and E2 ∈ R_Δ⟦B[σ]⟧, then the bidirectional cons
E1 : E2 ∈ R_Δ⟦B[σ]⟧.


Lemma 5. Let σ and τ be pure types and Δ a pure type environment. Suppose
that e_i ∈ E_{Δ⊎Δ_i}⟦Bτ⟧ for Δ_i ⊢ p_i : σ (i = 1, 2), and that E_0 ∈ R_Δ⟦Bσ⟧,
E'_1, E'_2 ∈ R_Δ⟦τ → Bool⟧ and E''_1, E''_2 ∈ R_Δ⟦σ → τ → σ⟧. Then, the bidirectional
case E_0 of {p_i → e_i with E'_i by E''_i}_{i=1,2} ∈ R_Δ⟦Bτ⟧.


Proof (Sketch). The proof itself is straightforward by case analysis. The key
property is that get and put use the same branches in both proofs of ≼-Acceptability
and ≼-Consistency. Slight care is required for unidirectional evaluations of e_1
and e_2, and applications of E'_1, E'_2, E''_1 and E''_2. However, the semantics is
carefully designed so that in the proof of ≼-Acceptability, unidirectional
evaluations that happen in put have already happened in the evaluation of get, and a
similar discussion applies to ≼-Consistency.


As a remark, recall that we assumed α-renaming of p_i so that the disjoint unions
(⊎) in Fig. 6 succeed. This renaming depends on the μs received in get and put
evaluations, and can be realised by using de Bruijn levels.


Lemma 6 (Fundamental Lemma). For Γ; Δ ⊢ e : A, for any Δ' with Δ ⊆ Δ'
and E_x ∈ R_Δ'⟦Γ(x)⟧, we have e[E_x/x]_x ∈ E_Δ'⟦A⟧.


Proof (Sketch). We prove the lemma by induction on typing derivation. For
bidirectional constructs, we just apply the above lemmas appropriately. The
other parts are rather routine.


Now we are ready to state the correctness of our construction of lenses.


Corollary 1. If ε; ε ⊢ e : Bσ → Bτ, then e x ∈ E_{x:σ}⟦Bτ⟧.


Lemma 7. If e x ∈ E_{x:σ}⟦Bτ⟧, L[e] (if defined) is in Lens^{≼wb} ⟦σ⟧ ⟦τ⟧ (and thus
well-behaved by Theorem 1).


Theorem 2. If ε; ε ⊢ e : Bσ → Bτ, then L[e] ∈ Lens ⟦σ⟧ ⟦τ⟧ (if defined) is
well-behaved.


5 Extensions 


Before presenting a larger example, we discuss a few extensions of HOBiT Core 
which facilitate programming. 


5.1 In-Language Lens Definition 


In HOBiT programming, it is still sometimes useful to allow manually defined
primitive lenses (i.e., lenses constructed from independently specified get/put
functions), for backwards compatibility and also for programs with relatively
simple computation logic but complicated backwards behaviours. This feature
is supported by the construct appLens e1 e2 e3 in HOBiT. For example, we
can write incB x = appLens (λs.s + 1) (λ_.λv.v − 1) x to define a bidirectional
increment function incB :: BInt → BInt. Note that for simplicity we require the
additional expression x (represented by e3 in the general case) to convert between
normal functions and lenses. The typing rule for appLens e1 e2 e3 is as below.

\[
\frac{\Gamma;\Delta \vdash e_1 : \sigma \to \tau \quad \Gamma;\Delta \vdash e_2 : \sigma \to \tau \to \sigma \quad \Gamma;\Delta \vdash e_3 : \mathbf{B}\sigma}
     {\Gamma;\Delta \vdash \mathbf{appLens}\ e_1\ e_2\ e_3 : \mathbf{B}\tau}
\]

Accordingly, we add the following unidirectional evaluation rule.

\[
\frac{e_i \Downarrow_{\mathsf{U}} E_i \quad (i = 1,2,3)}
     {\mathbf{appLens}\ e_1\ e_2\ e_3 \Downarrow_{\mathsf{U}} \mathbf{appLens}\ E_1\ E_2\ E_3}
\]

Also, we add the following get/put evaluation rules for appLens.

\[
\frac{\mu \vdash_{\mathsf{G}} E_3 \Rightarrow v \quad E_1\;v \Downarrow_{\mathsf{U}} u}
     {\mu \vdash_{\mathsf{G}} \mathbf{appLens}\ E_1\ E_2\ E_3 \Rightarrow u}
\qquad
\frac{\mu \vdash_{\mathsf{G}} E_3 \Rightarrow v \quad E_2\;v\;u' \Downarrow_{\mathsf{U}} v' \quad \mu \vdash_{\mathsf{P}} v' \Leftarrow E_3 \dashv \mu'}
     {\mu \vdash_{\mathsf{P}} u' \Leftarrow \mathbf{appLens}\ E_1\ E_2\ E_3 \dashv \mu'}
\]

Notice that appLens e1 e2 e3 is “good” if e3 is so, i.e., appLens e1 e2 e3 ∈
E_Δ⟦Bτ⟧ if e3 ∈ E_Δ⟦Bσ⟧, provided that the get/put pair (e1, e2) is well-behaved.
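
The get/put rules for appLens amount to post-composing a hand-written get/put pair onto the lens denoted by e3. The Haskell sketch below models that reading over a simplified, total Lens record; this record and the helper idL are assumptions for illustration and are not part of HOBiT.

```haskell
data Lens s v = Lens { get :: s -> v, put :: s -> v -> s }

-- appLens g p l: run l, then apply the user-supplied get function g;
-- backwards, apply the user-supplied put function p to the old
-- intermediate value and the new view, then run l backwards.
appLens :: (a -> b) -> (a -> b -> a) -> Lens s a -> Lens s b
appLens g p l = Lens
  { get = g . get l
  , put = \s v' -> put l s (p (get l s) v')
  }

-- Identity lens, standing in for a source variable.
idL :: Lens a a
idL = Lens id (\_ v -> v)

-- The bidirectional increment from the text.
incB :: Lens s Int -> Lens s Int
incB = appLens (+ 1) (\_ v -> v - 1)
```

Here get (incB idL) 3 gives 4, and put (incB idL) 3 10 gives 9.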


5.2 Lens Combinators as Language Constructs 


In this paper, we have focused on the case construct, which is inspired by the
cond combinator [7]. Although cond is certainly an important lens combinator,
it is not the only one worth considering. Actually, we can obtain language
constructs from a number of lens combinators including those that take care
of alignment [2]. For the sake of demonstration, we outline the derivation of a
simpler example comb ∈ Lens ⟦σ⟧ ⟦τ⟧ → Lens ⟦σ'⟧ ⟦τ'⟧. As the construction
depends solely on types, we purposely leave the combinator abstract.

A naive way of lifting combinators can already be found in [21,23]. For example,
for comb, we might prepare the construct comb_bad with the following typing
rule (where ε is the empty environment):

\[
\frac{\varepsilon;\varepsilon \vdash e : \mathbf{B}\sigma \to \mathbf{B}\tau \quad \Gamma;\Delta \vdash e' : \mathbf{B}\sigma'}
     {\Gamma;\Delta \vdash \mathbf{comb}_{\mathrm{bad}}\ e\ e' : \mathbf{B}\tau'}
\]

Notice that in this version e is required to be closed so that we can turn the
function directly into a lens by L[−], and the evaluation of comb_bad can then be
based on standard lens composition: L0[comb_bad E E'] = comb L[E] ∘̂ L0[E']
(we omit the straightforward concrete evaluation rules), where E and E' are the
unidirectional evaluation results of e and e' (notice that a residual expression is
also an expression), and ∘̂ is the lens composition combinator [7] defined by:

(∘̂) ∈ Lens B C → Lens A B → Lens A C
get (ℓ2 ∘̂ ℓ1) a = get ℓ2 (get ℓ1 a)
put (ℓ2 ∘̂ ℓ1) a c = put ℓ1 a (put ℓ2 (get ℓ1 a) c)
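
The composition combinator transcribes directly into Haskell. The sketch below again assumes a simplified Lens record with total get and put; compose and fstL are hypothetical names.

```haskell
data Lens s v = Lens { get :: s -> v, put :: s -> v -> s }

-- compose l2 l1: first l1 (A to B), then l2 (B to C).
compose :: Lens b c -> Lens a b -> Lens a c
compose l2 l1 = Lens
  { get = get l2 . get l1
  , put = \a c -> put l1 a (put l2 (get l1 a) c)
  }

-- Example lens: focus on the first component of a pair.
fstL :: Lens (x, y) x
fstL = Lens fst (\(_, y) x -> (x, y))
```

Composing fstL with itself focuses on the innermost first component: put (compose fstL fstL) ((1, 2), 3) 9 gives ((9, 2), 3).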


The combinator preserves ≼-well-behavedness, and thus comb_bad guarantees
correctness. However, as discussed extensively in the case of case, this
“closedness” requirement prevents flexible use of variables and creates a major
obstacle in programming.

So instead of the plain comb, we shall assume a parameterised version
pcomb ∈ Lens (T × ⟦σ⟧) ⟦τ⟧ → Lens (T × ⟦σ'⟧) ⟦τ'⟧ that allows each source
to have an extra component T, which is expected to be kept track of by the
combinator without modification. Here T is assumed to have a partial merging
operator (⋎) ∈ T → T → T and a minimum element, and pcomb may use these
facts in its definition. By using pcomb, we can give a corresponding language
construct comb with a binder, typed as follows.

\[
\frac{\Gamma;\Delta, x : \sigma \vdash e : \mathbf{B}\tau \quad \Gamma;\Delta \vdash e' : \mathbf{B}\sigma'}
     {\Gamma;\Delta \vdash \mathbf{comb}\ (x.e)\ e' : \mathbf{B}\tau'}
\]

We give its unidirectional evaluation rule as

\[
\frac{e \Downarrow_{\mathsf{U}} E \quad e' \Downarrow_{\mathsf{U}} E'}
     {\mathbf{comb}\ (x.e)\ e' \Downarrow_{\mathsf{U}} \mathbf{comb}\ E\ E'}
\]


We omit the get/put evaluation rules, which are straightforwardly obtained from
the following equation.

L0[comb E E'] = pcomb (unEnv_x L0[E]) ∘̂ ⟨idL, L0[E']⟩


where unEnv_x ∈ Lens ⟦Δ ⊎ {x : σ}⟧ ⟦τ⟧ → Lens (⟦Δ⟧ × ⟦σ⟧) ⟦τ⟧ and ⟨−, −⟩ ∈
Lens ⟦Δ⟧ A → Lens ⟦Δ⟧ B → Lens ⟦Δ⟧ (A × B) are lens combinators defined
for any Δ as:

get (unEnv_x ℓ) (μ, v) = get ℓ (μ ⊎ {x = v})
put (unEnv_x ℓ) (μ, v) u = (μ', v')
  where μ' ⊎ {x = v'} = (put ℓ (μ ⊎ {x = v}) u) ◁ {x = v}

get ⟨ℓ1, ℓ2⟩ μ = (get ℓ1 μ, get ℓ2 μ)
put ⟨ℓ1, ℓ2⟩ μ (a, b) = put ℓ1 μ a ⋎ put ℓ2 μ b


Both combinators preserve ≼-well-behavedness, where we assume the
component-wise ordering on pairs. No “closedness” requirement is imposed on
e in this version. From the construct, we can construct a higher-order function
λf.λz.comb (x.f x) z : (Bσ → Bτ) → Bσ' → Bτ'. That is, in HOBiT, lens
combinators are just higher-order functions, as long as they permit the
above-mentioned parameterisation. This observation means that we are able to
systematically derive language constructs from lens combinators; as a matter of fact,
the semantics of case is derived from a variant of the cond combinator [7].

Even better, the parametrised pcomb can be systematically constructed from
the definition of comb. For comb, it is typical that get (comb ℓ) only uses get ℓ,
and put (comb ℓ) uses put ℓ; that is, comb essentially consists of two functions
of types (⟦σ⟧ → ⟦τ⟧) → (⟦σ'⟧ → ⟦τ'⟧) and (⟦σ⟧ → ⟦τ⟧ → ⟦σ⟧) → (⟦σ'⟧ → ⟦τ'⟧ →
⟦σ'⟧). Then, we can obtain pcomb of the above type merely by “monad”ifying the
two functions: using the reader monad T → − for the former and the composition
of the reader and writer monads T → (−, T) backwards for the latter suffices to
construct pcomb.

A remaining issue is to ensure that pcomb preserves ≼-well-behavedness,
which ensures comb (x.e) e' ∈ E_Δ⟦Bτ'⟧ under the assumptions e ∈
E_{Δ⊎{x:σ}}⟦Bτ⟧ and e' ∈ E_Δ⟦Bσ'⟧. Currently, such a proof has to be done manually,
even though comb preserves well-behavedness and pcomb is systematically
constructed. Whether we can lift the correctness proof for comb to pcomb in a
systematic way will be an interesting future exploration.


5.3 Guards 


Guards used for branching are merely syntactic sugar in ordinary unidirectional
languages such as Haskell. But interestingly, they actually increase the expressive
power of HOBiT, by enabling inspection of updatable values without making the
inspection functions bidirectional.

For example, Glück and Kawabe's reversible equivalence check [10] can be
implemented in HOBiT as follows.


eqCheck :: Bσ → Bσ → B(Either (σ, σ) σ)
eqCheck x y = case (x, y) of
  (x', y') | x' == y'  → Right x'      with isRight by (λ_.λ(Right x).(x, x))
  (x', y') | otherwise → Left (x', y') with isLeft  by (λ_.λ(Left (x, y)).(x, y))


Here, (−, −) is the bidirectional version of the pair constructor. The exit
condition isRight checks whether a value is headed by the constructor Right, and
isLeft by Left. Notice that the backwards transformation of eqCheck fails when
the updated view is Left (v, v) for some v.
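
The intended forward and backward behaviour of eqCheck can be stated as a plain Haskell get/put pair. This is a behavioural sketch only: getEq and putEq are hypothetical names, failure of the backwards transformation is modelled with Maybe, and the code is not derived from the HOBiT semantics.

```haskell
-- Forward: collapse equal components, keep unequal ones apart.
getEq :: Eq a => (a, a) -> Either (a, a) a
getEq (x, y)
  | x == y    = Right x
  | otherwise = Left (x, y)

-- Backward: the original source is not needed to rebuild the pair.
-- A view Left (v, v) is rejected, since getEq could never produce it.
putEq :: Eq a => (a, a) -> Either (a, a) a -> Maybe (a, a)
putEq _ (Right x) = Just (x, x)
putEq _ (Left (x, y))
  | x /= y    = Just (x, y)
  | otherwise = Nothing
```

Rejecting Left (v, v) is what keeps the pair consistent: the new source (v, v) would be mapped to Right v by getEq, not back to the given view.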


5.4 Syntax Sugar for Reconciliation Functions 


In the general form, reconciliation functions take in two arguments for the com- 
putation of the new source. But as we have seen, very often the arguments are 
not used in the definition and therefore redundant. This observation motivates 
the following syntax sugar. 


p —> e with e’ default {x, = e1; ...; En = el} 
Here, £1,..., £n are the free variables in p. This syntax sugar is translated as: 


p — e with e' by à—.à—ple] /z1,.-., €n /En] 


Furthermore, it is also possible to automatically derive some default values 
from their types. This idea can be effectively implemented if we extend HOBiT 
with type classes. 


5.5 Inference of Exit Conditions


It is possible to infer exit conditions from their surrounding contexts; an idea 
that has been studied in the literature of invertible programming [11,20], and 
may benefit from range analysis. 

Our prototype implementation adopts a very simple inference that constructs
an exit condition λx.case x of {p_e → True; _ → False} for each branch, where p_e
is the skeleton of the branch body e, constructed by replacing bidirectional
constructors with their unidirectional counterparts, and non-constructor expressions
with _. For example, from a : appendB x' y, we obtain the pattern _ : _. This
embarrassingly simple inference has proven to be handy for developing larger
HOBiT programs, as we will see in Sect. 6.
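
The skeleton construction can be sketched in a few lines. The Python below is an illustration under assumptions and not the prototype's implementation: branch bodies are encoded as tuples whose first field is "bcon" for a bidirectional constructor application (any other tag counts as a non-constructor expression), and plain values are (name, fields) pairs. All names are hypothetical.

```python
WILDCARD = "_"


def skeleton(expr):
    """Pattern p_e of a branch body: constructors are kept, the rest is '_'."""
    if expr[0] == "bcon":                      # bidirectional constructor
        _, name, args = expr
        return ("con", name, [skeleton(a) for a in args])
    return WILDCARD                            # variable, function call, ...


def matches(pattern, value):
    """Check a plain value, encoded as (name, fields), against a pattern."""
    if pattern == WILDCARD:
        return True
    _, name, subpatterns = pattern
    vname, fields = value
    return (name == vname
            and len(fields) == len(subpatterns)
            and all(matches(p, f) for p, f in zip(subpatterns, fields)))


def exit_condition(expr):
    """The inferred check: does a view have the shape of this branch body?"""
    pattern = skeleton(expr)
    return lambda value: matches(pattern, value)


# Branch body  a : appendB x' y  with a bidirectional cons at the top.
body = ("bcon", ":", [("var", "a"),
                      ("app", "appendB", [("var", "x'"), ("var", "y")])])
is_cons = exit_condition(body)
```

Here skeleton(body) is the pattern _ : _, so is_cons accepts any non-empty list value and rejects the empty list.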


6 An Involved Example: Desugaring 


In this section, we demonstrate the programmability of HOBiT using the exam- 
ple of bidirectional desugaring [26]. Desugaring is a standard process for most 
programming languages, and making it bidirectional allows information in desug- 
ared form to be propagated back to the surface programs. It is argued convinc- 
ingly in [26] that such bidirectional propagation (coined resugaring) is effective 
in mapping reduction sequences of desugared programs into those of the surface 
programs. 

Let us consider a small programming language that consists of let, if, 
Boolean constants, and predefined operators. 


data E = ELet E E | EVar Int | EIf E E E | ETrue | EFalse | EOp Name [E]
type Name = String 


Variables are represented as de Bruijn indices. 
Some operators in this language are syntactic sugar. For example, we may 
want to desugar 


EOp "not" [e] as EIf e EFalse ETrue.


Also, e1 || e2 can be transformed to let x = e1 in if x then x else e2, which in
our mini-language is the following.


EOp "or" [e1, e2] as ELet e1 (EIf (EVar 0) (EVar 0) (shift 0 e2)).


Here, shift n is the standard shifting operator for de Bruijn-indexed terms that
increments the variables that have indices greater than or equal to n (these variables are
“free” in the given expression). We will program a bidirectional version of the
above desugaring process in Figs. 7 and 8, with the particular goal of keeping
the result of a backward execution as close as possible to the original sugared
form (so that it is not merely a “decompilation” in the sense that the original
source has to be consulted).
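
For orientation, the unidirectional process can be sketched as follows. This is a Python illustration under assumptions, not code from the paper: expressions are encoded as tagged tuples, shift increments indices at or above the cutoff, and desugar recurses into the operand of "not" (an assumption of this sketch).

```python
T, F = ("ETrue",), ("EFalse",)


def shift(n, e):
    """Increment every variable whose de Bruijn index is at least n."""
    tag = e[0]
    if tag == "EVar":
        return ("EVar", e[1] + 1) if e[1] >= n else e
    if tag == "ELet":                      # the body sits under one more binder
        return ("ELet", shift(n, e[1]), shift(n + 1, e[2]))
    if tag == "EIf":
        return ("EIf",) + tuple(shift(n, s) for s in e[1:])
    if tag == "EOp":
        return ("EOp", e[1], [shift(n, s) for s in e[2]])
    return e                               # ETrue, EFalse


def desugar(e):
    """Forward desugaring of "or" and "not"; other nodes are traversed."""
    tag = e[0]
    if tag == "EOp" and e[1] == "or" and len(e[2]) == 2:
        e1, e2 = e[2]
        return ("ELet", desugar(e1),
                ("EIf", ("EVar", 0), ("EVar", 0), desugar(shift(0, e2))))
    if tag == "EOp" and e[1] == "not" and len(e[2]) == 1:
        return ("EIf", desugar(e[2][0]), F, T)   # recursion is an assumption
    if tag == "ELet":
        return ("ELet", desugar(e[1]), desugar(e[2]))
    if tag == "EIf":
        return ("EIf",) + tuple(desugar(s) for s in e[1:])
    if tag == "EOp":
        return ("EOp", e[1], [desugar(s) for s in e[2]])
    return e


source = ("EOp", "or", [("EOp", "not", [T]), ("EOp", "not", [F])])
view = desugar(source)
```

The bidirectional program developed next has to compute this same forward result while also supporting updates on the desugared form.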
composB :: (BE → BE) → BE → BE
composB f x = case x of
  EIf e1 e2 e3 → EIf (f e1) (f e2) (f e3)   by recE
  ELet e1 e2   → ELet (f e1) (f e2)         by recE
  EVar n       → EVar n                     by recE
  ETrue        → ETrue
  EFalse       → EFalse
  EOp n es     → EOp n (mapB ETrue f es)    by recE

mapB :: a → (Ba → Bb) → B[a] → B[b]
mapB def f x = case x of
  []    → []
  a : x → f a : mapB def f x   default {a = def; x = []}

recE :: E → E → E
recE e (EIf _ _ _) = EIf ETrue e e
recE e (ELet _ _)  = ELet e (shift 0 e)
recE e (EOp n _)   = toOp n e
recE e e'          = e'

toOp :: Name → E → E
toOp n e =
  let k = fromJust (lookup n arities)
  in EOp n (replicate k e)


Fig. 7. composB: a useful building block 


shiftB :: Int → BE → BE
shiftB n e = case e of
  ELet e1 e2     → ELet (shiftB n e1) (shiftB (n + 1) e2)  default {e1 = ETrue; e2 = EFalse}
  EVar m | m < n → EVar m          with varLT n  default {m = 0}
  EVar m | m ≥ n → EVar (incB m)   with varGT n  default {m = n + 1}
  e'             → composB (shiftB n) e'  with nonLetVar  by recE

desugarB :: BE → BE
desugarB e = case e of
  EOp "or" [e1, e2] → ELet (desugarB e1) (EIf (EVar 0) (EVar 0) (desugarB (shiftB 0 e2)))
                        by (λs.λ_.toOp "or" s)
  EOp "not" [e]     → EIf e EFalse ETrue   by (λs.λ_.toOp "not" s)
  e'                → composB desugarB e'  by recE

varLT n (EVar m) = m < n
varLT n _        = False

varGT n (EVar m) = m ≥ n
varGT n _        = False

nonLetVar (ELet _ _) = False
nonLetVar (EVar _)   = False
nonLetVar e          = True


Fig. 8. desugarB: bidirectional desugaring


We start with an auxiliary function composB [4] in Fig. 7, which is a useful
building block for defining shifting and desugaring. We have omitted the
straightforward exit conditions; they will be inferred as explained in Sect. 5.5.
The function mapB is the bidirectional map. The reconciliation function recE
tries to preserve as much source structure as possible by reusing the original
source e. Here, arities :: [(Name, Int)] maps operator names to their arities
(i.e., arities = [("or", 2), ("not", 1)]). The function shift is the standard
unidirectional shifting function. We omit its definition as it is similar to the
bidirectional version in Fig. 8. Note that default is the syntactic sugar for
reconciliation functions introduced in Sect. 5.4. Here, incB is the bidirectional
increment function defined in Sect. 5.1. Thanks to composB, we only need to define
the interesting parts in the definitions of shiftB and desugarB. The reconciliation
functions recE and toOp try to keep as much source information as possible,
which enables the behaviour that the backwards execution produces “not” and
“or” in the sugared form only if the original expression has the sugar.
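
To make the role of the two reconciliation functions concrete, here is a hedged Python transcription of recE and toOp as ordinary functions on tagged tuples. It is a sketch, not HOBiT code: the tuple encoding, the names rec_e, to_op and ARITIES, and the small shift helper (which increments indices at or above the cutoff) are assumptions made for illustration.

```python
T, F = ("ETrue",), ("EFalse",)
ARITIES = {"or": 2, "not": 1}


def shift(n, e):
    """Unidirectional shifting on the tuple encoding (assumed definition)."""
    tag = e[0]
    if tag == "EVar":
        return ("EVar", e[1] + 1) if e[1] >= n else e
    if tag == "ELet":
        return ("ELet", shift(n, e[1]), shift(n + 1, e[2]))
    if tag == "EIf":
        return ("EIf",) + tuple(shift(n, s) for s in e[1:])
    if tag == "EOp":
        return ("EOp", e[1], [shift(n, s) for s in e[2]])
    return e


def to_op(name, e):
    """Build an operator node whose arguments are all copies of e."""
    return ("EOp", name, [e] * ARITIES[name])


def rec_e(source, view):
    """Adapt the source to the shape of the updated view, reusing the source."""
    tag = view[0]
    if tag == "EIf":
        return ("EIf", T, source, source)
    if tag == "ELet":
        return ("ELet", source, shift(0, source))
    if tag == "EOp":
        return to_op(view[1], source)
    return view                  # EVar, ETrue, EFalse: take the view itself
```

Note that the reconciled source only has to enable the branch chosen by the exit condition; the subsequent backward execution of that branch then overwrites its parts with information from the view.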

Consider a sugared expression EOp "or" [EOp "not" [ETrue], EOp "not"
[EFalse]] as a source, bound to the name source.


HOBiT> :get desugarB source
ELet (EIf ETrue EFalse ETrue) (EIf (EVar 0) (EVar 0) (EIf EFalse EFalse ETrue))
{- let x = (if True then False else True)
   in if x then x else (if False then False else True) -}


The following updated views may be obtained by reductions from the view. 


{- view1 = let x = False in if x then x else (if False then False else True) -}
view1 = ELet EFalse (EIf (EVar 0) (EVar 0) (EIf EFalse EFalse ETrue))


{- view2 = if False then False else (if False then False else True) -}
view2 = EIf EFalse EFalse (EIf EFalse EFalse ETrue)


{- view3 = if False then False else True -}
view3 = EIf EFalse EFalse ETrue


The following are the corresponding backward transformation results. 


HOBiT> :put desugarB source view1
EOp "or" [EFalse, EOp "not" [EFalse]]
HOBiT> :put desugarB source view2
EIf EFalse EFalse (EOp "not" [EFalse])
HOBiT> :put desugarB source view3
EOp "not" [EFalse]


As the AST structure of the view is changed, all of the three cases require
branch-switching in the backwards executions; our program handles it with ease. For
view2, the top-level expression EIf EFalse EFalse ... does not have a corresponding
sugared form. Our program keeps the top level unchanged, and proceeds to the
subexpression with correct resugaring, a behaviour enabled by the appropriate
use of a reconciliation function (the first line of recE for this particular case) in
composB.

If we were to present the above results as the evaluation steps in the surface 
language, one may argue that the second result above does not correspond to 
a valid evaluation step in the surface language. In [26], AST nodes introduced 
in desugaring are marked with the information of the original sugared syntax, 
and resugaring results containing the marked nodes will be skipped, as they do 
not correspond to any reduction step in the surface language. The marking also 
makes the backwards behaviour more predictable and stable for drastic changes 
on the view, as the desugaring becomes injective with this change. This technique 
is orthogonal to our exploration here, and may be combined with our approach. 


7 Related Work


Controlling Backwards Behaviour. In addition to put ∈ S → V → S, many lens
languages [3] supply a create ∈ V → S (which is in essence a right-inverse of
get) to be used when the original source data is unavailable. This happens when 
new data is inserted in the view, which does not have any corresponding source 
for put to execute, or when branch-switching happens but with no reconciliation 
function available. Being a right-inverse, create does not fail (assuming it ter- 
minates), but since it is not guided by the original source, the results are more 
arbitrary. We do not include create in HOBiT, as it complicates the system 
without offering obvious benefits. Our branch-switching facilities are perfectly 
capable of handling missing source data via reconciliation functions. 
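
The difference between put and create can be seen on a toy lens. The following Python sketch is an illustration only and does not reflect the API of any lens library: a lens is modelled as a record of three plain functions, and fst is a hypothetical lens onto the first component of a pair whose create fills the missing component with None.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Lens:
    get: Callable[[Any], Any]          # S -> V
    put: Callable[[Any, Any], Any]     # S -> V -> S
    create: Callable[[Any], Any]       # V -> S, a right-inverse of get


# View the first component of a pair.
fst = Lens(
    get=lambda s: s[0],
    put=lambda s, v: (v, s[1]),        # guided by the original source
    create=lambda v: (v, None),        # no source: the rest is arbitrary
)

updated = fst.put(("a", 1), "b")       # keeps the hidden second component
fresh = fst.create("b")                # has to invent the second component
```

Both results map back to the view "b" under get, but only put retains the information that was hidden from the view.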

Using exit conditions in branching constructs for backwards evaluation can 
be found in a number of related fields: bidirectional transformation [7], reversible 
computation [34] and program inversion [11,20]. Our design of case is inspired by 
the cond combinator in the lens framework [7] and the if-statement in Janus [34]. 
A similar combinator is Case in BiGUL [16], where a branch has a function 
performing a similar role as an exit condition, but taking the original source in 
addition. This difference makes Case more expressive than cond; for example, 
Case can implement matching lenses [2]. Our design of case follows cond for its 
relative simplicity, but the same underlying technique can be applied to Case 
as mentioned in Sect. 5.2. In the context of bidirectionalization [19,29,30] there 
is the idea of “Plug-ins” [31] that are similar to reconciliation functions in the 
sense that source values can be adapted to direct backwards execution. 


Applicative Lenses. The applicative lens framework [21,23] provides a way to use
λ-abstraction and function application as in normal functional programming to
compose lenses. Note that this use of “applicative” refers to the classical
applicative (functional) programming style, and is not directly related to the
Applicative functor in Haskell. In this sense, it shares a similar goal to ours.
But crucially, applicative lens lacks HOBiT’s ability to allow λ-bound variables
to be used freely, and as a result suffers from the same limitation as lens
languages. There are also a couple of technical differences between applicative
lens and our work: applicative lens is based on Yoneda embedding while ours is
based on separating Γ and Δ and having three semantics (Sect. 4); and applicative
lens is implemented as an embedded DSL, while HOBiT is given as a standalone
language. An embedded implementation of HOBiT is possible, but a type-correct
embedding would expose the handling of environment Δ to programmers, which is
undesirable.


Lenses and Their Extensions. As mentioned in Sect. 1, the most common way 
to construct lenses is by using combinators [3,7,8], in which lenses are treated 
as opaque objects and composed by using lens combinators. Our goal in this 
paper is to enhance the programmability of lens programming, while keeping its
expressive power as much as possible. In HOBiT, primitive lenses can be represented as
functions on B-typed values (Sect. 5.1), and lens combinators satisfying certain
conditions can be represented as language constructs with binders (Sect. 5.2),
which is at least enough to express the original lenses in [7].

Among extensions of the lens language [2,3,7–9,16,17,27,32], there exist a
few that extend the classical lens model [7], namely quotient lenses [8], symmetric
lenses [14], and edit-based lenses [15]. A natural question to ask is whether our
development, which is based on the classical lenses, can be extended to them.
The answer depends on the treatment of value environments μ in get and put. In
our semantics, we assume a non-linear system as we can use the same variable
in μ any number of times. This requires us to extend the classical lens to allow
merging (⋎) and defaulting (◁) operations in put with ⪯-well-behavedness, but
makes the syntax and type system of HOBiT simple, and HOBiT free from
the design issues of linear programming languages [25]. Such an extension of lenses
would be applicable to some kinds of lens models, including quotient lenses and
symmetric lenses, but its applicability is not clear in general. Also, we want to
mention that allowing duplications in bidirectional transformation is still open,
as it essentially entails multiple views and the synchronization among them.


8 Conclusion 


We have designed HOBiT, a higher-order bidirectional programming language in 
which lenses are represented as functions and lens combinators are represented 
as language constructs with binders. The main advantage of HOBiT is that users 
can program in a style similar to conventional functional programming, while still 
enjoying the benefits of lenses (i.e., the expressive power and well-behavedness 
guarantee). This has allowed us to program realistic examples with relative ease. 
HOBiT for the first time introduces a truly “functional” way of construct- 
ing bidirectional programs, which opens up a new area of future explorations. 
Particularly, we have just started to look at programming techniques in HOBiT. 
Moreover, given the resemblance of HOBiT code to that in conventional lan- 
guages, the application of existing programming tools becomes plausible. 


Abstract. We characterize the relation between generalized algebraic
datatypes (GADTs) with pattern matching on their constructors on the one
hand, and generalized algebraic co-datatypes (GAcoDTs) with copattern
matching on their destructors on the other hand: GADTs can be converted
mechanically to GAcoDTs by refunctionalization, GAcoDTs can
be converted mechanically to GADTs by defunctionalization, and both
defunctionalization and refunctionalization correspond to a transposition
of the matrix in which the equations for each constructor/destructor pair
of the (co-)datatype are organized. We have defined a calculus, GADT^T,
which unifies GADTs and GAcoDTs in such a way that GADTs and
GAcoDTs are merely different ways to partition the program.

We have formalized the type system and operational semantics of 
GADT^T in the Coq proof assistant and have mechanically verified the
following results: (1) The type system of GADT^T is sound, (2) defunctionalization
and refunctionalization can translate GADTs to GAcoDTs
and back, (3) both transformations are type- and semantics-preserving 
and are inverses of each other, (4) (co-)datatypes can be represented by 
matrices in such a way the aforementioned transformations correspond 
to matrix transposition, (5) GADTs are extensible in an exactly dual way 
to GAcoDTs; we thereby clarify folklore knowledge about the “expres- 
sion problem”. 

We believe that the identification of this relationship can guide future 
language design of “dual features” for data and codata. 


1 Introduction 


The duality between data and codata, between construction and destruction, 
between smallest and largest fixed points, is a long-standing topic in the PL 
community. While some languages, such as Haskell, do not distinguish explicitly 
between data and codata, there has been a “growing consensus” [1] that the two 
should not be mixed up. Many ideas that are well-known from the data world 
have counterparts in the codata world. One idea that is particularly relevant
for this paper is copatterns, also proposed by Abel et al. [1]. Using copatterns,
the language support for codata is very symmetrical to that for data: Data
types are defined in terms of constructors, functions consuming data are defined 
using pattern matching on constructors; codata types are defined in terms of 
destructors, functions producing codata are defined using copattern matching 
on destructors. 

Another example of designing dual features for codata is the recently pro- 
posed codata version of inductive data types [36]. However, coming up with these 
counterparts requires ingenuity. The overarching goal of this work is to replace 
the required ingenuity by a mechanical derivation. A key idea towards this goal 
has been proposed by Rendel et al. [31], namely to relate the data and codata 
worlds by refunctionalization [16] and defunctionalization [17,32]. 

Defunctionalization is a global program transformation to transform higher- 
order programs into first-order programs. By defunctionalizing a program, 
higher-order function types are replaced by sum types with one variant per func- 
tion that exists in the program. For instance, if a program contains two functions 
of type Nat → Nat, then these functions are represented by a sum type with
two variants, one for each function, whereby the type components of each variant 
store the content of the free variables that show up in the function definition. 
Defunctionalized function calls become calls to a special first-order apply func- 
tion which pattern-matches on the aforementioned sum type to dispatch the call 
to the right function body. 
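
As a concrete illustration of this description, the Python sketch below defunctionalizes a tiny program with two functions of type Nat → Nat. It is a hand-written example under assumptions (the names add_n, double, AddN, Double, apply and twice are hypothetical), not the formal transformation studied in this paper.

```python
from dataclasses import dataclass


# Higher-order original: two functions of type Nat -> Nat and one client.
def add_n(n):
    return lambda m: n + m             # the closure captures the free variable n


def double():
    return lambda m: 2 * m             # no free variables


def twice(f, x):
    return f(f(x))


# Defunctionalized version: one variant per function; the fields of each
# variant store the free variables of the corresponding function body.
@dataclass(frozen=True)
class AddN:
    n: int


@dataclass(frozen=True)
class Double:
    pass


def apply(f, m):
    """First-order dispatcher replacing every higher-order call."""
    if isinstance(f, AddN):
        return f.n + m
    if isinstance(f, Double):
        return 2 * m
    raise TypeError(f"not a defunctionalized function: {f!r}")


def twice_d(f, x):
    return apply(f, apply(f, x))
```

Every call f(x) in the original becomes apply(f, x), and the transformation is global because apply must know all functions that exist in the program.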

Refunctionalization is the inverse transformation, but traditionally it only 
works (easily) on programs that are in the image of defunctionalization [16]. In 
particular, it is not clear how to refunctionalize programs when there is more 
than one function (like apply) that pattern-matches on the same data type. 
Rendel et al. [31] have shown that this problem goes away when functions are 
generalized to arbitrary codata (with functions being the special codata type 
with only one apply destructor), because then every pattern-matching function 
in a program to be refunctionalized can be expressed as another destructor. 

The main goal of this work is to extend the de- and refunctionalization corre- 
spondence between data and codata to generalized algebraic datatypes (GADTs) 
[8,40] and their codata counterpart, which we call Generalized Algebraic Codata 
types (GAcoDTs). More concretely, this paper makes the following contributions. 


— We present the syntax, operational semantics, and type system of a language,
GADT^T, that can express both GADTs and GAcoDTs. In this language,
GADTs and GAcoDTs are unified in such a way that they are merely two 
different representations of an abstract “matrix” interface. 

— We show that the type system is sound by proving progress and preservation 
[39]. 

— We formally define defunctionalization and refunctionalization, observe that 
they correspond to matrix transposition, and prove that GADTs and 
GAcoDTs are indistinguishable after hiding them behind the aforementioned 
matrix interface. We conclude that defunctionalization and refunctionalization 
preserve both operational semantics and typing. 

— We prove that both GADTs and GAcoDTs can be extended in a modular way
(with separate type checking) by “adding rows” to the corresponding matrix. 
Due to their matrix transposition relation, this means that the extensibil- 
ity is exactly dual, which clarifies earlier informal results on the “expression 
problem” [11,33,37]. 

— The language and all results have been formalized and mechanically verified 
in the Coq proof assistant. The Coq sources are available in the supplemental 
material that accompanies this submission. 

— As a small side contribution, if one considers only the GADT part of the 
language, this is to the best of our knowledge the first mechanically verified 
formalization of GADTs. It is also simpler than previous formalizations of 
GADTs because it is explicitly typed and hence avoids the complications of 
type inference. 


The remainder of this paper is structured as follows. In Sect. 2 we give
an informal overview of our main contributions by means of an example and
using conventional concrete syntax. In Sect. 3 we present the syntax, operational
semantics, and type system of GADT^T. Section 4 presents the aforementioned
mechanically verified properties of GADT^T. In Sect. 5, we discuss applications
and limitations of GADT^T, talk about termination/productivity and directions
for future work, and describe how we formalized GADT^T in Coq. Finally, Sect. 6
discusses related work and Sect. 7 concludes.


2 Informal Overview 


Figure 1 illustrates the language design of GADT^T in terms of an example.
The left-hand side shows an example using GADTs and functions that pattern- 
match on GADT constructors. The right-hand side shows the same example 
using GAcoDTs and functions that copattern-match on GAcoDT destructors. 
The right-hand side is the refunctionalization of the left hand side; the left-hand 
side is the defunctionalization of the right-hand side. 


Simply-Typed (Co)Datatypes. Let us first look at the Nat (co)datatype. Every
data or codata type has an arity: the number of type arguments it receives. Since
GADT^T only features types of kind *, we simply state the number of type
arguments in the (co)data type declaration. Nat receives zero type arguments,
hence Nat illustrates the simply-typed setting with no type parameters.
Functions in GADT^T, like add on the left-hand side, are first-order only; higher-order
functions can be encoded as codata instead. Functions always (co)pattern-match
on their first argument. (Co)pattern matching on multiple arguments as well as
nested and deep (co)pattern matching are not supported directly and must be
encoded via auxiliary functions. We see that the refunctionalized version of Nat
on the right-hand side turns constructors into functions, functions into
destructors, and pattern matching into copattern matching. Abel et al. [1] use “dot
notation” for copattern matching and destructor application; for instance, they
Left-hand side (data fragment):

data Nat[0] where
  zero(): Nat
  succ(Nat): Nat

function add(Nat,Nat): Nat where
  add(zero(),x) = x
  add(succ(y),x) = succ(add(y,x))

data List[1] where
  nil[A](): List[A]
  cons[A](A, List[A]): List[A]

function length[A](List[A]): Nat where
  length[_](nil[_]) = 0
  length[B](cons[_](x,xs)) =
    succ(length[B](xs))

function sum(List[Nat]): Nat
  sum(nil[_]) = 0
  sum(cons[_](x,xs)) = x + sum(xs)

data Tree[1] where
  node(Nat): Tree[Nat]
  branch[A](List[Tree[A]])
    : Tree[List[A]]

function unwrap(Tree[Nat]): Nat where
  unwrap(node(n)) = n
  unwrap(branch[_](xs)) = impossible

function width[A](Tree[A]): Nat where
  width[_](node(n)) = 0
  width[_](branch[C](xs)) =
    length[C](xs)


Right-hand side (codata fragment):

codata Nat[0] where
  add(Nat,Nat): Nat

function zero(): Nat where
  add(zero(),x) = x

function succ(Nat): Nat where
  add(succ(y),x) = succ(add(y,x))

codata List[1] where
  length[A](List[A]): Nat
  sum(List[Nat]): Nat

function nil[A](): List[A] where
  length[_](nil[_]) = 0
  sum(nil[_]) = 0

function cons[A](A, List[A]): List[A] where
  length[B](cons[_](x,xs)) =
    succ(length[B](xs))
  sum(cons[_](x,xs)) = x + sum(xs)

codata Tree[1] where
  unwrap(Tree[Nat]): Nat
  width[A](Tree[A]): Nat

function node(Nat): Tree[Nat] where
  unwrap(node(n)) = n
  width[_](node(n)) = 0

function branch[A](List[Tree[A]])
    : Tree[List[A]] where
  unwrap(branch[_](xs)) = impossible
  width[_](branch[C](xs)) =
    length[C](xs)
Fig. 1. The same example in the data fragment (left) and codata fragment (right) 


List[1]                  | nil[A](): List[A]      | cons[A](A, List[A]): List[A]
-------------------------+------------------------+------------------------------------------------
length[A](List[A]): Nat  | length[_](nil[_]) = 0  | length[B](cons[_](x,xs)) = succ(length[B](xs))
sum(List[Nat]): Nat      | sum(nil[_]) = 0        | sum(cons[_](x,xs)) = x + sum(xs)


Fig. 2. Matrix representation of List GADT from Fig. 1 (left) 


List[1]                       | length[A](List[A]): Nat                         | sum(List[Nat]): Nat
------------------------------+-------------------------------------------------+----------------------------------
nil[A](): List[A]             | length[_](nil[_]) = 0                           | sum(nil[_]) = 0
cons[A](A, List[A]): List[A]  | length[B](cons[_](x,xs)) = succ(length[B](xs))  | sum(cons[_](x,xs)) = x + sum(xs)


Fig. 3. Matrix representation of List GAcoDT from Fig. 1 (right). This matrix is the
transposition of Fig. 2.
would write succ(y).add(x) = succ(y.add(x)) instead of add(succ(y),x) =
succ(add(y,x)) on the right-hand side of Fig. 1. We use the same syntax for
constructor calls, function calls, and destructor calls because then the equations
are not affected by de- and refunctionalization.
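
Readers more familiar with mainstream languages may find the following analogy helpful. It is a loose Python sketch under assumptions, not GADT^T code: the data fragment of Nat is modelled by tagged tuples with a function that inspects the tag, and the codata fragment by objects whose only observation is an add method written in dot notation. The helper count_succ is hypothetical and exists only to inspect results.

```python
# Data fragment: constructors build tagged tuples; add pattern-matches.
def zero():
    return ("zero",)


def succ(n):
    return ("succ", n)


def add(n, x):
    if n[0] == "zero":
        return x
    return succ(add(n[1], x))


# Codata fragment: zero and succ become generator functions (classes here),
# and add becomes a destructor that each of them answers with one equation.
class Zero:
    def add(self, x):
        return x                          # add(zero(), x) = x


class Succ:
    def __init__(self, pred):
        self.pred = pred

    def add(self, x):
        return Succ(self.pred.add(x))     # add(succ(y), x) = succ(add(y, x))


def count_succ(n):
    """Inspection helper for the object encoding (not part of Fig. 1)."""
    k = 0
    while isinstance(n, Succ):
        k, n = k + 1, n.pred
    return k
```

The two equations for add are the same in both fragments; only their grouping changes, by function on the data side and by generator on the codata side.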


Parametric (Co)Datatypes. The List datatype illustrates the classical special
case of GADTs with no indexing. Type arguments of constructors, functions, and
destructors are both declared and passed via square brackets [...] (loosely
like in Scala). Like System F, GADT^T has no type inference; all type annotations
and type applications must be given explicitly. GADT^T has a redundant way of
binding type parameters. When defining an equation of a polymorphic function
with a polymorphic first argument, we use square brackets to bind both the
type parameters of the function and of the constructor/destructor on which we
(co)pattern-match. For instance, in the equation length[B](cons[_](x,xs)) = ...
on the left-hand side, B is the type parameter of the length function,
whereas the underscore (which we use if the type argument is not relevant;
we could replace it by a proper type variable name) binds the type argument
of the constructor with which the list was created. In this example, we could
have also written the equation as length[_](cons[B](x,xs)) = ... because
both type parameters must necessarily be the same, but in the general case we
need access to both sets of type variables (as the next example will illustrate).
It is important that we do not (co)pattern-match on type arguments, since this
would destroy parametricity; rather, the [...] notation on the left-hand side of
an equation is only a binding construct for type variables.

Codatatypes also serve as a generalization of first-class functions. The code
below shows a definition of a general function type together with a specific
family of first-class functions, addn (which can be passed as an argument and
returned as a result), defined by a codata generator function with return type
Function[Nat,Nat].


codata Function[2] where 
  apply[A,B](Function[A,B], A): B


function addn(Nat): Function[Nat,Nat] where 
apply(addn(n),m) = add(n,m) 


Type Parameter Binding. Of those two sets of type parameter bindings, the one
for functions is in a way always redundant because we could use the type variable
declaration inside the function declaration instead. For instance, in the equation
length[B](cons[_](x,xs)) = succ(length[B](xs)) on the left-hand side we
could use the type parameter A of the enclosing function declaration instead.
However, in GADT^T the scope of the type variables in the function declaration
does not extend to the equations and the type arguments must be bound anew
in every equation. The reason for that is that we want to design the equations
in such a way that they do not need to be touched when de/refunctionalizing
a (co)datatype. For instance, when refunctionalizing a datatype, a function
declaration is turned into a destructor declaration and what used to be a type
argument that was bound in the enclosing function declaration becomes a type
argument that is bound in a remote destructor declaration; to make type-checking
modular we hence need a local binding construct. Our main goal in
designing GADT^T was not to make it convenient for programmers but to make
the relation between GADTs and GAcoDTs as simple as possible; furthermore,
a less verbose surface syntax could easily be added on top.

If we look at the corresponding List codatatype on the right-hand side, 
we see that the sum function from the left-hand side, which accepts only a list 
of numbers, turns into a destructor that is only applicable to those instances 
of List whose type parameter is Nat. This is similar to methods in object- 
oriented programming whose availability depends on type parameters [28], but 
here we see that this feature arises “mechanically” by the de/refunctionalization 
correspondence. 


GA(co)DTs. The Tree (co)datatype illustrates a usage of GA(co)DTs that cannot
be expressed with traditional parametric data types. We can see that by
looking at the return type of the constructors of the Tree datatype; they are
Tree[Nat] and Tree[List[A]] instead of Tree[A]. The Tree codatatype is also
using the power of GAcoDTs in the unwrap destructor¹ because its first argument
is different from Tree[A]. The GADT constructor node(Nat): Tree[Nat]
turns into a function that returns a Tree[Nat] on the right-hand side. The Tree
example illustrates two additional issues that did not show up in the earlier
examples.

First, it illustrates that type unification may make some pattern matches
impossible, as illustrated by the unwrap(branch[_](xs)) = impossible equation
on the left-hand side. The equation is impossible, because the function
argument type Tree[Nat] cannot be unified with the constructor return type
Tree[List[A]].² In GADT^T, we require that pattern matching is always complete,
but impossible equations are not type-checked; the right-hand side can
hence be filled with any dummy term. Second, the equation width[_](branch[C](xs))
= length[C](xs) illustrates the case where it is essential that we can
bind constructor type arguments; otherwise we would have no name for the type
argument we need to pass to length. Such type arguments are sometimes called
existential or phantom [8] because if we have a branch of type Tree[A], we only
know that there exists some type that was used in the invocation of the branch
constructor, but that type does not show up in the structure of Tree[A].

We see again how both impossible equations and the need to access constructor
type arguments translate naturally into corresponding features in the codata
world. For impossible equations, we need to check whether the first destructor
argument type can be unified with the function return type. Access to existential
constructor type arguments turns into access to local function types; conversely,
access to existential destructor type arguments in the codata world turns into
access to local function type arguments.


¹ The unwrap destructor is meant to be used to extract the number from a tree that
directly contains a number, i.e., a tree constructed with constructor node.

² This fits with our intention that unwrap should only work on a node (which directly
contains a number).


GADT = GAcoDT^T. We can see that the relation between GADTs and
GAcoDTs is as promised when looking at Figs. 2 and 3. These two figures show
a slightly different representation of the List (co)datatype and associated
functions from Fig. 1. In this presentation, we have dropped all keywords from the
language, such as function, data and codata. The reason for dropping these
keywords is that now function signatures in the data fragment look the same
as destructor signatures in the codata fragment, and constructor signatures in
the data fragment look the same as function signatures in the codata fragment.
Figure 2 organizes the datatype in the form of a matrix: the first row lists the
datatype and its constructor signatures, the first column lists the signatures
of the functions that pattern-match on the datatype, and the inner cells represent
the equations for each combination of constructor and function. Figure 3 does
the same for the List codatatype: the first row lists the codatatype and its
destructor signatures, the first column lists the signatures of functions that
copattern-match on the codatatype, and the inner cells represent the equations for
each combination of function and destructor. We can now see that the relation
between GADTs and GAcoDTs is indeed rather simple: it is just matrix
transposition.
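
The matrix view can be mimicked with a dictionary. The Python sketch below is an illustration under assumptions, not the Coq formalization: equations are kept as opaque strings indexed by a pair of names, rows groups cells by their first coordinate, and transpose swaps the coordinates. All identifiers are hypothetical.

```python
# Cells of Fig. 2, indexed by (function name, constructor name).
cells = {
    ("length", "nil"): "length[_](nil[_]) = 0",
    ("length", "cons"): "length[B](cons[_](x,xs)) = succ(length[B](xs))",
    ("sum", "nil"): "sum(nil[_]) = 0",
    ("sum", "cons"): "sum(cons[_](x,xs)) = x + sum(xs)",
}


def rows(matrix):
    """Group the cells by their first coordinate, giving one row per name."""
    out = {}
    for (row, col), equation in matrix.items():
        out.setdefault(row, {})[col] = equation
    return out


def transpose(matrix):
    """Swap the coordinates; the equations themselves are left untouched."""
    return {(col, row): equation for (row, col), equation in matrix.items()}


data_view = rows(cells)                 # one row per pattern-matching function
codata_view = rows(transpose(cells))    # one row per generator function
```

Because only the indexing changes, transposing twice returns the original matrix, which mirrors the claim that de- and refunctionalization are inverses of each other.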

An essential property of this transformation is that other (co)datatypes and 
functions are completely unaffected by it. For instance, the Tree datatype 
(or codatatype, depending on which version we use) looks the same regardless 
of whether we encode List in data or in codata style. Defunctionalization 
and refunctionalization are still global transformations in that we need 
to find all functions that pattern-match on a datatype (for refunctionalization) 
or all functions that copattern-match on a codatatype (for defunctionalization), 
but the rest of the program, including all clients of those (co)datatypes 
and functions, remains the same. 


Infinite Codata, Termination, Productivity. The semantics of codata is usually 
defined via greatest fixed point constructions that include the possibility to 
represent “infinite” structures, such as streams. This is not the focus of this 
work, and our examples so far did not feature such “infinite” structures. However, 
we do not want to give the impression that our codata types lack the 
expressiveness to encode streams and the like. Hence we show here 
how to encode a stream of zeros, both in the codata representation (left) and, 
defunctionalized, in the data representation (right). 
Codata representation (left):

    codata Stream where
      head(Stream) : Nat
      tail(Stream) : Stream

    function zeros() : Stream
      head(zeros()) = zero()
      tail(zeros()) = zeros()

Data representation (right):

    data Stream where
      zeros() : Stream

    function head(Stream) : Nat
      head(zeros()) = zero()

    function tail(Stream) : Stream
      tail(zeros()) = zeros()
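
The two encodings of the stream of zeros can be mimicked in Python as a rough illustration. This is only a sketch under our own assumptions: `ZerosCodata`, `Zeros`, and the helper `take` are hypothetical names, and Python objects and functions stand in for GADT^T destructors and pattern-matching functions.

```python
from dataclasses import dataclass


# Codata style: a stream is whatever answers the observations head and tail.
class ZerosCodata:
    def head(self) -> int:
        return 0

    def tail(self) -> "ZerosCodata":
        # The next stream is only built when tail is observed.
        return ZerosCodata()


# Data style (defunctionalized): zeros is a constructor without payload,
# and head/tail are ordinary functions that inspect the constructor.
@dataclass(frozen=True)
class Zeros:
    pass


def head(s: Zeros) -> int:
    if isinstance(s, Zeros):
        return 0
    raise TypeError("unknown Stream constructor")


def tail(s: Zeros) -> Zeros:
    if isinstance(s, Zeros):
        return Zeros()
    raise TypeError("unknown Stream constructor")


def take(n, hd, tl, s):
    """Collect the first n elements using the given head/tail observers."""
    out = []
    for _ in range(n):
        out.append(hd(s))
        s = tl(s)
    return out
```

Both `take(3, head, tail, Zeros())` and `take(3, ZerosCodata.head, ZerosCodata.tail, ZerosCodata())` observe the same finite prefix, even though neither representation ever materializes an infinite structure.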


Codata is also often associated with guarded corecursion to ensure productivity. 
In the copattern formulation of codata, productivity and termination coincide 
[2]. Due to our unified treatment of data and codata, a single check is sufficient 
for both termination/productivity of programs. In Sect. 5.3, we discuss a sim- 
ple syntactic check that corresponds to both structural recursion and guarded 
corecursion. 


Properties of GADT^T. In the remainder of this paper, we formalize GADT^T in 
a style similar to the matrix representation of (co)datatypes we have just seen. 
We define typing rules and a small-step operational semantics and prove formal 
versions of the following informal theorems: (1) The type system of GADT^T 
is sound (progress and preservation), (2) Defunctionalization and refunctional- 
ization (that is, matrix transposition) of (co)datatypes preserves well-typedness 
and operational semantics, (3) Both types of matrices are modularly extensible 
in one dimension, namely by adding more rows to the matrix. This means that 
we can modularly add constructors or destructors and their respective equa- 
tions without breaking type soundness as long as the new equations are sound 
themselves. 


3 Formal Semantics 


We have formalized GADT^T and all associated theorems and proofs in Coq³. 
Here we present a traditional representation of the formal syntax using context- 
free grammars, a small-step operational semantics, and a type system. 

We have formalized the language in such a way that we abstract over the 
physical representation of matrices as described in the previous section, hence 
we do not need to distinguish between GADTs and GAcoDTs. In the following, 
we say constructor to denote either a constructor of a datatype, or a function 
that copattern-matches on a codatatype. We say destructor to denote either a 
function that pattern-matches on a datatype, or a destructor of a codatatype. 
The language is defined in terms of constructors and destructors; we will later 
see that GADTs and GAcoDTs are merely different organizations of destructors 
and constructors. 


3.1 Language Design Rationale 


Our main goal in the formalization is to clarify the relation between GADTs 
and GAcoDTs, and not to design a calculus that is convenient to use as a 


3 Full Coq sources are available in the supplemental material. 


68 K. Ostermann and J. Jabs 


programming language. Hence we have left out many standard features of pro- 
gramming calculi that would have made the description of that relation more 
complicated. In particular: 


— Like System F, GADT^T requires explicit type annotations and explicit type 
application. Type inference could be added on top of the calculus, but this is 
not in the scope of this work. 

— (Co)pattern matching is restricted in that every function must necessarily 
(co)pattern-match on its first argument, hence (co)pattern-matching on mul- 
tiple arguments or “deep” (co)pattern matching must be encoded by aux- 
iliary functions. Pattern matching is only supported for top-level function 
definitions; there is no “case” or “match” construct. Functions that are not 
supposed to (co)pattern-match (like the polymorphic identity function) must 
be encoded by a function that (co)pattern-matches on a dummy argument of 
type Unit. 

— First-class functions are supported in the form of codata, but anonymous 
local first-class functions must be encoded via lambda lifting [3,25], that is, 
they must be encoded as top-level functions where the bindings for the free 
variables are passed as an extra parameter. 

— Due to the abstraction over the physical representation of matrices we have 
not fixed the physical modular structure (a linearization of the matrix as 
text) of programs. Type checking of matrices simply iterates over all cells 
in an unspecified order. However, later on we will characterize GADTs and 
GAcoDTs as two physical renderings of matrices and formally prove the way 
in which those program organizations are extensible. 


3.2 Notational Conventions 


As usual, we use the same letters for both non-terminal symbols and meta-
variables, e.g., $t$ stands for the non-terminal for terms in the grammar, 
but inside inference rules it is a meta-variable that stands for any term. We 
use the notation $\overline{t}$ to denote a list $t_1, t_2, \ldots, t_{|\overline{t}|}$, where $|\overline{t}|$ is the length of the 
list. We also use list notation to denote iteration, e.g., $P, \Gamma \vdash \overline{t} : \overline{T}$ means 
$P, \Gamma \vdash t_1 : T_1, \ldots, P, \Gamma \vdash t_{|\overline{t}|} : T_{|\overline{t}|}$. To keep the notation readable, we write $\overline{x : T}$ 
instead of $\overline{x} : \overline{T}$ to denote $x_1 : T_1, \ldots, x_n : T_n$. 

We use the notation $t[x := t']$ to denote the substitution of all free occurrences 
of $x$ in $t$ by $t'$, and similarly $T[X := T']$ and $t[X := T']$ for the substitution of 
type variables in types and terms, respectively. 
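
As a small illustration of term substitution, the following Python sketch implements simultaneous substitution on a simplified term representation. It rests on assumptions of ours: type arguments are omitted, and the names `Var`, `Call`, and `subst` are hypothetical. Since terms in this language contain no binders, every variable occurrence is free and no capture-avoiding renaming is needed.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Call:
    name: str                  # constructor or destructor name
    args: Tuple["Term", ...]   # type arguments are omitted in this sketch


Term = Union[Var, Call]


def subst(t: Term, env: Dict[str, Term]) -> Term:
    """Simultaneous substitution t[x1 := t1, ..., xn := tn]."""
    if isinstance(t, Var):
        # Variables not mentioned in env are left untouched.
        return env.get(t.name, t)
    return Call(t.name, tuple(subst(a, env) for a in t.args))


zero = Call("zero", ())
example = subst(Call("succ", (Var("x"),)), {"x": zero})
```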


3.3 Syntax 


The syntax of GADT^T is defined in Fig. 4. Types have the form $m[\overline{T}]$, where $m$ is 
the name of a GADT or GAcoDT (in the following referred to as matrix name), 
and square brackets denote type application. Types can contain type variables 
$X$. In the syntax of terms $t$, $x$ denotes parameters that are bound by (co)pattern 
matching and $y$ denotes other parameters. A constructor call $c[\overline{T}](\overline{t})$ takes zero or 
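Syntax 

$$
\begin{array}{rcll}
S, T & ::= & m[\overline{T}] \mid X & \text{Types} \\
t & ::= & x \mid y \mid c[\overline{T}](\overline{t}) \mid d[\overline{T}](t, \overline{t}) & \text{Terms} \\
C & ::= & c[\overline{X}](\overline{T}) : m[\overline{T}] & \text{Constructor Signature} \\
D & ::= & d[\overline{X}](m[\overline{T}], \overline{T}) : T & \text{Destructor Signature} \\
e & ::= & d[\overline{Y}](c[\overline{X}](\overline{x}), \overline{y}) = t & \text{Equations} \\
M & ::= & (a, \gamma \in \overline{C}, \delta \in \overline{D}, \gamma \to \delta \to e) & \text{Matrices} \\
P & ::= & m \to_{\mathit{fin}} M & \text{Programs} \\
m & \in & \text{Matrix names} & \\
d & \in & \text{Destructor names} & \\
c & \in & \text{Constructor names} & \\
x & \in & \text{Pattern Variable Names} & \\
y & \in & \text{Variable Names} & \\
X, Y & \in & \text{Type Variables} & \\
a & \in & \mathbb{N} & \text{Arities}
\end{array}
$$

Operational Semantics: $P \vdash t \to t'$ 

$$
\begin{array}{rcll}
u, v & ::= & c[\overline{T}](\overline{v}) & \text{Values} \\
E & ::= & c[\overline{T}](\overline{v}, [\,], \overline{t}) \mid d[\overline{T}](\overline{v}, [\,], \overline{t}) & \text{Evaluation Context}
\end{array}
$$

$$
\frac{P \vdash t \to t'}{P \vdash E[t] \to E[t']} \ \text{(E-CTX)}
$$

$$
\frac{\begin{array}{c}
m \mapsto (a, \overline{C}, \overline{D}, \mathit{lookup}) \in P \\
D \in \overline{D} \qquad D = d[\ldots](m[\ldots], \ldots) : \ldots \\
C \in \overline{C} \qquad C = c[\ldots](\ldots) : \ldots \\
\mathit{lookup}(C, D) = d[\overline{Y}](c[\overline{X}](\overline{x}), \overline{y}) = t
\end{array}}{P \vdash d[\overline{S}](c[\overline{T}](\overline{v}), \overline{u}) \to t[\overline{X} := \overline{S}, \overline{Y} := \overline{T}][\overline{x} := \overline{v}, \overline{y} := \overline{u}]} \ \text{(E-FIRE)}
$$

Fig. 4. Syntax and operational semantics of GADT^T 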


more arguments, whereas a destructor call $d[\overline{T}](t, \overline{t})$ takes at least one argument 
(namely the one to be destructed). Both destructors and constructors can have 
type parameters, which must be passed via square brackets. 

A constructor signature $c[\overline{X}](\overline{T}) : m[\overline{T}]$ defines the number and types of 
parameters and the type parameters to the constructed type. Its output type 
cannot be a type variable but must be some concrete matrix type $m[\overline{T}]$. A destructor 
signature, on the other hand, must have a concrete matrix type as its first 
argument and can have an arbitrary return type. Equations $d[\overline{Y}](c[\overline{X}](\overline{x}), \overline{y}) = t$ 
define what happens when a constructor $c$ meets a destructor $d$. The $\overline{x}$ bind the 
components of the constructor, whereas the $\overline{y}$ bind the remaining parameters of 
the destructor call. We also bind both the type arguments to the constructor $\overline{X}$ 
and the destructor $\overline{Y}$, such that they can be used inside $t$. In many cases, the $\overline{X}$ 
will provide access to the same types as $\overline{Y}$, but in the general case we need both 
because both constructors and destructors may contain phantom types [8]. 

Matrices $M$ are an abstract representation of both GADTs and GAcoDTs, 
together with the functions that pattern-match (for GADTs) or copattern-match 
(for GAcoDTs) on the GA(co)DTs. A matrix has an arity $a$ (the number of type 
parameters it receives), a list of constructors $\gamma$, and a list of destructors $\delta$. It also 
has a lookup function that returns an equation for every constructor/destructor 
pair on which the matrix is defined (hence the type of matrices is a dependent 
type). There must be an equation for each constructor/destructor pair, but in 
the case of impossible combinations, the equations are not type-checked and 
some dummy term can be inserted. A program $P$ is just a finite mapping from 
matrix names to matrices. 


3.4 Operational Semantics 


We define the operational semantics, also in Fig. 4, via an evaluation context 
$E$, which, together with E-CTX, defines a standard call-by-value left-to-right 
evaluation order. Not surprisingly, the only interesting rule is E-FIRE, which 
defines the reduction behavior when a destructor meets a constructor. We look 
up the corresponding matrix in the program and look up the equation for that 
constructor/destructor pair. In the body of the equation, $t$, we perform two 
substitutions: (1) We substitute the formal type arguments $\overline{X}$ and $\overline{Y}$ by the 
current type arguments $\overline{S}$ and $\overline{T}$, and (2) we substitute the pattern variables 
$\overline{x}$ by the components $\overline{v}$ of the constructor and the variables $\overline{y}$ by the current 
arguments $\overline{u}$. 
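
The interplay of E-CTX and E-FIRE can be sketched as a tiny untyped interpreter in Python. This is an illustration under simplifying assumptions of ours, not the Coq formalization: type arguments and the type substitution are dropped, destructors are looked up globally by name rather than via the matrix of the first argument, and all names (`Con`, `Des`, `step`, `evaluate`, and the `add` example) are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Con:                       # constructor call c(t, ...)
    name: str
    args: Tuple["Term", ...] = ()


@dataclass(frozen=True)
class Des:                       # destructor call d(t, t, ...)
    name: str
    args: Tuple["Term", ...]


Term = Union[Var, Con, Des]


def is_value(t):
    return isinstance(t, Con) and all(is_value(a) for a in t.args)


def subst(t, env):
    if isinstance(t, Var):
        return env.get(t.name, t)
    return type(t)(t.name, tuple(subst(a, env) for a in t.args))


def step(eqs, t):
    """One small step; returns None if t is a value or stuck."""
    if isinstance(t, Var):
        return None
    # E-CTX: reduce the leftmost argument that is not yet a value.
    for i, a in enumerate(t.args):
        if not is_value(a):
            a2 = step(eqs, a)
            if a2 is None:
                return None
            return type(t)(t.name, t.args[:i] + (a2,) + t.args[i + 1:])
    # E-FIRE: a destructor meets a constructor; all arguments are values.
    if isinstance(t, Des):
        scrutinee, rest = t.args[0], t.args[1:]
        xs, ys, body = eqs[(t.name, scrutinee.name)]
        env = dict(zip(xs, scrutinee.args))   # pattern variables
        env.update(zip(ys, rest))             # remaining parameters
        return subst(body, env)
    return None


def evaluate(eqs, t):
    while (nxt := step(eqs, t)) is not None:
        t = nxt
    return t


# (destructor, constructor) -> (pattern variables, other variables, body)
eqs = {
    ("add", "zero"): ((), ("m",), Var("m")),
    ("add", "succ"): (("n",), ("m",),
                      Con("succ", (Des("add", (Var("n"), Var("m"))),))),
}
one = Con("succ", (Con("zero"),))
two = Con("succ", (one,))
three = evaluate(eqs, Des("add", (two, one)))
```

The dictionary `eqs` plays the role of the lookup function: it is indexed by a constructor/destructor pair and is indifferent to whether the equations were written down row by row or column by column.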


3.5 Typing 


The typing and well-formedness rules are defined in Fig. 5. Let us first look at 
the typing of terms. The rules for variable lookup are standard. The constructor 
rule T-CONST checks that the number of type and term arguments matches 
the declaration and checks the type of all arguments, whereby the type variables 
are substituted by the type arguments of the actual constructor call. Construc- 
tor names must be globally unique, hence the matrix to which the constructor 
belongs is not relevant. 

This is different for typing destructor calls (T-DEST). A destructor is resolved 
by first determining the matrix m of the first destructor argument, and then the 
destructor is looked up in that matrix. It is hence OK if the same destructor 
name shows up in multiple matrices. When considering codata as “objects” like 
in object-oriented programming [24], this corresponds to the familiar situation 
that different classes can define methods with the same name. In the GADT 
case, this corresponds to allowing multiple pattern-matching functions of the 
same name that are disambiguated by the type of their first argument. 

In WF-EQ, we construct the appropriate typing context to type-check the 
right-hand side of equations. We allow implicit α-renaming of type variables 
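Term Typing: $P, \Gamma \vdash t : T$ 

$$
\Gamma ::= \epsilon \mid x : T, \Gamma \mid y : T, \Gamma \qquad \text{Typing Contexts}
$$

$$
\frac{x : T \in \Gamma}{P, \Gamma \vdash x : T}
\qquad
\frac{y : T \in \Gamma}{P, \Gamma \vdash y : T}
$$

$$
\frac{\begin{array}{c}
P, \Gamma \vdash t : m[\overline{T}][\overline{X} := \overline{S}] \\
m \mapsto (a, \overline{C}, \overline{D}, \mathit{lookup}) \in P \qquad d[\overline{X}](m[\overline{T}], \overline{U}) : T \in \overline{D} \\
\forall i.\ P, \Gamma \vdash t_i : U_i[\overline{X} := \overline{S}] \\
|\overline{X}| = |\overline{S}| \qquad |\overline{t}| = |\overline{U}|
\end{array}}{P, \Gamma \vdash d[\overline{S}](t, \overline{t}) : T[\overline{X} := \overline{S}]} \ \text{(T-DEST)}
$$

$$
\frac{\begin{array}{c}
c[\overline{X}](\overline{T}) : T \in \mathit{ctors}(P) \\
\forall i.\ P, \Gamma \vdash t_i : T_i[\overline{X} := \overline{S}] \\
|\overline{X}| = |\overline{S}| \qquad |\overline{T}| = |\overline{t}|
\end{array}}{P, \Gamma \vdash c[\overline{S}](\overline{t}) : T[\overline{X} := \overline{S}]} \ \text{(T-CONST)}
$$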


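Well-Formedness 

$$
\frac{\begin{array}{c}
C = c[\overline{X'}](\overline{T}) : m[\overline{S}] \qquad |\overline{X}| = |\overline{X'}| \\
D = d[\overline{Y'}](m[\overline{S'}], \overline{T'}) : T \qquad |\overline{Y}| = |\overline{Y'}| \\
\text{all-distinct}(\overline{X}, \overline{Y}) \qquad \text{all-distinct}(\overline{X'}, \overline{Y'}) \\
\text{most-general-unifier}(m[\overline{S}], m[\overline{S'}]) = \sigma \\
P, \overline{x : \sigma(T)}, \overline{y : \sigma(T')} \vdash \sigma(t[\overline{X} := \overline{X'}, \overline{Y} := \overline{Y'}]) : \sigma(T)
\end{array}}{P, m \vdash d[\overline{Y}](c[\overline{X}](\overline{x}), \overline{y}) = t \ \text{OK in } C, D} \ \text{(WF-EQ)}
$$

$$
\frac{\begin{array}{c}
C = \ldots : m[\overline{S}] \qquad D = \ldots(m[\overline{S'}], \ldots) : \ldots \\
\text{most-general-unifier}(m[\overline{S}], m[\overline{S'}]) = \mathit{error}
\end{array}}{P, m \vdash d[\overline{Y}](c[\overline{X}](\overline{x}), \overline{y}) = t \ \text{OK in } C, D} \ \text{(WF-INFSBLE)}
$$

$$
\frac{|\overline{S}| = a \qquad FV(\overline{T}) \subseteq \overline{X} \qquad FV(\overline{S}) \subseteq \overline{X}}{c[\overline{X}](\overline{T}) : m[\overline{S}] \ \text{OK in } m, a} \ \text{(WF-CONSTR)}
$$

$$
\frac{|\overline{S}| = a \qquad FV(\overline{S}) \subseteq \overline{Y} \qquad FV(\overline{T}) \subseteq \overline{Y}}{d[\overline{Y}](m[\overline{S}], \overline{T}) : T \ \text{OK in } m, a} \ \text{(WF-DESTR)}
$$

$$
\frac{\begin{array}{c}
\forall C \in \overline{C}, \forall D \in \overline{D}: \\
C \ \text{OK in } m, a \qquad D \ \text{OK in } m, a \\
P, m \vdash \mathit{lookup}(C, D) \ \text{OK in } C, D \\
\text{all-names-distinct}(\overline{D})
\end{array}}{m \mapsto (a, \overline{C}, \overline{D}, \mathit{lookup}) \ \text{OK in } P} \ \text{(WF-MATR)}
$$

$$
\frac{\forall m \in \mathit{dom}(P),\ m \mapsto P(m) \ \text{OK in } P \qquad \text{all-names-distinct}(\mathit{ctors}(P))}{P \ \text{OK}} \ \text{(WF-PROG)}
$$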


Fig. 5. Typing and well-formedness 
to prevent accidental name clashes (checked by all-distinct). We compute the 
most general unifier of the two matrix types in the constructor and destructor, 
respectively, to combine the type knowledge about the matrix type from 
the constructor and destructor type. If no such unifier exists, the equation is 
vacuously well-formed because the particular combination of constructor and 
destructor can never occur during execution of well-typed terms (WF-INFSBLE). 
Otherwise, we use the unifier σ and apply it to the given type annotations to 
type-check the term t. A unifier σ is a mapping from type variables to types, 
but we also use the notation σ(t) and σ(T) to apply σ to all occurrences of type 
variables inside a term t or a type T, respectively. 
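
To make the role of the most general unifier concrete, here is a minimal Python sketch of first-order unification on types of the form m[T, ...] or X. It is a textbook-style algorithm chosen by us for illustration, not necessarily the one used in the Coq development; the names `TVar`, `TApp`, `apply`, and `mgu` are hypothetical, and `None` plays the role of the error result.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class TVar:
    name: str


@dataclass(frozen=True)
class TApp:                      # a matrix type m[T1, ..., Tn]
    name: str
    args: Tuple["Type", ...] = ()


Type = Union[TVar, TApp]
Subst = Dict[str, Type]


def apply(s: Subst, t: Type) -> Type:
    """Apply substitution s to all type variables in t."""
    if isinstance(t, TVar):
        return apply(s, s[t.name]) if t.name in s else t
    return TApp(t.name, tuple(apply(s, a) for a in t.args))


def occurs(x: str, t: Type) -> bool:
    if isinstance(t, TVar):
        return t.name == x
    return any(occurs(x, a) for a in t.args)


def mgu(t1: Type, t2: Type, s: Optional[Subst] = None) -> Optional[Subst]:
    """Return a most general unifier extending s, or None if none exists."""
    s = {} if s is None else s
    t1, t2 = apply(s, t1), apply(s, t2)
    if isinstance(t1, TVar):
        if t1 == t2:
            return s
        if occurs(t1.name, t2):
            return None
        return {**s, t1.name: t2}
    if isinstance(t2, TVar):
        return mgu(t2, t1, s)
    if t1.name != t2.name or len(t1.args) != len(t2.args):
        return None
    for a, b in zip(t1.args, t2.args):
        s = mgu(a, b, s)
        if s is None:
            return None
    return s


nat, boolean = TApp("Nat"), TApp("Bool")
feasible = mgu(TApp("List", (TVar("A"),)), TApp("List", (nat,)))
infeasible = mgu(TApp("List", (nat,)), TApp("List", (boolean,)))
```

Here `feasible` corresponds to the WF-EQ case, where the equation body is checked under the refined type knowledge, and `infeasible` corresponds to WF-INFSBLE, where the equation is vacuously well-formed.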

Constructor and destructor signatures are well-formed if they apply the cor- 
rect number of type parameters to the matrix type and contain no free type vari- 
ables (WF-CONSTR and WF-DESTR). A matrix is type-checked by making sure 
that all constructor and destructor signatures are well-formed, that all equations 
are well-formed for every constructor/destructor combination, and that destruc- 
tor names are unique in the matrix (WF-MATR). To check uniqueness of names, 
we use all-names-distinct, which checks for a given list of signatures that all of 
their names are distinct. A program is well-formed if all of its matrices typecheck 
and the constructor signatures of the program (retrieved by ctors) are globally 
unique (WF-PROG). 


3.6 GADTs and GAcoDTs 


In the formalization so far, we have deliberately kept matrices abstract as a kind 
of abstract data type. Now we can bring in the harvest of our language design. 
GADTs and GAcoDTs are two different physical representations of matrices, 
see Fig. 6. They both contain nested vectors of equations and differ only in the 
order of the indices. With GADTs, the column labels are constructors and the 
row labels functions and a row corresponds to a function defined by pattern 
matching, with one equation for each case of the GADT. With GAcoDTs, the 
column labels are destructors, the row labels are functions, and a row corresponds 
to a function defined by copattern matching, with one equation for each case of 


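$$
\begin{array}{l}
M_{GADT} = (a, \gamma \in \overline{C}, \delta \in \overline{D}, \{e_{D,C} \mid D \in \delta, C \in \gamma\}) \\
M_{GAcoDT} = (a, \gamma \in \overline{C}, \delta \in \overline{D}, \{e_{C,D} \mid C \in \gamma, D \in \delta\}) \\[4pt]
\mathit{mkmatrix} : M_{GADT} + M_{GAcoDT} \to M \\
\mathit{mkmatrix} = \text{(obvious; omitted)} \\[4pt]
\mathit{refunctionalize} : M_{GADT} \to M_{GAcoDT} \\
\mathit{refunctionalize} = \mathit{transpose} \\[4pt]
\mathit{defunctionalize} : M_{GAcoDT} \to M_{GADT} \\
\mathit{defunctionalize} = \mathit{transpose}
\end{array}
$$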


Fig. 6. GADTs and GAcoDTs 
the GAcoDT. Hence both defunctionalize and refunctionalize, which swap the 
respective organization of the matrix, are just matrix transposition. 
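
The two physical layouts and their relation by transposition can be illustrated with nested Python lists. This is a sketch under our own assumptions: equations are represented as opaque strings, and `transpose`, `mkmatrix_gadt`, and `mkmatrix_gacodt` are hypothetical stand-ins for the corresponding notions of the formalization.

```python
from typing import Dict, List, Tuple

Equation = str  # placeholder for an equation d[Y](c[X](x), y) = t
Cells = Dict[Tuple[str, str], Equation]  # keyed by (constructor, destructor)


def transpose(rows: List[List[Equation]]) -> List[List[Equation]]:
    """Swap rows and columns of a rectangular nested list."""
    return [list(col) for col in zip(*rows)]


def mkmatrix_gadt(ctors: List[str], dtors: List[str],
                  rows: List[List[Equation]]) -> Cells:
    """GADT layout: one row per destructor, one column per constructor."""
    return {(c, d): rows[i][j]
            for i, d in enumerate(dtors) for j, c in enumerate(ctors)}


def mkmatrix_gacodt(ctors: List[str], dtors: List[str],
                    rows: List[List[Equation]]) -> Cells:
    """GAcoDT layout: one row per constructor, one column per destructor."""
    return {(c, d): rows[i][j]
            for i, c in enumerate(ctors) for j, d in enumerate(dtors)}


ctors = ["zeros"]
dtors = ["head", "tail"]
gadt_rows = [["head(zeros()) = zero()"], ["tail(zeros()) = zeros()"]]
gacodt_rows = transpose(gadt_rows)  # refunctionalize
```

Both layouts are packed into the same abstract lookup table, so any property stated on the abstract matrix is automatically shared by the two physical representations.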


4 Properties of GADT^T 


In this section, we prove type soundness for GADT^T, the preservation of typing 
and operational semantics under de- and refunctionalization, and that our physi- 
cal matrix representations of GADTs and GAcoDTs are accurate with respect to 
extension. All of these properties have been formalized and proven in Coq, based 
upon our Coq formalization of the previous section’s formal syntax, semantics, 
and type system. 


4.1 Type Soundness 
We start with the usual progress and preservation theorems. 


Theorem 1 (Progress). If $P$ is a well-formed program and $t$ is a term with 
no free type variables and $P, \epsilon \vdash t : T$, then $t$ is either a value $v$, or there exists 
a term $t'$ such that $P \vdash t \to t'$. 


The proof of this theorem is a simple induction proof using a standard canonical 
forms lemma [30]. 

Preservation is much harder to prove. Often, preservation is proved using a 
substitution lemma which states that the substitution of a (term) variable by a 
term of the same type does not change the type of terms containing that term 
variable [30]. In GADT^T, this lemma looks as follows: 


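Lemma 1 (Term Substitution). If $\overline{t}$ is a list of terms with $P, \epsilon \vdash \overline{t} : \overline{T}$ and 
$\overline{t'}$ is a list of terms with $P, \epsilon \vdash \overline{t'} : \overline{T'}$ and $t$ is a term with $P, \overline{x : T}, \overline{y : T'} \vdash t : T$, 
then $P, \epsilon \vdash t[\overline{x} := \overline{t}, \overline{y} := \overline{t'}] : T$. 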


However, in E-FIRE we perform both a substitution of terms and of types, 
hence the term substitution lemma is not enough to prove preservation; we also 
need a type substitution lemma. 


Lemma 2 (Type Substitution). If $P, \Gamma \vdash t : T$, then $P, \Gamma[\overline{X} := \overline{T}] \vdash t[\overline{X} := \overline{T}] : T[\overline{X} := \overline{T}]$. 


The proof of this lemma requires various auxiliary lemmas about properties (such 
as associativity) of type substitution. Taken together, these two lemmas are the 
two main intermediate results to prove the desired preservation theorem. 


Theorem 2 (Preservation). If $P$ is a well-formed program and $t$ is a term 
with no free type variables and $P, \epsilon \vdash t : T$ and $P \vdash t \to t'$, then $P, \epsilon \vdash t' : T$. 
4.2 Defunctionalization and Refunctionalization 


The preservation of typing and operational semantics by de/refunctionalization 
is a trivial consequence of the lemma below, which holds because 
both de- and refunctionalization are merely matrix transposition, see Fig. 6, and 
because the embedding mkmatrix of the physical matrices into the abstract repre- 
sentation ignores the organization of the physical matrices. 


Lemma 3 (Matrix Transposition) 
$\forall m \in M_{GADT}$, mkmatrix(m) = mkmatrix(refunctionalize(m)). 
$\forall m \in M_{GAcoDT}$, mkmatrix(m) = mkmatrix(defunctionalize(m)). 


Corollary 1 (Preservation of typing and reduction). De/refunctionalization 
of a matrix does not change the well-typedness of a program or the 
operational semantics of a term. 


4.3 Extensibility 


So far, we have seen that our chosen physical matrix representations are 
amenable to easy proofs of the preservation of properties under de- and refunc- 
tionalization. However, are they also indeed accurate representations of GADTs 
and GAcoDTs? GADTs and GAcoDTs are utilized due to their extensibility 
along the destructor or constructor dimension, respectively, so we want this to 
be reflected by our representations. 

We assume that matrices are represented as a traditional linear program by 
reading them row-by-row. Adding a new row is a non-invasive operation (adding 
to the program), whereas adding a column requires changes to the existing pro- 
gram. 

We want to be able to extend our matrix representations with a new row, 
respectively representing the addition of a new destructor or constructor, without 
breaking well-typedness as long as the newly added equations typecheck with 
respect to the complete new program, and uniqueness of destructor/constructor 
names is preserved (globally, in the constructor case)⁴. 

In order to formally state that this is indeed the case, we first formally capture 
extension of GADT and GAcoDT matrices with the following definitions. These 
already include the preservation of local uniqueness as a condition, i.e., the name 
of the newly added destructor or constructor must be fresh within the matrix. 


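Definition 1 (GADT extension). Consider an $m \in M_{GADT}$ with $m = 
(a, \gamma, \delta, \{e_{D,C} \mid D \in \delta, C \in \gamma\})$. For any $D' \in D$, $D' \notin \delta$, and equations $e_{D',C}$ 
for each $C \in \gamma$, we call $(a, \gamma, \delta \cup \{D'\}, \{e_{D,C} \mid D \in \delta \cup \{D'\}, C \in \gamma\})$ a GADT 
extension of $m$ with $D'$ and $\{e_{D',C} \mid C \in \gamma\}$. 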


4 The counterpart to this property on the side of the operational semantics is that the 
reduction relation of the new program restricted to terms befitting the old program 
equals the reduction relation of the old program; this however we omitted as it holds 
trivially when uniqueness is preserved. 
Definition 2 (GAcoDT extension). Consider an $m \in M_{GAcoDT}$ with $m = 
(a, \gamma, \delta, \{e_{C,D} \mid C \in \gamma, D \in \delta\})$. For any $C' \in C$, $C' \notin \gamma$, and equations $e_{C',D}$ 
for each $D \in \delta$, we call $(a, \gamma \cup \{C'\}, \delta, \{e_{C,D} \mid C \in \gamma \cup \{C'\}, D \in \delta\})$ a GAcoDT 
extension of $m$ with $C'$ and $\{e_{C',D} \mid D \in \delta\}$. 


We now straightforwardly lift these definitions to programs: A program P' is 
a GA(co)DT extension (with some signature and equations) of another program 
P if their matrices are identical except for one matrix name, and the under- 
lying physical matrix (packed with mkmatrix) assigned to this name under P' 
is a GA(co)DT extension (with this signature and equations) of the underlying 
physical matrix assigned under P. 

Using this terminology we can now formally state and prove the extensibility 
of GADTs and GAcoDTs: 


Theorem 3 (Datatype Extensibility). If $P$ is a well-formed program, and 
$P'$ is a GADT extension of $P$ with $D'$ and equations $\{e_{D',C} \mid C \in \gamma\}$, for 
the constructor signatures $\gamma$ of the matrix to be extended, such that $P', m \vdash 
e_{D',C}$ OK in $C, D'$ for each $C \in \gamma$, then $P'$ is well-formed. 


Theorem 4 (Codatatype Extensibility). If $P$ is a well-formed program, and 
$P'$ is a GAcoDT extension of $P$ with $C'$, where the name of $C'$ is different from 
all constructor names in $P$, and equations $\{e_{C',D} \mid D \in \delta\}$, for the destructor 
signatures $\delta$ of the matrix to be extended, such that $P', m \vdash e_{C',D}$ OK in $C', D$ 
for each $D \in \delta$, then $P'$ is well-formed. 


In other words, in both cases we can type-check each row of a matrix in isolation, 
and if we put those rows together the resulting matrix and program containing 
that matrix will be well-formed. The results justify the familiar physical repre- 
sentation of programs where the variants of a GADT are fixed but we can freely 
add new functions that pattern-match on that GADT (and correspondingly for 
GAcoDTs). 
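
The structural side of these definitions, appending a row without touching existing rows, can be sketched in Python as follows. This illustration covers only the freshness and shape conditions; it does not model the typing premise of the theorems. The function `extend_with_row` and the string representation of equations are assumptions of ours.

```python
from typing import List, Tuple

Row = List[str]  # one equation (as an opaque string) per column


def extend_with_row(labels: List[str], rows: List[Row],
                    new_label: str, new_row: Row) -> Tuple[List[str], List[Row]]:
    """Append a row; existing labels and rows are reused unchanged."""
    if new_label in labels:
        raise ValueError("row name must be fresh within the matrix")
    if rows and len(new_row) != len(rows[0]):
        raise ValueError("need exactly one equation per column")
    return labels + [new_label], rows + [new_row]


# GADT layout of Stream: rows are destructors, the single column is zeros().
dtors = ["head"]
rows = [["head(zeros()) = zero()"]]
dtors2, rows2 = extend_with_row(dtors, rows, "tail", ["tail(zeros()) = zeros()"])
```

Adding a column instead would require inserting one new equation into every existing row, which is the invasive direction for the respective layout.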


5 Discussion 


In this section we discuss applications and limitations of our work, talk about 
directions for future work, and describe the Coq formalization of the definitions 
and proofs. 


5.1 Applications 


Language Design. The most obvious application of our approach is to guide 
programming language design, namely by designing its features in such a way 
that the correspondence by de/refunctionalization is preserved. We believe that 
we can find “gaps” in existing languages by checking whether the correspond- 
ing dual feature exists, or massaging the language feature in such a way that a 
clear dual exists. For instance, on the datatype and pattern matching side, many 




features exist that have no clear counterpart on the codata side yet, such as pat- 
tern matching on multiple arguments, non-linear pattern matching, or pattern 
guards [22]. Some vaguely dual features exist on the codata side understood as 
“objects”, e.g. in the form of multi dispatch (such as [10]) or predicate dispatch 
[21]. We believe that the relation between pattern matching on multiple argu- 
ments and multi dispatch is a particularly interesting direction for future work, 
since it would entail generalizing our two-dimensional matrices to matrices of 
arbitrary dimension. 

Arguably, codata is the essence of object-oriented programming [12]. In any 
case, we believe that our design can also help to design object-oriented lan- 
guage features. For instance, there has been previous work on “object-oriented” 
GADTs [20,26] using extensions of generic types with certain classes of con- 
straints. In Kennedy and Russo’s [26] work, a list interface could 
be defined like this: 


interface List<A> { 
Integer size(); 
Integer sum() where A=Integer; // Kennedy & Russo’s syntax 


} 


If we compare this interface with the List codata type in Fig.1 (right hand 
side), then we can see that such constraints are readily supported by GAcoDTs; 
not because this feature was explicitly added but because it arises mechanically 
from dualizing GADTs. 

As another potential influence on language design, we believe that “closed- 
ness” under defunctionalization and refunctionalization can be a desirable lan- 
guage design quality that prevents the oddity that some things can be expressed 
better using codata than using data (or vice versa). For instance, Carette et al. [5] 
propose a program representation (basically again a form of Church encoding, 
hence a codata encoding) that works in a simple Haskell’98 language but whose 
datatype representation would require GADTs. This suggests a language design 
flaw in that the codata fragment of functions supports a more powerful type 
system than the data fragment of (non-generalized) algebraic data types. That 
is, the type arguments of a codata generator function’s result type may be arbi- 
trarily specialized, e.g., the result type might be List[Nat], while the type of a 
constructor must be fully generic, e.g., List[A]. Our approach gives a criterion 
on when the type systems for both sides are “in sync”. 


De/Refunctionalization as a Programmer Tool. Semantics-preserving program 
transformations are not only interesting on the meta-level of programming lan- 
guage design but also because they define an equivalence relation on programs. 
For instance, consider the program on the left-hand side of Fig. 7, written in our 
GAcoDT language. Nat is a representation of Church-encoded⁵ natural numbers 
as a GAcoDT with arity zero and a single destructor fold with a type 


5 This form of typed Church encoding is sometimes called Böhm-Berarducci encoding [4]. 
Common to both sides:

    codata Func[2] where
      apply[A,B](Func[A,B], A) : B

Left (codata):

    codata Nat[0] where
      fold[A](Nat,A,Func[A,A]) : A

    fun zero(): Nat where
      fold[A](zero(),z,s) = z

    fun succ(Nat): Nat where
      fold[A](succ(n),z,s) =
        apply[A,A](s, fold[A](n,z,s))

Right (data):

    data Nat[0] where
      zero() : Nat
      succ(Nat) : Nat

    fun fold[A](Nat,A,Func[A,A]) : A where
      fold[A](zero(),z,s) = z
      fold[A](succ(n),z,s) =
        apply[A,A](s, fold[A](n,z,s))


Fig. 7. Defunctionalizing Church-encoded numbers (left) yields Peano numbers with a 
fold function (right) 


parameter A. Defunctionalizing Nat yields the familiar Peano numbers with the 
standard fold function (right-hand side). 
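
The same pair of programs can be approximated in Python, as a sketch only. We assume here that plain Python callables stand in for the Func codatatype, and the class and function names (`ZeroC`, `SuccC`, `Zero`, `Succ`, `fold`) are our own.

```python
from dataclasses import dataclass


# Codata style: a number is defined by how it answers the fold observation.
class ZeroC:
    def fold(self, z, s):
        return z


class SuccC:
    def __init__(self, n):
        self.n = n

    def fold(self, z, s):
        return s(self.n.fold(z, s))


# Data style: Peano constructors plus one function that pattern-matches.
@dataclass(frozen=True)
class Zero:
    pass


@dataclass(frozen=True)
class Succ:
    n: object


def fold(n, z, s):
    if isinstance(n, Zero):
        return z
    if isinstance(n, Succ):
        return s(fold(n.n, z, s))
    raise TypeError("unknown Nat constructor")


def increment(k):
    return k + 1


church_two = SuccC(SuccC(ZeroC())).fold(0, increment)
peano_two = fold(Succ(Succ(Zero())), 0, increment)
```

The equations are literally the same in both styles; only their grouping differs, by destructor in the codata version and by function in the data version.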

Such equivalences have been found useful for recognizing different 
forms of programs that are “the same elephant”. For instance, Olivier Danvy and 
associates [16,17] have used defunctionalization, refunctionalization, and some 
other transformations such as CPS-transformation to inter-derive “semantic arti- 
facts” such as big-step semantics, small-step semantics, and abstract machines 
(“The inter-derivations illustrated here witness a striking unity of computation, 
be this for reduction semantics, abstract machines, and normalization function: 
they all truly define the same elephant.” — Danvy et al. [15]). 

The applicability of these transformations is widened by our approach since 
we support arbitrary codata and not just functions. Exploring these new possi- 
bilities is an interesting area of future work. 

Furthermore, programmers can employ our transformation as a tool for a 
more practical purpose. Suppose that at some point during the development of a 
large software system, it is determined that the extensibility dimension for 
a particular aspect should be switched. That is, it is now thought that instead of 
allowing new variants (constructors) to be added, the software would be better 
served by fixing the variants and allowing the addition of new operations (destructors), 
or vice versa. If at this point it is also possible to make a 
closed-world assumption with regard to the particular type (represented as a 
matrix), since clients of the code are known and can be dealt with, it might 
be reasonable to transpose the matrix representing that type. With GADT^T, 
it is possible to do this independently of the other matrices in the program. (As 
already discussed, GADT^T in its present form doesn’t aim to be particularly 
developer-friendly, but we expect further language layers to be placed on top of 
GADT^T to remedy this eventually.) 


Compiler Optimizations. To be able to use our automatizable transformation as 
a programmer tool, it was important to be able to make a closed-world assump- 
tion, where we have the entire program, or more precisely, the part which involves 




the matrix under consideration, at our disposal. A more automated process 
where such a kind of assumption can often be readily made is compilation. There, 
our matrix transposition transformation can be employed for a whole program 
optimization (such as [6]), as follows. An opportunity for optimization presents 
itself to the compiler when it is basically able to recognize an abstract machine 
in the code; optimizing this abstract machine is then an intermediate step, more 
generally applicable, that precedes hardware-specific optimizations [18]. As out- 
lined above, defunctionalization can turn higher-order programs into first-order 
programs where this machine might be apparent. With our pair of languages, 
using our readily automatizable defunctionalization (matrix transposition), it is 
possible to turn GAcoDT code into GADT code during the compilation phase. 
Then the compiler can leverage the potentially recognizable abstract machine 
form of the GADT code for its optimizations. 


5.2 Limitations 


As we said, our design rationale for GADT^T was to clarify the relation between 
GADTs and GAcoDTs, not to provide a convenient language for developers. Here 
we discuss some ways to address the limitations resulting from that decision. 


Local (Co)Pattern Matching, Including λ. A significant limitation of GADT^T is 
that (co)pattern matching is only allowed on the top-level; we don’t have “case” 
(or “match”) constructs on the term level. Any local (co)pattern matching, how- 
ever, can be converted to the top-level form by extracting it to a new top-level 
function definition. Variables free within the (co)pattern matching term must be 
passed to this function as arguments. In particular, anonymous local first-class 
functions, i.e., λ expressions, are a form of local copattern matching which can 
be encoded in this way; this particular conversion is traditionally called lambda 
lifting. 
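
As a rough Python analogy of this encoding, a local anonymous function can be replaced by a top-level definition that receives its free variable explicitly and answers the single observation apply. The names `make_adder_local` and `Adder` are hypothetical, and the Python class merely mimics a top-level function that copattern-matches on a Func codatatype.

```python
# Local form: an anonymous function that captures the free variable k.
def make_adder_local(k):
    return lambda x: x + k


# Lifted form: a top-level definition. The free variable k becomes an
# explicit argument, and the body is given as the equation for apply.
class Adder:
    def __init__(self, k):
        self.k = k

    def apply(self, x):
        return x + self.k


local_result = make_adder_local(3)(4)
lifted_result = Adder(3).apply(4)
```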


(Co)Pattern Matching on Zero or More Arguments. (Co)pattern matching in 
GADT^T is only possible on a single, distinguished argument (in our presentation, 
the first, but this is not important). Nested and multiple-argument matching can 
be encoded by unnesting à la Setzer et al. [35], producing auxiliary functions. 

In GADT^T, it is further not possible to define a function without any 
(co)pattern matching at all. The workaround of (co)pattern matching on a dummy 
argument of type Unit is simple, but it is not obvious how to reconcile this 
encoding with the symmetry of de/refunctionalization. 


Type Inference. We have deliberately avoided the question of type inference in 
this work. In general, we expect that the ample existing works on type inference 
for GADTs (such as Peyton Jones et al. [29], Schrijvers et al. [34], Chen and 
Erwig [7]) can be adapted to our setting and will also work for GAcoDTs. We 
see one complication, though: Due to the fact that destructors are only locally 
unique in GADTT, the (co)datatype the destructor belongs to must first be 
found via the type inferred for its distinguished, destructed argument. In other 
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words, we do not know which destructor signature to consider before we know 
the destructed argument’s type. This means that a type inference system which 
works inwards only, i.e., it discovers the types of the destructor arguments by 
looking at the signature, possibly leaving unification variables, and then checks 
that the recursively discovered types for the arguments conform, will not work. 
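
The lookup problem can be sketched as follows. This is only an illustration under our own assumptions: the table SIGNATURES and the type names Stream and Queue are hypothetical and not part of the calculus.

```python
# Hypothetical signature table: (codatatype, destructor) -> result type.
# Destructor names are unique only within one codatatype.
SIGNATURES = {
    ("Stream", "head"): "Nat",
    ("Stream", "tail"): "Stream",
    ("Queue", "head"): "Bool",
}

def candidates(destructor):
    """All codatatypes that declare a destructor with this name."""
    return sorted(t for (t, d) in SIGNATURES if d == destructor)

def result_type(destructed_type, destructor):
    """Resolve the signature once the destructed argument's type is known."""
    return SIGNATURES[(destructed_type, destructor)]
```

The name head alone leaves two candidates; only the type of the destructed argument selects the signature.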


5.3 Termination and Productivity 


While termination and productivity are not in the focus of this paper, we want 
to mention that our unified treatment of data and codata can also lead to a 
unified treatment of termination and productivity. 

Here we want to illustrate informally that a simple syntactic criterion is 
sufficient to allow structural recursion and guarded corecursion. Syntactic ter- 
mination checks are not expressive enough for many situations, hence we leave a 
proper treatment of termination/productivity checking (such as with sized types 
[2]) for future work; the purpose of this discussion is merely to illustrate that 
termination checking could also benefit from unified data and codata and not to 
propose a practically useful termination checker. 

The basic idea is to restrict destructor calls in the right-hand sides of 
equations to have the form d[T](x, t) instead of d[T](t, t). That is to say, in destructor 
calls, we only allow variables from within the constructor pattern of the left-hand 
side. This criterion already guarantees termination (and hence also productiv- 
ity [2]) in our system, i.e. the finiteness of all reduction sequences, which can 
be shown with the usual argument of a property that strictly decreases under 
reduction. A reduction step in GADT^T with right-hand sides restricted in this way 
strictly decreases, under lexicographic order, the pair of 


1. the maximum of all the first (destructed) arguments' depths in destructor calls 
of the term, and 

2. the sequence which counts how often each destructed argument depth appears 
in the term, starting with the maximum depth and going downward; those 
sequences are themselves lexicographically ordered. 


This strict decrease can be proved by induction on the derivation of the reduc- 
tion step. Since there are no infinitely decreasing sequences of these pairs, any 
reduction sequence must be finite. Note that our criterion in itself excludes far 
too many programs to be anywhere near practical, but it is readily conceivable 
how to relax it to only recursive calls together with a check that excludes mutual 
recursion.⁶ 

Let’s look at Fig. 7 once more to illustrate that this criterion corresponds 
to both structural recursion and guarded corecursion. In the right-hand side of 
Fig. 7 we see that the first argument to the recursive call in the last line is n, which 
is allowed by our restriction because it is a syntactic part of the original input, 


6 For instance one might request the programmer to order the destructor names such 
that in equations for a certain destructor only destructors of lower order may be 
called. 
succ(n) (structural recursion). The call to apply is not a problem because it is 
not a recursive call.⁷ At the same time, if we look at the last line in the left-hand 
side of Fig. 7, we see that the criterion also corresponds to guarded corecursion. 
With copatterns, guarded corecursion means that we do not destruct the result 
of a recursive call (the “guard” itself is implicit in the pattern on the left-hand 
side of the equation). However, destructing that result would mean that we would 
have to call a destructor with the recursive call as its first argument, which is 
again forbidden by the syntactic criterion. 
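
The two readings of the criterion can be sketched in Python. This is not the program of Fig. 7; the Peano encoding (Zero, Succ), the function add and the stream class NatsFrom are assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Zero:
    pass

@dataclass(frozen=True)
class Succ:
    pred: object  # the n in succ(n)

def add(m, n):
    """Structural recursion: the recursive call destructs m.pred,
    a syntactic part of the input succ(pred)."""
    if isinstance(m, Zero):
        return n
    return Succ(add(m.pred, n))

def to_int(n):
    """Helper for inspection: count the Succ constructors."""
    count = 0
    while isinstance(n, Succ):
        count, n = count + 1, n.pred
    return count

class NatsFrom:
    """Codata: a stream given only by its destructors head and tail."""
    def __init__(self, start):
        self.start = start

    def head(self):
        return self.start

    def tail(self):
        # Guarded corecursion: the corecursive result is returned as is
        # and never destructed inside this definition.
        return NatsFrom(self.start + 1)

def take(k, stream):
    """Observe the first k elements by repeated destruction."""
    out = []
    for _ in range(k):
        out.append(stream.head())
        stream = stream.tail()
    return out
```

In both definitions the only thing a recursive occurrence receives or returns is a variable bound on the left-hand side or an undestructed recursive result, which is what the syntactic restriction demands.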


5.4 Going Beyond System F-like Polymorphism 


A particularly interesting direction for future work is to extend GADT^T and go 
beyond the System F-like polymorphism. For instance, F_ω contains a copy of 
the simply-typed lambda calculus on the type level. Could one also generalize 
type-level functions to arbitrary codata and maybe use a variant of GADT^T 
on the type level? Can dependent products like in the calculus of constructions 
[13] be generalized in a similar way? Can inductive types like in the calculus 
of inductive constructions be formulated such that there is a dual that is also 
related by de/refunctionalization? Thibodeau et al. [36] have formulated such a 
dual, but whether it can be massaged to fit into the setting described here is not 
obvious. 


5.5 Coq Formalization 


Our Coq formalization is quite close to the traditional presentation chosen for 
this paper, but there are some technical differences. Both term and type variables 
are encoded via de Bruijn indices, which is rather standard for programming 
language mechanization. More interestingly, the syntax of the language in the 
Coq formalization expresses some of the constraints that we express here via typing 
rules via dependent types instead. Specifically, terms and types are indexed 
by the type variables that can appear inside. To represent matrices, we have 
developed a small library of dependently typed tables (where the cell types 
can depend on the row and column labels), such that the matrix type already 
guarantees that all type variables that show up in terms and types are bound. 
An earlier version of the formalization and the soundness proof used explicit 
well-formedness constraints to guarantee that all type variables are bound; the 
type soundness proof for this version was about twice as long as the one using 
dependent types. On the flip side, we had to “pay” for using the dependent types 
in the form of many annoying “type casts” in definitions and theorems owing 
to the fact that Coq’s equality is intensional and not extensional [9, Sect. 10.3]. 
Finally, instead of using an evaluation context to define evaluation order like 
we did in Fig. 4, we have used traditional congruence rules. In the reduction 
relation as formalized in Coq, a single step can actually correspond to multiple 
steps in the formalization presented in the paper; however, this is just a minor 
technicality to slightly simplify the proofs. 


7 As long as we avoid mutual recursion, for instance by ensuring fold > apply. 
6 Related Work 


“Theoreticians appreciate duality because it reveals deep symmetries. Practi- 
tioners appreciate duality because it offers two-for-the-price-of-one economy.” 
This quote from Wadler [38] describes the spirit behind the design of GADT^T, 
but of course this is not the first paper to talk about duality in programming 
languages. We have already discussed the most closely related works in previous 
sections; here, we compare GADT^T with theoretical calculi with related duality 
properties and point out an aspect of practical programming for which the 
duality of GADT^T is relevant. 


Codata. Hagino [23] pioneered the idea of dualizing data types: Whereas data 
types are used to define a type by the ways to construct it, codatatypes are dual 
to them in the sense that they are specified by their deconstructions. Abel et al. 
[1] introduce copatterns which allow functions producing codata to be defined 
by matching on the destructors of the result codatatype, dually to matching on 
the constructors of the argument datatype. All these developments occur in a 
world where function types are a given. The symmetric codata and data lan- 
guage fragments proposed by Rendel et al. [31] deviate from this: By enhancing 
destructor signatures with argument types, they provide a form of codata that 
is a generalization of first-class functions. Both the works by Rendel et al. [31] 
and Abel et al. [1] are simply-typed. 

The (co)datatypes in the calculus of Downen and Ariola [19] also allow for 
user-defined function types. Their focus is different from ours, though, as they 
are mostly interested in evaluation strategies and their duality, and with regards 
to their calculus itself they work in an untyped setting. What is interesting in 
comparison with GADT^T is how their (co)datatype declarations and signatures 
are inherently more symmetric as they essentially describe a type system for the 
parametric sequent calculus. As such, the position of additional arguments in 
the destructor signatures has a mirror counterpart in constructor signatures (to 
highlight this, Downen and Ariola [19] refer to destructors as “co-constructors”). 


Duality of Computations and Values. Staying on with the idea of avoiding func- 
tion types as primitives for a moment, Wadler [38] presents a “dual calculus” in 
which the previously astonishing result that call-by-name is De Morgan-dual to 
call-by-value [14] is clarified by defining implication (corresponding to function 
types via the Curry-Howard isomorphism) in two different ways dependent on 
the intended corresponding evaluation regime. A somewhat similar approach, but 
perhaps more directly related to the data/codata duality, that also deals with 
the “troubling” coexistence of call-by-value and call-by-name, was proposed by 
Levy [27]. Levy [27] presents a calculus with a new evaluation regime, 
call-by-push-value (CBPV), which subsumes call-by-value and call-by-name by encoding 
the local choice for either in the terms of the calculus. More specifically, there 
are two kinds of terms in the CBPV calculus: computations and values, which 
can be inter-converted by “thunking” and “forcing”. The terms for values 
and computations are said to be of positive type and of negative type, respectively. 
Thibodeau et al. [36] have built their calculus, which extends codatatypes to indexed 
codatatypes, on top of CBPV, with datatypes being positive and codatatypes 
being negative. We think that, when extending GADT^T with local (co)pattern 
matching on the term level, perhaps with pattern and copattern matching terms 
mixed, it might be helpful to similarly recast the resulting language as a modi- 
fication of the CBPV calculus of Levy [27]. 


7 Conclusions 


We have presented a formal calculus, GADT^T, which uniformly describes both 
GADTs and their dual, GAcoDTs. GADTs and GAcoDTs can be converted 
back and forth by defunctionalization and refunctionalization, both of which 
correspond to a transposition of the matrix of the equations for each pair of con- 
structor/destructor. We have formalized the calculus in Coq and mechanically 
verified its type soundness, its extensibility properties, and the preservation of 
typing and operational semantics by defunctionalization and refunctionalization. 
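
The transposition idea can be sketched in Python for an untyped toy example. The sketch ignores types entirely, which are the core of the calculus, and the constructors Lit and Add, the functions eval and show, and all helper names are assumptions made for illustration.

```python
def transpose(table):
    """Swap the two levels of a nested dict (rows become columns)."""
    out = {}
    for row, cells in table.items():
        for col, rhs in cells.items():
            out.setdefault(col, {})[row] = rhs
    return out

# Data view: one entry per function, defined by cases on constructors.
# Each cell is the right-hand side of one equation; `call` performs
# recursive calls on sub-terms.
DATA_VIEW = {
    "eval": {
        "Lit": lambda call, n: n,
        "Add": lambda call, l, r: call("eval", l) + call("eval", r),
    },
    "show": {
        "Lit": lambda call, n: str(n),
        "Add": lambda call, l, r: f"({call('show', l)} + {call('show', r)})",
    },
}

# Codata view: one entry per constructor, defined by cases on destructors.
CODATA_VIEW = transpose(DATA_VIEW)

def run_data(view):
    """Interpreter that dispatches on the function first."""
    def call(fn, value):
        ctor, *fields = value
        return view[fn][ctor](call, *fields)
    return call

def run_codata(view):
    """Interpreter that dispatches on the constructor first."""
    def call(dtor, value):
        ctor, *fields = value
        return view[ctor][dtor](call, *fields)
    return call
```

The cells of the matrix are untouched by the transposition; only the grouping of the equations changes, so both interpreters compute the same results.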

We believe that our work can be of help for future language design since it 
describes a methodology to get a kind of “sweet spot” where data and codata 
constructs (including functions) are “in sync”. We think that it can also be useful 
as a general program transformation tool, both on the program level as a kind 
of refactoring tool, but also as part of compilers and runtime systems. Finally, 
since codata is quite related to objects in object-oriented programming, we hope 
that our approach can help to clarify their relation and design languages which 
subsume both traditional functional and object-oriented languages. 


Acknowledgments. We would like to thank Tillmann Rendel and Julia Trieflinger 
for providing some early ideas for the design of what eventually became GADT^T. This 
work was supported by DFG project OS 293/3-1. 
Abstract. Synchronous Programming (SP) is a universal computational 
principle that provides deterministic concurrency. The same input 
sequence with the same timing always results in the same externally 
observable output sequence, even if the internal behaviour generates 
uncertainty in the scheduling of concurrent memory accesses. Conse- 
quently, SP languages have always been strongly founded on mathe- 
matical semantics that support formal program analysis. So far, how- 
ever, communication has been constrained to a set of primitive clock- 
synchronised shared memory (CSM) data types, such as data-flow reg- 
isters, streams and signals with restricted read and write accesses that 
limit modularity and behavioural abstractions. 

This paper proposes an extension to the SP theory which retains the 
advantages of deterministic concurrency, but allows communication to 
occur at higher levels of abstraction than currently supported by SP data 
types. Our approach is as follows. To avoid data races, each CSM type 
publishes a policy interface for specifying the admissibility and prece- 
dence of its access methods. Each instance of the CSM type has to be 
policy-coherent, meaning it must behave deterministically under its own 
policy—a natural requirement if the goal is to build deterministic sys- 
tems that use these types. In a policy-constructive system, all access 
methods can be scheduled in a policy-conformant way for all the types 
without deadlocking. In this paper, we show that a policy-constructive 
program exhibits deterministic concurrency in the sense that all policy- 
conformant interleavings produce the same input-output behaviour. Poli- 
cies are conservative and support the CSM types existing in current SP 
languages. Technically, we introduce a kernel SP language that uses arbi- 
trary policy-driven CSM types. A big-step fixed-point semantics for this 
language is developed for which we prove determinism and termination 
of constructive programs. 
1 Introduction 


Concurrent programming is challenging. Arbitrary interleavings of concurrent 
threads lead to non-determinism with data races imposing significant integrity 
and consistency issues [1]. Moreover, in many application domains such as 
safety-critical systems, determinism is indeed a matter of life and death. In 
medical-device software, for instance, the same input sequence from the sensors (with the 
same timing) must always result in the same output sequence for the actuators, 
even if the run-time software architecture regime is unpredictable. 

Synchronous programming (SP) delivers deterministic concurrency out of 
the box¹, which explains its success in the design, implementation and validation 
of embedded, reactive and safety-critical systems for avionics, automotive, energy 
and nuclear industries. Right now SP-generated code is flying on the Airbus 380 
in systems like flight control, cockpit display, flight warning, and anti-icing just 
to mention a few. The SP mathematical theory has been fundamental for imple- 
menting correct-by-construction program-derivation algorithms and establishing 
formal analysis, verification and testing techniques [2]. For SCADE², the SP 
industrial modelling language and software development toolkit, the formal SP 
background has been a key aspect for its certification at the highest level A of 
the aerospace standard DO-178B/C. This SP rigour has also been important for 
obtaining certifications in railway and transportation (EN 50128), industry and 
energy (IEC 61508), automotive (TUV and ISO 26262) as well as for ensuring 
full compliance with the safety standards of nuclear instrumentation and control 
(IEC 60880) and medical systems (IEC 62304) [3]. 


Synchronous Programming in a Nutshell. At the top level, we can imagine an 
SP system as a black-box with inputs and outputs for interacting with its envi- 
ronment. There is a special input, called the clock, that determines when the 
communication between system and environment can occur. The clock gets an 
input stimulus from the environment at discrete times. At those moments we 
say that the clock ticks. When there is no tick, there is no possible commu- 
nication, as if system and environment were disconnected. At every tick, the 
system reacts by reading the current inputs and executing a step function that 
delivers outputs and changes the internal memory. For its part, the environment 
must synchronise with this reaction and not go ahead with more ticks. Thus, 


1 Milner’s distinction between determinacy and determinism is that a computation 
is determinate if the same input sequence produces the same output sequence, as 
opposed to deterministic computations which in addition have identical internal 
behaviour/scheduling. In this paper we use both terms synonymously to mean deter- 
minacy in Milner’s sense, i.e., observable determinism. 

2 SCADE is a product of ANSYS Inc. (http://www.esterel-technologies.com/). 
in SP, we assume (Synchrony Hypothesis) that the time interval of a system 
reaction, also called macro-step or (synchronous) instant, appears instantaneous 
(has zero-delay) to the environment. Since each system reaction takes exactly 
one clock tick, we describe the evolution of the system-environment interaction 
as a synchronous (lock-step) sequence of macro-steps. The SP theory guarantees 
that all externally observable interaction sequences derived from the macro-step 
reactions define a functional input-output relation. 
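
The black-box view can be sketched as a loop that runs one macro-step per clock tick. The driver run and the example step function running_sum are hypothetical names chosen for this illustration; they are not taken from any SP language.

```python
def run(step, state, inputs):
    """Run one macro-step per clock tick and collect the outputs.

    `inputs` holds the input value read at each tick; `step` maps the
    internal memory and the current input to new memory and an output.
    """
    outputs = []
    for tick_input in inputs:
        state, out = step(state, tick_input)
        outputs.append(out)
    return outputs

def running_sum(state, x):
    """Example step function: the internal memory is the sum so far."""
    total = state + x
    return total, total
```

Because the output sequence depends only on the initial memory and the input sequence, repeating a run with the same inputs yields the same outputs, which is the functional input-output relation mentioned above.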

The fact that the sequences of macro-steps take place in time and space 
(memory) has motivated two orthogonal developments of SP. The data-flow 
view regards input-output sequences as synchronous streams of data changing 
over time and studies the functional relationships between streams. Dually, the 
control-flow approach projects the information of the input-output sequences 
at each point in time and studies the changes of this global state as time pro- 
gresses, i.e., from one tick to the next. The SP paradigm includes languages 
such as Esterel [4], Quartz [5] and SC [6] in the imperative control-flow style 
and languages like Signal [7], Lustre [8] and Lucid Synchrone [9] that support 
the declarative data-flow view. There are even mixed control-data flow languages 
such as Esterel V7 [10] or SCADE [3]. Independently of the execution model, the 
strength common to all of these SP languages is a precise formal semantics, an 
indispensable feature when dealing with the complexities of concurrency. 

At a more concrete level, we can visualise an SP system as a white-box where 
inside we find (graphical or textual) code. In the SP domain, the program must 
be divided into fragments corresponding to the macro-step reactions that will 
be executed instantaneously at each tick. Declarative languages usually organise 
these macro-steps by means of (internally generated) activation clocks that pre- 
scribe the blocks (nodes) that are performed at each tick. Instead, imperative 
textual languages provide a pause statement for explicitly delimiting code exe- 
cution within a synchronous instant. In either case, the Synchrony Hypothesis 
conveniently abstracts away all the, typically concurrent, low-level micro-steps 
needed to produce a system reaction. The SP theory explains how the micro-step 
accesses to shared memory must be controlled so as to ensure that all internal 
(white-box) behaviour eventually stabilises, completing a deterministic macro- 
step (black-box) response. For more details on SP, the reader is referred to [2]. 


State of the Art. Traditional imperative SP languages provide constructs to 
model control-dominated systems. Typically, these include a concurrent compo- 
sition of threads (sequential processes) that guarantees determinism and offers 
signals as the main means for data communication between threads. Signals 
behave like shared variables for which the concurrent accesses occurring within 
a macro-step are scheduled according to the following principles: A pure signal 
has a status that can be present (1) or absent (0). At the beginning of each 
macro-step, pure signals have status 0 by default. In any instant, a signal s 
can be explicitly emitted with the statement s.emit() which atomically sets 
its status to 1. We can read the status of s with the statement s.pres(), so 
the control-flow can branch depending on run-time signal statuses. Specifically, 
inside programs, if-then-else constructions wait for the appropriate combination 
of present and absent signal statuses to emit (or not) more signals. The main 
issue is to avoid inconsistencies due to circular causality resulting from decisions 
based on absent statuses. Thus, the order in which the access methods emit and 
pres are scheduled matters for the final result. The usual SP rule for ensuring 
determinism is that the pres test must wait until the final signal status is 
decided. If all signal accesses can be scheduled in this decide-then-read way then 
the program is constructive. All schedules that keep the decide-then-read order 
will produce the same input-output result. This is how SP reconciles concur- 
rency and observable determinism and generates much of its algebraic appeal. 
Constructiveness of programs is what static techniques like the must-can analy- 
sis [4,11-13] verify although in a more abstract manner. Pure signals are a simple 
form of clock-synchronised shared memory (CSM) data types with access meth- 
ods (operations) specific to this CSM type. Existing SP control-flow languages 
also support other restricted CSM types such as valued signals and arrays [10] or 
sequentially constructive variables [6]. 
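
The decide-then-read rule can be sketched for a deliberately simplified setting: straight-line threads without branching, within a single instant. The function conformant_runs and its representation of threads are assumptions of this sketch, and the rule it implements (a pres on an absent signal waits while another thread still has an emit of that signal ahead) is only a coarse approximation of what a must-can analysis establishes.

```python
def conformant_runs(threads):
    """Enumerate all decide-then-read schedules of straight-line threads.

    Each thread is a list of ("emit", s) or ("pres", s) actions.  The
    result is the set of observable outcomes; an outcome records, for
    every pres, the status it read, keyed by (thread index, position).
    An empty set means that every schedule gets stuck.
    """
    results = set()

    def pending_emit(pcs, me, s):
        # Is there an emit of s still ahead in some other thread?
        return any(
            ("emit", s) in thread[pcs[j]:]
            for j, thread in enumerate(threads)
            if j != me
        )

    def explore(pcs, status, reads):
        if all(pc == len(t) for pc, t in zip(pcs, threads)):
            results.add(tuple(sorted(reads.items())))
            return
        for i, thread in enumerate(threads):
            if pcs[i] == len(thread):
                continue
            kind, s = thread[pcs[i]]
            nxt = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
            if kind == "emit":
                explore(nxt, status | {s}, reads)
            elif s in status or not pending_emit(pcs, i, s):
                # Status 1 is final; status 0 is final once no emit remains.
                explore(nxt, status, {**reads, (i, pcs[i]): s in status})

    explore(tuple(0 for _ in threads), frozenset(), {})
    return results
```

If the returned set has exactly one element, all admissible interleavings agree on what every pres observes; a cyclic dependency between tests and emissions yields the empty set.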


Contribution. This paper proposes an extension to the SP model which retains 
the advantages of deterministic concurrency while widening the notion of con- 
structiveness to cover more general CSM types. This allows shared-memory com- 
munication to occur at higher levels of abstraction than currently supported. In 
particular, our approach subsumes both the notions of Berry-constructiveness [4] 
for Esterel and sequential constructiveness for SCL [14]. This is the first time 
that these SP communication principles are combined side-by-side in a single 
language. Moreover, our theory permits other predefined communication struc- 
tures to coexist safely under the same uniform framework, such as data-flow 
variables [8], registers [15], Kahn channels [16], priority queues, arrays as well as 
other CSM types currently unsupported in SP. 


Synopsis and Overview. The core of our approach is presented in Sect. 2 where 
policies are introduced as a (constructive) synchronisation mechanism for arbi- 
trary abstract data types (ADT). For instance, the policy of a pure signal is 
depicted in Fig. 1. It has two control states 0 and 1 corresponding to the two 
possible signal statuses. Transitions are decorated with method names pres, 
emit or with σ to indicate a clock tick. 

Fig. 1. Pure signal policy. 

The policy tells us whether a given method or tick is admissible, i.e., if it can 
be scheduled from a particular state³. In addition, transitions include a blocking 
set of method names as part of their action labels. This set determines a precedence 
between methods from a given state. A label m : L specifies that all methods in L 
take precedence over m. An empty blocking set ∅ indicates no precedences. To 
improve visualisation, we 


3 The signal policy in Fig. 1 does not impose any admissibility restriction since meth- 
ods pres and emit can be scheduled from every policy state. 
highlight precedences by dotted (red) arrows tagged prec⁴. The policy interface 
in Fig. 1 specifies the decide-then-read protocol of pure signals as follows. At 
any instant, if the signal status is 0 then the pres test can only be scheduled 
if there are no more potential emit statements that can still update the status 
to 1. This explains the precedence of the emit transition over the self loop with 
action label pres : {emit} from state 0. Conversely, transitions pres and emit 
from state 1 have no precedences, meaning that the pres and emit methods 
are confluent so they can be freely scheduled (interleaved). The reason is that a 
signal status 1 is already decided and can no longer be changed by either method 
in the same instant. In general, any two admissible methods that do not block 
each other must be confluent in the sense that the same policy state is reached 
independently of their order of execution. Note that all the σ transitions go to the 
initial state 0 since at each tick the SP system enters a new macro-step where 
all pure signals get initialised to the 0 status. 
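
The pure signal policy, as described in the text, can be written down as a small transition table. The encoding below is a sketch under our own assumptions: the table PURE_SIGNAL, the method name tick for the clock transition, and the helpers enabled and confluent are hypothetical, and the blocking set of the tick transitions is the explicit variant mentioned in the footnote on tick priority.

```python
# (state, method) -> (next state, blocking set)
PURE_SIGNAL = {
    (0, "pres"): (0, {"emit"}),          # reading absence waits for emits
    (0, "emit"): (1, set()),
    (1, "pres"): (1, set()),             # status 1 is already decided
    (1, "emit"): (1, set()),
    (0, "tick"): (0, {"pres", "emit"}),  # the tick has lowest priority
    (1, "tick"): (0, {"pres", "emit"}),
}

def enabled(policy, state, method, concurrent):
    """A method may run if it is admissible in `state` and none of the
    methods that concurrent threads may still call is in its blocking set."""
    if (state, method) not in policy:
        return False
    _, blocking = policy[(state, method)]
    return not (blocking & set(concurrent))

def confluent(policy, state, m1, m2):
    """Do m1 then m2 and m2 then m1 reach the same policy state?"""
    def after(s, a, b):
        s1, _ = policy[(s, a)]
        s2, _ = policy[(s1, b)]
        return s2
    return after(state, m1, m2) == after(state, m2, m1)
```

From state 0 a pres is held back while an emit is still possible, whereas from state 1 both methods are enabled regardless of what other threads may do.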

Section 2 describes in detail the idea of a scheduling policy on general CSM 
types. This leads to a type-level coherence property, which is a local form of 
determinism. Specifically, a CSM type is policy-coherent if it satisfies the (policy) 
specification of admissibility and precedence of its access methods. The point is 
that a policy-coherent CSM type per se behaves deterministically under its own 
policy—a very natural requirement if the goal is to build deterministic systems 
that use this type. For instance, the fact that Esterel signals are determinis- 
tic (policy-coherent) in the first place permits techniques such as the must-can 
analysis to get constructive information about deterministic programs. We show 
how policy-coherence implies a global determinacy property called commutation. 
Now, in a policy-constructive program all access methods can be scheduled in a 
policy-conforming way for all the CSM types without deadlocking. We also show 
that, for policy-coherent types, a policy-constructive program exhibits determin- 
istic concurrency in the sense that all policy-conforming interleavings produce 
the same input-output behaviour. 

To implement a constructive scheduling mechanism parameterised in arbi- 
trary CSM type policies, we present the synchronous kernel language, called 
Deterministic Concurrent Language (DCoL), in Sect. 2.1. DCoL is a minimal 
language to study the new mathematical concepts, but it can also act as an 
intermediate language for compiling existing SP languages. Section 3 presents its 
policy-driven operational semantics, for which determinacy and termination are proven. Section 3 
also explains how this model generalises existing notions of constructiveness. We 
discuss related work in Sect. 4 and present our conclusions in Sect. 5. 

A companion of this paper is the research report [17] 
(https://www.uni-bamberg.de/fileadmin/uni/fakultaeten/wiai_professuren/grundlagen_informatik/papers_MM/report-WIAI-102-Feb-2018.pdf), 
which contains detailed proofs and 
additional examples of CSM types. 


4 We tacitly assume that the tick transitions σ have the lowest priority since the 
clock may tick only when the reaction is over. We could be more explicit and write 
σ : {pres, emit} as action labels for these transitions. 
2 Synchronous Policies 


This section introduces a kernel synchronous Deterministic Concurrent Lan- 
guage (DCoL) for policy-conformant constructive scheduling which integrates 
policy-controlled CSM types within a simple syntax. DCoL is used to discuss 
the behavioural (clock) abstraction limitations of current SP. Then policies are 
introduced as a mechanism for specifying the scheduling discipline for CSM types 
which, in this form, can encapsulate arbitrary ADTs. 


2.1 Syntax 


The syntax of DCoL is given by the following operators: 


P ::= skip                     instantaneous termination 
    | pause                    wait for next instant (clock tick) 
    | P || P                   parallel composition 
    | P ; P                    sequential composition 
    | let x = c.m(e) in P      access method call, x value variable 
    | if e then P else P       conditional branching, e value expression 
    | rec p. P                 recursive closure 
    | p                        process variable 


The first two statements correspond to the two forms of immediate completion: 
skip terminates instantaneously and pause waits for the logical clock to 
terminate. The operators P || Q and P ; Q are parallel interleaving and imperative 
sequential composition of threads with the standard operational interpretation. 
Reading and destructive updating are performed through the execution of 
method calls c.m(e) on a CSM variable c ∈ O with a method m ∈ M_c. The sets 
O and M_c define the granularity of the available memory accesses. The construct 
let x = c.m(e) in P calls m on c with an input parameter determined by value 
expression e. It binds the return value to variable x and then executes program 
P, which may depend on x, sequentially afterwards. The execution of c.m(e) in 
general has the side-effect of changing the internal memory of c. In contrast, the 
evaluation of expression e is side-effect free. For convenience we write x = c.m(e) ; P 
for let x = c.m(e) in P. When P does not depend on x then we write c.m(e) ; P, and 
c.m(e); for c.m(e) ; skip. The exact syntax of value expressions e is irrelevant for 
this work and left open. It could be as simple as permitting only constant value 
literals or a full-fledged functional language. The conditional if e then P else P has 
the usual interpretation. For simplicity, we may write if c.m(e) then P else Q 
to mean x = c.m(e) ; if x then P else Q. The recursive closure rec p. P binds the 
behaviour P to the program label p so it can be called from within P. Using 
this construct we can build iterative behaviours. For instance, loop P end =_df 
rec p. P ; pause ; p indefinitely repeats P in each tick. We assume that in a closure 
rec p. P the label p is (i) clock guarded, i.e., it occurs in the scope of at least one 
pause (meaning no instantaneous loops) and (ii) all occurrences of p are in the same 
thread. Thus, rec p. p is illegal because of (i) and rec p. (pause ; p || pause ; p) is not 
permitted because of (ii). 
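
The syntax and the two side conditions on closures can be sketched as an abstract syntax tree with a simple checker. All names below are hypothetical, and the checks encode one possible reading of conditions (i) and (ii): the label must not be reachable before a pause, and it must not occur on both sides of a parallel composition.

```python
from collections import namedtuple

# One node type per DCoL operator (field names are illustrative).
Skip = namedtuple("Skip", [])
Pause = namedtuple("Pause", [])
Par = namedtuple("Par", ["left", "right"])
Seq = namedtuple("Seq", ["first", "second"])
Let = namedtuple("Let", ["var", "obj", "method", "arg", "body"])
If = namedtuple("If", ["cond", "then", "orelse"])
Rec = namedtuple("Rec", ["label", "body"])
Var = namedtuple("Var", ["label"])

def children(prog):
    """Sub-programs of a node (value expressions are not programs)."""
    if isinstance(prog, Par):
        return [prog.left, prog.right]
    if isinstance(prog, Seq):
        return [prog.first, prog.second]
    if isinstance(prog, If):
        return [prog.then, prog.orelse]
    if isinstance(prog, (Let, Rec)):
        return [prog.body]
    return []

def occurs(p, prog):
    """Does the label p occur free in prog?"""
    if isinstance(prog, Var):
        return prog.label == p
    if isinstance(prog, Rec) and prog.label == p:
        return False  # p is rebound by an inner closure
    return any(occurs(p, c) for c in children(prog))

def one_thread(p, prog):
    """Condition (ii): p never occurs on both sides of a parallel."""
    if isinstance(prog, Rec) and prog.label == p:
        return True
    if isinstance(prog, Par) and occurs(p, prog.left) and occurs(p, prog.right):
        return False
    return all(one_thread(p, c) for c in children(prog))

def unguarded(p, prog):
    """Return (bad, instant): bad if p is reachable before any pause,
    instant if prog may terminate without passing a pause."""
    if isinstance(prog, Pause):
        return False, False
    if isinstance(prog, Var):
        return prog.label == p, True
    if isinstance(prog, Seq):
        bad1, inst1 = unguarded(p, prog.first)
        if not inst1:
            return bad1, False
        bad2, inst2 = unguarded(p, prog.second)
        return bad1 or bad2, inst2
    if isinstance(prog, (Par, If)):
        (b1, i1), (b2, i2) = (unguarded(p, c) for c in children(prog))
        # A parallel ends when both sides end, a conditional when either does.
        instant = (i1 and i2) if isinstance(prog, Par) else (i1 or i2)
        return b1 or b2, instant
    if isinstance(prog, Let) or (isinstance(prog, Rec) and prog.label != p):
        return unguarded(p, prog.body)
    return False, True  # Skip, or a closure that rebinds p

def well_formed(rec):
    """Check conditions (i) and (ii) for a closure rec p. P."""
    bad, _ = unguarded(rec.label, rec.body)
    return not bad and one_thread(rec.label, rec.body)

def loop(body):
    """loop P end, defined as rec p. P ; pause ; p."""
    return Rec("p", Seq(body, Seq(Pause(), Var("p"))))
```

With this encoding, the loop construct passes both checks, while the two illegal closures from the text are rejected, one by each condition.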
This syntax seems minimalistic compared to existing SP languages. For 
instance, it does not provide primitives for pre-emption, suspension or traps as 
in Quartz or Esterel. Recent work [18] has shown how these control primitives 
can be translated into the constructs of the SCL language, exploiting destruc- 
tive update of sequentially constructive (SC) variables. Since SC variables are 
a special case of policy-controlled CSM variables, DCoL is at least as expressive 
as SCL. 


2.2 Limited Abstraction in SP 


The pertinent feature of standard SP languages is that they do not permit the 
programmer to express sequential execution order inside a tick, for destructive 
updates of signals. All such updates are considered concurrent and thus must 
either be combined or concern distinct signals. For instance, in languages such 
as Esterel V7 or Quartz, a parallel composition 


(v = xs.read() ; ys.emit(v + 1)) || (xs.emit(1) ; xs.emit(5))    (1) 


of signal emissions is only constructive if a commutative and associative function 
is defined on the shared signal xs to combine the values assigned to it. But then, 
by the properties of this combination function, we get the same behaviour if we 
swap the assignments of values 1 and 5, or execute all in parallel as in 


v = xs.read() || ys.emit(v + 1) || xs.emit(1) || xs.emit(5). 
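
The role of the combination function can be checked with a few lines of Python. The helper names combined_value and order_independent are hypothetical; addition stands in for an arbitrary commutative and associative combination function, and the overwrite function models ordinary imperative assignment.

```python
from functools import reduce
from itertools import permutations
import operator

def combined_value(emissions, combine):
    """Fold all values emitted in one instant with the combination function."""
    return reduce(combine, emissions)

def order_independent(emissions, combine):
    """True if every emission order yields the same combined value."""
    values = {combined_value(p, combine) for p in permutations(emissions)}
    return len(values) == 1

def overwrite(old, new):
    """Imperative assignment: the later value replaces the earlier one."""
    return new
```

With addition the emissions 1 and 5 combine to the same value in either order, whereas with overwriting the result depends on which emission comes last, so the order cannot be left to the scheduler.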


What if we intended the second emission xs.emit(5) in (1) to override 
the first xs.emit(1), as in normal imperative programming, so that the concurrent 
thread v = xs.read() ; ys.emit(v + 1) reads the updated value v = 5? 
Then we need to introduce a pause statement to separate the emissions by a 
clock tick and delay the assignment to ys as in 


(pause ; v = xs.read() ; ys.emit(v + 1)) || (xs.emit(1) ; pause ; xs.emit(5)). 


This makes behavioural abstraction difficult. For instance, suppose nats is a syn- 
chronous reaction module, possibly composite and with its own internal clock- 
ing, which returns the stream of natural numbers. Every time its step func- 
tion nats.step() is called it returns the next number and increments its inter- 
nal state. If we want to pair up two successive numbers within one tick of an 
outer clock and emit them in a single signal y, we would write something like 
x1 = nats.step() ; x2 = nats.step() ; y.emit(x1, x2) where x1, x2 are 
thread-local value variables. This over-clocking is impossible in traditional SP because 
there is no imperative sequential composition by virtue of which we can call the 
step function of the same module instance twice within a tick. Instead, the two 
calls nats.step() are considered concurrent and thus create non-determinacy in 
the value of y.° To avoid a compiler error we must separate the calls by a clock as 


5 In Esterel V7 it is possible to use a module twice in a “sequential” composition xı = 
nats.step();72 = nats.step(). However, the two occurrences of nats are distinct 
instances with their own internal state. Both calls will thus return the same value. 
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in zı = nats.step() ; pause ; £2 = nats.step() ; y.emit(x1,22) which breaks 
the intended clock abstraction. 

The data abstraction limitation of traditional SP is that it is not directly
possible to encapsulate a composite behaviour on synchronised signals as a shared
synchronised object. For this, the simple decide-then-read signal protocol must
be generalised, in particular, to distinguish between concurrent and sequential
accesses to the shared data structure. A concurrent access x1 = nats.step() ||
x2 = nats.step() must give the same value for x1 and x2, while a sequential
access x1 = nats.step() ; x2 = nats.step() must yield successive values of the
stream. In a sequence x = xs.read() ; xs.emit(v) the x does not see the value
v, but in a parallel x = xs.read() || xs.emit(v) we may want the read to wait for
the emission. The rest of this section covers our theory on policies in which this
is possible. The modularity issue is reconsidered in Sect. 2.6.


2.3 Concurrent Access Policies 


In the white-box view of SP, an imperative program consists of a set of threads
(sequential processes) and some CSM variables for communication. Due to
concurrency, a given thread under control (TUC) has the chance to access the
shared variables only from time to time. For a given CSM variable, a concurrent
access policy (CAP) is the locking mechanism used to control the accesses of the
current TUC and its environment. The locking is to ensure that determinacy of the
CSM type is not broken by the concurrent accesses. A CAP is like a policy which
has extra transitions to model potential environment accesses outside the TUC.
Concretely, a CAP is given by a state machine where each transition label $a : L$
codifies an action $a$ taking place on the shared variable with blocking set $L$,
where $L$ is a set of methods that take precedence over $a$. The action is either
a method $m : L$, a silent action $\tau : L$ or a clock tick $\sigma : L$. A
transition $m : L$ expresses that in the current CAP control state, the method $m$
can be called by the TUC, provided that no method in $L$ is called concurrently.
There is a Determinacy Requirement that guarantees that each method call by the
TUC has a unique blocking set and successor state. Additionally, the execution of
methods by the CAP must be confluent in the sense that if two methods are
admissible and do not block each other, then the CAP reaches the same policy state
no matter the order in which they are executed. This is to preserve determinism
for concurrent variable accesses. A transition $\tau : L$ internalises method
calls by the TUC’s concurrent environment which are uncontrollable for the TUC.
In the sequel, the actions in $M_c \cup \{\sigma\}$ will be called observable. A
transition $\sigma : L$ models a clock synchronisation step of the TUC. Like
method calls, such clock ticks must be determinate as stated by the Determinacy
Requirement. Additionally, the clock must always wait for any predicted
concurrent $\tau$-activity to complete. This is the Maximal Progress Requirement.
Note that we do not need confluence for clock transitions since they are not
concurrent.


Definition 1. A concurrent access policy (CAP) $\Vdash_c$ of a CSM variable $c$
with (access) methods $M_c$ is a state machine consisting of a set of control
states $P_c$, an initial state $\varepsilon \in P_c$ and a labelled transition
relation ${\to} \subseteq P_c \times A_c \times P_c$ with action labels
$A_c = (M_c \cup \{\tau, \sigma\}) \times 2^{M_c}$. Instead of
$(\mu_1, (a, L), \mu_2) \in {\to}$ we write $\mu_1 \xrightarrow{a:L} \mu_2$. We
then say action $a$ is admissible in state $\mu_1$ and blocked by all methods
$m \in L \subseteq M_c$. When the blocking set $L$ is irrelevant we drop it and
write $\mu_1 \xrightarrow{a} \mu_2$. A CAP must satisfy the following conditions:


– Determinacy. If $\mu \xrightarrow{a:L_1} \mu_1$ and
  $\mu \xrightarrow{a:L_2} \mu_2$ then $L_1 = L_2$ and $\mu_1 = \mu_2$, provided
  $a$ is observable, i.e., $a \neq \tau$.

– Confluence. If $\mu \xrightarrow{m_1:L_1} \mu_1$ and
  $\mu \xrightarrow{m_2:L_2} \mu_2$ do not block each other, i.e.,
  $m_1 \in M_c \setminus L_2$ and $m_2 \in M_c \setminus L_1$, then for some
  $\mu'$ both $\mu_1 \xrightarrow{m_2} \mu'$ and $\mu_2 \xrightarrow{m_1} \mu'$.

– Maximal Progress. $\mu \xrightarrow{a:L_1} \mu_1$ and
  $\mu \xrightarrow{\sigma:L_2} \mu_2$ imply $a$ is observable.


A policy is a CAP without any (concurrent) $\tau$ activity, i.e., every
$\mu \xrightarrow{a} \mu'$ implies that $a$ is observable.


The use of a CAP as a concurrent policy arises from the notion of enabling.
Informally, an observable action $a \in M_c \cup \{\sigma\}$ is enabled in a
state $\mu$ of a CAP if it is admissible in $\mu$ and in all subsequent states
reachable under arbitrary silent steps. To formalise this we define weak
transitions $\mu_1 \Rightarrow \mu_2$ inductively to express that either
$\mu_1 = \mu_2$ or $\mu_1 \Rightarrow \mu'$ and $\mu' \xrightarrow{\tau} \mu_2$.
We exploit the determinacy for observable actions $a \in M_c \cup \{\sigma\}$ and
write $\mu \odot a$ for the unique $\mu'$ such that $\mu \xrightarrow{a} \mu'$,
if it exists.


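Definition 2. Given a CAP $\Vdash_c = (P_c, \varepsilon, \to)$, an observable
action $a \in M_c \cup \{\sigma\}$ is enabled in state $\mu \in P_c$, written
$\mu \Vdash_c {\downarrow}\, a$, if $\mu' \odot a$ exists for all $\mu'$ such that
$\mu \Rightarrow \mu'$. A sequence $\mathbf{a} \in (M_c \cup \{\sigma\})^*$ of
observable actions is enabled in $\mu \in P_c$, written
$\mu \Vdash_c {\downarrow}\, \mathbf{a}$, if (i) $\mathbf{a} = \varepsilon$ or
(ii) $\mathbf{a} = a\,\mathbf{b}$, $\mu \Vdash_c {\downarrow}\, a$ and
$\mu \odot a \Vdash_c {\downarrow}\, \mathbf{b}$.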


Example 1. Consider the policy $\Vdash_s$ in Fig. 1 of an Esterel pure signal
$s$. An edge labelled $a : L$ from state $\mu_1$ to $\mu_2$ corresponds to a
transition $\mu_1 \xrightarrow{a:L} \mu_2$ in $\Vdash_s$. The start state is
$\varepsilon = 0$ and the methods $M_s = \{\mathsf{pres}, \mathsf{emit}\}$ are
always admissible, i.e., $\mu \odot m$ is defined in each state $\mu$ for all
methods $m$. The presence test does not change the state and any emission sets
it to 1, i.e., $\mu \odot \mathsf{pres} = \mu$ and $\mu \odot \mathsf{emit} = 1$
for all $\mu \in P_s$. Each signal status is reset to 0 with the clock tick, i.e.,
$\mu \odot \sigma = 0$. Clearly, $\Vdash_s$ satisfies Determinacy. A presence test
on a signal that is not emitted yet has to wait for all pending concurrent
emissions, that is, emit blocks pres in state 0, i.e.,
$0 \xrightarrow{\mathsf{pres}:\{\mathsf{emit}\}} 0$. Otherwise, no transition is
blocked. Also, all competing transitions $\mu \xrightarrow{m_1:L_1} \mu_1$ and
$\mu \xrightarrow{m_2:L_2} \mu_2$ that do not block each other are of the form
$\mu_1 = \mu_2$, from which Confluence follows. Note that since there are no
silent transitions, Maximal Progress is always fulfilled too. Moreover, an action
sequence is enabled in a state $\mu$ (Definition 2) iff it corresponds to a path
in the automaton starting from $\mu$. Hence, for $\mathbf{m} \in M_s^*$ we have
$0 \Vdash_s {\downarrow}\, \mathbf{m}$ iff $\mathbf{m}$ is in the regular
language⁶ $\mathsf{pres}^* + \mathsf{pres}^*\,\mathsf{emit}\,(\mathsf{pres} + \mathsf{emit})^*$
and $1 \Vdash_s {\downarrow}\, \mathbf{m}$ for all $\mathbf{m} \in M_s^*$.


[Fig. 1: policy automaton of the pure signal s (figure not reproduced).]

Fig. 2. Synchronous IVar. [Policy automaton (figure not reproduced).]

Contrast $\Vdash_s$ with the policy $\Vdash_c$ of a synchronous immutable
variable (IVar) $c$ shown in Fig. 2 with methods
$M_c = \{\mathsf{get}, \mathsf{put}\}$. During each instant an IVar can be
written (put) at most once and cannot be read (get) until it has been written.
No value is stored between ticks, which means the memory is only temporary and
can be reused, e.g., IVars can be implemented by wires. Formally,
$\mu \Vdash_c {\downarrow}\, \mathsf{put}$ iff $\mu = 0$, where 0 is the initial
empty state, and $\mu \Vdash_c {\downarrow}\, \mathsf{get}$ iff $\mu = 1$, where
1 is the filled state. The transition
$0 \xrightarrow{\mathsf{put}:\{\mathsf{put}\}} 1$ switches to the filled state
where get is admissible but put is not, anymore. The blocking set
$\{\mathsf{put}\}$ means there cannot be other concurrent threads writing $c$ at
the same time.
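
The two policies above can be written down as small transition tables. The
following Python sketch is only an illustration under stated assumptions: the
table encoding, the names `SIGNAL`, `IVAR`, `step` and `enabled`, the action name
`"tick"` for $\sigma$, the empty blocking sets on tick transitions, and the IVar's
reset to the empty state on a tick are all hypothetical choices not taken from
the figures. For a policy (no silent steps), enabledness of a sequence reduces to
following a path in the automaton.

```python
# Policies as finite automata: (state, action) -> (blocking set, next state).
# "tick" stands for the clock action sigma. Blocking sets on ticks and the
# IVar's reset on tick are assumptions of this sketch.
SIGNAL = {
    (0, "pres"): (frozenset({"emit"}), 0),
    (0, "emit"): (frozenset(), 1),
    (1, "pres"): (frozenset(), 1),
    (1, "emit"): (frozenset(), 1),
    (0, "tick"): (frozenset(), 0),
    (1, "tick"): (frozenset(), 0),
}
IVAR = {
    (0, "put"): (frozenset({"put"}), 1),
    (1, "get"): (frozenset(), 1),
    (0, "tick"): (frozenset(), 0),
    (1, "tick"): (frozenset(), 0),
}


def step(policy, state, action):
    """Return the successor state, or None if the action is not admissible."""
    entry = policy.get((state, action))
    return None if entry is None else entry[1]


def enabled(policy, state, actions):
    """Definition 2 specialised to policies: without silent steps, a sequence
    is enabled iff it labels a path in the automaton."""
    for action in actions:
        state = step(policy, state, action)
        if state is None:
            return False
    return True
```

For example, `enabled(IVAR, 0, ["put", "get"])` holds, while a second `put` or an
initial `get` in the same instant is rejected.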


2.4 Enabling and Policy Conformance 


A policy describes what a single thread can do to a CSM variable $c$ when it
operates alone. In a CAP all potential activities of the environment are added as
$\tau$-transitions to block the TUC’s accesses. To implement this $\tau$-locking
we define an operation that generates a CAP $[\mu, \gamma]$ out of a policy. In
this construction, $\mu \in P_c$ is a policy state recording the history of
methods that have been performed on $c$ so far (must information). The second
component $\gamma \subseteq M_c^*$ is a prediction for the sequences of methods
that can still potentially be executed by the concurrent environment (can
information).


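Definition 3. Let $(P_c, \varepsilon, \to)$ be a policy. We define a CAP
$\Vdash_c$ where states are pairs $[\mu, \gamma]$ such that $\mu \in P_c$ is a
policy state and $\gamma \subseteq M_c^*$ is a prediction. The initial state is
$[\varepsilon, M_c^*]$ and the transitions are as follows:


1. The observable transitions $[\mu_1, \gamma_1] \xrightarrow{m:L} [\mu_2, \gamma_2]$
   are such that $\gamma_1 = \gamma_2$ and $\mu_1 \xrightarrow{m:L} \mu_2$,
   provided that for all sequences $n\,\mathbf{n} \in \gamma_1$ with
   $\mu_1 \xrightarrow{n} \mu'$ we have $n \notin L$.

2. The silent transitions are $[\mu_1, \gamma_1] \xrightarrow{\tau:L} [\mu_2, \gamma_2]$
   such that $\emptyset \neq m\,\gamma_2 \subseteq \gamma_1$ and
   $\mu_1 \xrightarrow{m:L} \mu_2$.

3. The clock transitions are $[\mu_1, \gamma_1] \xrightarrow{\sigma:L} [\mu_2, \gamma_2]$
   such that $\gamma_1 = \emptyset$ and $\mu_1 \xrightarrow{\sigma:L} \mu_2$.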


Silent steps arise from the concurrent environment: A step
$[\mu_1, \gamma_1] \xrightarrow{\tau:L} [\mu_2, \gamma_2]$ removes some prefix
method $m$ from the environment prediction $\gamma_1$, which contracts to an
updated suffix prediction $\gamma_2$ with $m\,\gamma_2 \subseteq \gamma_1$. This
method $m$ is executed on the CSM variable, changing the policy state to
$\mu_2 = \mu_1 \odot m$. A method $m$ is enabled,
$[\mu, \gamma] \Vdash_c {\downarrow}\, m$, if for all $[\mu_1, \gamma_1]$ which
are $\tau$-reachable from $[\mu, \gamma]$, method $m$ is admissible, i.e.,
$[\mu_1, \gamma_1] \xrightarrow{m} [\mu_2, \gamma_1]$ for some $\mu_2$.
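
The construction of Definition 3 and the resulting enabledness test can be
sketched in Python as follows. This is an illustrative sketch, not the authors'
implementation: the table encoding and the function names `admissible`,
`silent_successors` and `enabled` are hypothetical, predictions are assumed to be
finite sets of tuples, and each silent step contracts the prediction to the
largest suffix set (Definition 3(2) also admits smaller ones, which we assume
cannot block more).

```python
# Policy tables: (state, method) -> (blocking set, next state).
IVAR = {(0, "put"): (frozenset({"put"}), 1), (1, "get"): (frozenset(), 1)}
SIGNAL = {
    (0, "pres"): (frozenset({"emit"}), 0), (0, "emit"): (frozenset(), 1),
    (1, "pres"): (frozenset(), 1), (1, "emit"): (frozenset(), 1),
}


def admissible(policy, mu, gamma, m):
    """Observable transition of Definition 3(1): m is admissible in mu and no
    predicted sequence starts with an admissible method from m's blocking set."""
    entry = policy.get((mu, m))
    if entry is None:
        return False
    blocking, _ = entry
    return not any(
        seq and (mu, seq[0]) in policy and seq[0] in blocking for seq in gamma
    )


def silent_successors(policy, mu, gamma):
    """Silent transitions of Definition 3(2): the environment executes the
    first method of a predicted sequence; gamma contracts to the suffixes."""
    for n in {seq[0] for seq in gamma if seq}:
        if (mu, n) in policy:
            suffixes = frozenset(seq[1:] for seq in gamma if seq and seq[0] == n)
            yield policy[(mu, n)][1], suffixes


def enabled(policy, mu, gamma, m):
    """m is enabled in [mu, gamma] if it is admissible in every state that is
    reachable from [mu, gamma] by silent steps."""
    todo, seen = [(mu, frozenset(gamma))], set()
    while todo:
        state = todo.pop()
        if state in seen:
            continue
        seen.add(state)
        if not admissible(policy, state[0], state[1], m):
            return False
        todo.extend(silent_successors(policy, *state))
    return True
```

With these tables, a `put` on an empty IVar is enabled against a predicted
`get`, whereas it is blocked by a predicted concurrent `put`, and a presence
test on an absent signal is blocked by a predicted `emit`.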


Example 2. Consider concurrent threads $P_1 \parallel P_2$, where
$P_2$ = zs.put(5) ; u = ys.get() and $P_1$ = v = zs.get() ; ys.put(v + 1) with
IVars zs, ys according to Example 1. Under the IVar policy the execution is
deterministic, so that first $P_2$ writes on zs, then $P_1$ reads from zs and
writes to ys, whereupon finally $P_2$ reads ys. Suppose the variables have
reached policy states $\mu_{zs}$ and $\mu_{ys}$, and the threads are ready to
execute the residual programs $P_i'$ waiting at some method call
$c_i.m_i(v_i)$, respectively. Since thread $P_i'$ is concurrent with the other
$P_{3-i}'$, it can only proceed if $m_i$ is not blocked by $P_{3-i}'$, i.e., if
$[\mu_{c_i}, \mathit{can}_{c_i}(P_{3-i}')] \Vdash_{c_i} {\downarrow}\, m_i$, where
$\mathit{can}_c(P) \subseteq M_c^*$ is the set of method sequences predicted for
$P$ on $c$.


6 We are more liberal than Esterel where emit cannot be called sequentially after pres.

Initially we have $\mu_{zs} = 0 = \mu_{ys}$. Since method get is not admissible
in state 0, we get $[0, \mathit{can}_{zs}(P_2)] \nVdash_{zs} {\downarrow}\, \mathsf{get}$
by Definitions 3 and 2. So, $P_1$ is blocked. The zs.put of $P_2$, however, can
proceed. First, since no predicted method sequence
$\mathit{can}_{zs}(P_1) = \{\mathsf{get}\}$ of $P_1$ starts with put, the
transition $0 \xrightarrow{\mathsf{put}:\{\mathsf{put}\}} 1$ implies that
$[0, \mathit{can}_{zs}(P_1)] \xrightarrow{\mathsf{put}:\{\mathsf{put}\}} [1, \mathit{can}_{zs}(P_1)]$
by Definition 3(1). Moreover, since get of $P_1$ is not admissible in 0, there
are no silent transitions out of $[0, \mathit{can}_{zs}(P_1)]$ according to
Definition 3(2). Thus,
$[0, \mathit{can}_{zs}(P_1)] \Vdash_{zs} {\downarrow}\, \mathsf{put}$, as claimed.

When the zs.put is executed by $P_2$ it turns into $P_2'$ = u = ys.get() and
the policy state for zs advances to $\mu_{zs}' = 1$, while ys is still at
$\mu_{ys} = 0$. Now ys.get of $P_2'$ blocks for the same reason as zs.get was
blocked in $P_1$ before. But since $P_2$ has advanced, its prediction on zs
reduces to $\mathit{can}_{zs}(P_2') = \emptyset$. Therefore, the transition
$1 \xrightarrow{\mathsf{get}:\emptyset} 1$ implies
$[1, \mathit{can}_{zs}(P_2')] \xrightarrow{\mathsf{get}:\emptyset} [1, \mathit{can}_{zs}(P_2')]$
by Definition 3(1). Also, there are no silent transitions out of
$[1, \mathit{can}_{zs}(P_2')]$ by Definition 3(2) and so
$[\mu_{zs}', \mathit{can}_{zs}(P_2')] \Vdash_{zs} {\downarrow}\, \mathsf{get}$ by
Definition 2. This permits $P_1$ to execute zs.get() and proceed to
$P_1'$ = ys.put(5 + 1). The policy state of zs is not changed by this, neither is
the state of ys, whence $P_2'$ is still blocked. Yet, we have
$[\mu_{ys}, \mathit{can}_{ys}(P_2')] \Vdash_{ys} {\downarrow}\, \mathsf{put}$,
which lets $P_1'$ complete ys.put. It reaches $P_1''$ with
$\mathit{can}_{ys}(P_1'') = \emptyset$ and changes the policy state of ys to
$\mu_{ys}' = 1$. At this point,
$[\mu_{ys}', \mathit{can}_{ys}(P_1'')] \Vdash_{ys} {\downarrow}\, \mathsf{get}$,
which means $P_2'$ unblocks to execute ys.get.


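Definition 4. Let $\Vdash_c$ be a policy for $c$. A method sequence
$\mathbf{m}_1$ blocks another $\mathbf{m}_2$ in state $\mu$, written
$\mu \Vdash_c \mathbf{m}_1 \to \mathbf{m}_2$, if
$\mu \Vdash_c {\downarrow}\, \mathbf{m}_2$ but
$[\mu, \{\mathbf{m}_1\}] \nVdash_c {\downarrow}\, \mathbf{m}_2$. Two method
sequences $\mathbf{m}_1$ and $\mathbf{m}_2$ are concurrently enabled, denoted
$\mu \Vdash_c \mathbf{m}_1 \diamond \mathbf{m}_2$, if
$\mu \Vdash_c {\downarrow}\, \mathbf{m}_1$,
$\mu \Vdash_c {\downarrow}\, \mathbf{m}_2$ and both
$\mu \nVdash_c \mathbf{m}_1 \to \mathbf{m}_2$ and
$\mu \nVdash_c \mathbf{m}_2 \to \mathbf{m}_1$.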


Our operational semantics will only let a TUC execute a sequence $\mathbf{m}$
provided $[\mu, \gamma] \Vdash_c {\downarrow}\, \mathbf{m}$, where $\mu$ is the
current policy state of $c$ and $\gamma$ the predicted activity in the TUC’s
concurrent environment. Symmetrically, the environment will execute any
$\mathbf{n} \in \gamma$ only if it is enabled with respect to $\mathbf{m}$,
i.e., if $[\mu, \{\mathbf{m}\}] \Vdash_c {\downarrow}\, \mathbf{n}$. This means
$\mu \Vdash_c \mathbf{m} \diamond \mathbf{n}$. Policy coherence (Definition 5
below) then implies that every interleaving of the sequences $\mathbf{m}$ and
any $\mathbf{n} \in \gamma$ leads to the same return values and final variable
state (Proposition 1).


2.5 Coherence and Determinacy 


A method call $m(v)$ combines a method $m \in M_c$ with a method parameter⁷
$v \in D$, where $D$ is a universal domain for method arguments and return
values, including the special don’t care value $\_ \in D$. We denote by
$\mathbb{A}_c = \{m(v) \mid m \in M_c, v \in D\}$ the set of all method calls on
object $c$. Sequences of method calls $\boldsymbol{\alpha} \in \mathbb{A}_c^*$
can be abstracted back into sequences of methods
$\boldsymbol{\alpha}^\# \in M_c^*$ by dropping the method parameters:
$\varepsilon^\# = \varepsilon$ and
$(m(v)\,\boldsymbol{\alpha})^\# = m\,\boldsymbol{\alpha}^\#$.


7 This is without loss of generality since D may contain value tuples.

Coherence concerns the semantics of method calls as state transformations.
Let $S_c$ be the domain of memory states of the object $c$ with initial state
$\mathit{init}_c \in S_c$. Each method call $m(v) \in \mathbb{A}_c$ corresponds
to a semantical action $[\![m(v)]\!]_c \in S_c \to (D \times S_c)$. If
$s \in S_c$ is the current state of the object then executing a call $m(v)$ on
$c$ returns a pair $(u, s') = [\![m(v)]\!]_c(s)$ where the first projection
$u \in D$ is the return value from the call and the second projection
$s' \in S_c$ is the new updated state of the variable. For convenience, we will
denote $u = \pi_1 [\![m(v)]\!]_c(s)$ by $u = s.m(v)$ and
$s' = \pi_2 [\![m(v)]\!]_c(s)$ by $s' = s \odot m(v)$. The action notation is
extended to sequences of calls $\boldsymbol{\alpha} \in \mathbb{A}_c^*$ in the
natural way: $s \odot \varepsilon = s$ and
$s \odot (m(v)\,\boldsymbol{\alpha}) = (s \odot m(v)) \odot \boldsymbol{\alpha}$.

For policy-based scheduling we assume an abstraction function mapping a
memory state $s \in S_c$ into a policy state $s^\# \in P_c$. Specifically,
$\mathit{init}_c^\# = \varepsilon$. Further, we assume the abstraction commutes
with method execution in the sense that if we execute a sequence of calls and
then abstract the final state, we get the same as if we executed the policy
automaton on the abstracted state in the first place. Formally,
$(s \odot \boldsymbol{\alpha})^\# = s^\# \odot \boldsymbol{\alpha}^\#$ for all
$s \in S_c$ and $\boldsymbol{\alpha} \in \mathbb{A}_c^*$.


Definition 5 (Coherence). A CSM variable $c$ is policy-coherent if for all
method calls $a, b \in \mathbb{A}_c$, whenever
$s^\# \Vdash_c a^\# \diamond b^\#$ for a state $s \in S_c$, then $a$ and $b$ are
confluent in the sense that $s.a = (s \odot b).a$, $s.b = (s \odot a).b$ and
$s \odot a \odot b = s \odot b \odot a$.


Example 3. Esterel pure signals do not carry any data value, so their memory
state coincides with the policy state, $S_s = P_s = \{0, 1\}$ and $s^\# = s$. An
emission emit does not return any value but sets the state of $s$ to 1, i.e.,
$s.\mathsf{emit}(\_) = \_ \in D$ and $s \odot \mathsf{emit}(\_) = 1 \in S_s$. A
present test returns the state, $s.\mathsf{pres}(\_) = s$, but does not modify
it, $s \odot \mathsf{pres}(\_) = s$. From the policy in Fig. 1 we find that the
concurrent enablings $s^\# \Vdash_s a^\# \diamond b^\#$ according to Definition 4
are (i) $a = b \in \{\mathsf{pres}(\_), \mathsf{emit}(\_)\}$ for arbitrary $s$,
or (ii) $s = 1$, $a = \mathsf{emit}(\_)$ and $b = \mathsf{pres}(\_)$. In each of
these cases we verify $s.a = (s \odot b).a$, $s.b = (s \odot a).b$ and
$s \odot a \odot b = s \odot b \odot a$ without difficulty. Note that
$1 \Vdash_s \mathsf{emit} \diamond \mathsf{pres}$ since the order of execution is
irrelevant if $s = 1$. On the other hand,
$0 \nVdash_s \mathsf{emit} \diamond \mathsf{pres}$ because in state 0 the two
methods are not confluent. Specifically,
$0.\mathsf{pres}(\_) = 0 \neq 1 = (0 \odot \mathsf{emit}(\_)).\mathsf{pres}(\_)$.
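
The confluence equations of Definition 5 can be checked mechanically for the
pure signal. The Python sketch below is only illustrative: the functions `call`
and `confluent` and the use of `None` for the don't care value are assumptions
of this sketch, not notation from the paper.

```python
def call(state, method):
    """Semantic action of a pure signal: returns (return value, new state).

    Memory state and policy state coincide (0 = absent, 1 = present); None
    plays the role of the don't care value.
    """
    if method == "emit":
        return None, 1
    if method == "pres":
        return state, state
    raise ValueError(f"unknown method {method!r}")


def confluent(state, a, b):
    """Check the three equations of Definition 5 for calls a and b in `state`."""
    ret_a, after_a = call(state, a)
    ret_b, after_b = call(state, b)
    ret_a_after_b, after_ba = call(after_b, a)
    ret_b_after_a, after_ab = call(after_a, b)
    return ret_a == ret_a_after_b and ret_b == ret_b_after_a and after_ab == after_ba
```

Here `confluent(1, "emit", "pres")` holds, while `confluent(0, "emit", "pres")`
fails because the presence test returns 0 before the emission and 1 after it.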


A special case are linear precedence policies where
$\mu \Vdash_c {\downarrow}\, m$ for all $m \in M_c$ and $\mu \Vdash_c m \to n$ is
a linear ordering on $M_c$, for all policy states $\mu$. Then, for no state we
have $\mu \Vdash_c m_1 \diamond m_2$, so there is no concurrency and thus no
confluence requirement to satisfy at all. Coherence of $c$ is trivially satisfied
whatever the semantics of method calls. For any two admissible methods one takes
precedence over the other and thus the enabling relation becomes deterministic.
There is however a risk of deadlock, which can be excluded if we assume that
threads always call methods in order of decreasing precedence.

The other extreme case is where the policy makes all methods concurrently
enabled, i.e., $\mu \Vdash_c m_1 \diamond m_2$ for all policy states $\mu$ and
methods $m_1, m_2$. This avoids deadlock completely and gives maximal concurrency
but imposes the strongest confluence condition, viz. independently of the
scheduling order of any two methods, the resulting variable state must be the
same. This requires complete isolation of the effects of any two methods. Such an
extreme is used, e.g., in the CR library [19]. The typical CSM variable, however,
will strike a trade-off between these two extremes. It will impose a sensible set
of precedences that are strong enough to ensure coherent implementations and thus
determinacy for policy-conformant scheduling, while at the same time being
sufficiently relaxed to permit concurrent implementations and to avoid
unnecessary deadlocks, which risk that programs are rejected by the compiler as
unschedulable.

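Whatever the policies, if the variables are coherent, then all
policy-conformant interleavings are indistinguishable for each CSM variable. To
state schedule invariance in its general form we lift method actions and
independence to multi-variable sequences of method calls
$\mathbb{A} = \{c.m(v) \mid c \in O, m(v) \in \mathbb{A}_c\}$. For a given
sequence $\boldsymbol{\alpha} \in \mathbb{A}^*$ let
$\pi_c(\boldsymbol{\alpha}) \in \mathbb{A}_c^*$ be the projection of
$\boldsymbol{\alpha}$ on $c$, formally $\pi_c(\varepsilon) = \varepsilon$,
$\pi_c(c.m(v)\,\boldsymbol{\alpha}) = m(v)\,\pi_c(\boldsymbol{\alpha})$ and
$\pi_c(c'.m(v)\,\boldsymbol{\alpha}) = \pi_c(\boldsymbol{\alpha})$ for
$c' \neq c$. A global memory $\Sigma \in S = \prod_{c \in O} S_c$ assigns a local
memory $\Sigma.c \in S_c$ to each variable $c$. We write $\mathit{init}$ for the
initial memory that has $\mathit{init}.c = \mathit{init}_c$ and
$(\mathit{init}.c)^\# = \varepsilon \in P_c$.

Given a global memory $\Sigma \in S$ and sequences
$\boldsymbol{\alpha}, \boldsymbol{\beta} \in \mathbb{A}^*$ of method calls, we
extend the independence relation of Definition 4 variable-wise, defining
$\Sigma \Vdash \boldsymbol{\alpha} \diamond \boldsymbol{\beta}$ iff
$(\Sigma.c)^\# \Vdash_c (\pi_c(\boldsymbol{\alpha}))^\# \diamond (\pi_c(\boldsymbol{\beta}))^\#$.
The application of a method call $a \in \mathbb{A}$ to a memory $\Sigma \in S$ is
written $\Sigma.a \in S$ and defined $(\Sigma.(c.m(v))).c = (\Sigma.c).m(v)$ and
$(\Sigma.(c.m(v))).c' = \Sigma.c'$ for $c' \neq c$. Analogously, method actions
are lifted to global memories, i.e., $(\Sigma \odot c.m(v)).c' = \Sigma.c'$ if
$c' \neq c$ and $(\Sigma \odot c.m(v)).c = \Sigma.c \odot m(v)$.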


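Proposition 1 (Commutation). Let all CSM variables be policy-coherent and
$\Sigma \Vdash a \diamond \boldsymbol{\alpha}$ for a memory $\Sigma \in S$,
method call $a \in \mathbb{A}$ and sequence of method calls
$\boldsymbol{\alpha} \in \mathbb{A}^*$. Then,
$\Sigma \odot a \odot \boldsymbol{\alpha} = \Sigma \odot \boldsymbol{\alpha} \odot a$
and $\Sigma.a = (\Sigma \odot \boldsymbol{\alpha}).a$.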


2.6 Policies and Modularity 


Consider the synchronous data-flow network cnt in Fig.3b with three process 
nodes, a multiplexer mux, a register reg and an incrementor inc. Their DCoL 
code is given in Fig. 3a. The network implements a settable counter, which pro- 
duces at its output ys a stream of consecutive integers, incremented with each 
clock tick. The wires ys, zs and ws are IVars (see Example 2) carrying a single 
integer value per tick. The input xs is a pure Esterel signal (see Example 1). 
The counter state is stored by reg in a local variable xv with read and write 
methods that can be called by a single thread only. The register is initialised 
to value 0 and in each subsequent tick the value at ys is stored. The inc takes 
the value at zs and increments it. When the signal xs is absent, mux passes the 
incremented value on ws to ys for the next tick. Otherwise, if xs is present then
mux resets ys. 

The evaluation order is implemented by the policies of the IVars ys, zs and
ws. In each case the put method takes precedence over get which makes sure that 
the latter is blocked until the former has been executed. The causality cycle of 
the feedback loop is broken by the fact that the reg node first sends the current 
counter value to zs before it waits for the new value at ys. The other nodes mux 
and inc, in contrast, first read their inputs and then send to their output. 


    module cnt
    [ % mux node
      loop
        v = xs.pres();
        if v then ys.put(0);
        else u = ws.get();
             ys.put(u);
      end
    ] ||
    [ % reg node
      xv.write(0);
      loop
        v = xv.read(); zs.send(v);
        u = ys.get(); xv.write(u);
      end
    ] ||
    [ % inc node
      loop
        v = zs.get(); ws.put(v+1);
      end
    ]

(a) Network with mux, reg, inc threads.


(b) Block diagram of the feedback network. [Figure not reproduced.]


    module cnt-cmp
    reg.init(0);
    [ % mux-cmp node
      loop
        v = xs.pres();
        if v then reg.set(0);
        else u = ws.get();
             reg.set(u);
      end
    ] ||
    [ % inc-cmp node
      loop
        v = reg.get(); ws.put(v+1);
      end
    ]

(c) Network with reg as a precompiled DCoL object.


Fig. 3. Synchronous data-flow network cnt built from control-flow processes.


Now suppose, for modularity, the reg node is pre-compiled into a synchronous 
IO automaton to be used by mux and inc as a black box component. Then, reg 
must be split into three modes [20] reg.init, reg.get and reg.set that can 
be called independently in each instant. The init mode initialises the register 
memory with 0. The get mode extracts the buffered value and set stores a new 
value into the register. Since there may be data races if get and set are called 
concurrently on reg, a policy must be imposed. In the scheduling of Fig. 3b, 
first reg.get is executed to place the output on zs. Then, reg waits for mux to 
produce the next value of ys from xs or ws. Finally, reg.set is executed to store 
the current value of ys for the next tick. Thus, the natural policy for the register 
is to require that in each tick set is called by at most one thread and if so no
concurrent call to get by another thread happens afterwards. In addition, the
policy requires init to take place at least once before any set or get. Hence,
the policy has two states $P_{reg} = \{0, 1\}$ with initial $\varepsilon = 0$ and
admissibility such that $0 \Vdash_{reg} {\downarrow}\, m$ iff $m = \mathsf{init}$
and $1 \Vdash_{reg} {\downarrow}\, m$ for all $m$. The transitions are
$0 \odot \mathsf{init} = 1$ and $1 \odot m = 1$ for all $m \in M_{reg}$. Further,
for coherence, in state 1 no set may be concurrent and every get must take place
before any concurrent set. This means, we have
$1 \Vdash_{reg} m \to \mathsf{set}$ for all $m \in \{\mathsf{get}, \mathsf{set}\}$.
Figure 3c shows the partially compiled code in which reg is treated as a compiled
object. The policy on reg makes sure the accesses by mux and inc are scheduled in
the right way (see Example 4). Note that reg is not an IVar because it has memory.
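
The effect of this policy on the counter can be sketched with a small Python
simulation. It is a hand-scheduled illustration, not a policy interpreter: the
function `run_cnt` is hypothetical and simply hard-codes the order that the
register policy enforces in each tick (first `reg.get` by inc, then `reg.set` by
mux).

```python
def run_cnt(xs_present):
    """Simulate cnt-cmp for one tick per entry of `xs_present`.

    Returns the values that inc reads from reg in each tick.
    """
    reg = 0                      # reg.init(0)
    outputs = []
    for present in xs_present:
        v = reg                  # inc: v = reg.get(), must precede any set
        ws = v + 1               # inc: ws.put(v + 1)
        outputs.append(v)
        if present:              # mux: v = xs.pres()
            reg = 0              # mux: reg.set(0)
        else:
            reg = ws             # mux: u = ws.get(); reg.set(u)
    return outputs
```

Calling `run_cnt` with a list of presence flags for xs yields consecutive
integers that restart from 0 in the tick after xs was present.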

The cnt example exhibits a general pattern found in the modular compilation 
of SP: Modules (here reg) may be exercised several times in a synchronous tick 
through modes which are executed in a specific prescribed order. Mode calls (here 
reg.set, reg.get) in the same module are coupled via common shared memory 
(here the local variable xv) while mode calls in distinct modules are isolated
from each other [15,20]. 


3 Constructive Semantics of DCoL 


To formalise our semantics it is technically expedient to keep track of the
completion status of each active thread inside the program expression. This
results in a syntax for processes distinguished from programs in that each
parallel composition $P_1 \,{}_{k_1}\!\parallel_{k_2} P_2$ is labelled by
completion codes $k_i \in \{\bot, 0, 1\}$ which indicate whether each thread is
waiting $k_i = \bot$, terminated $k_i = 0$ or pausing $k_i = 1$. Since we remove
a process from the parallel as soon as it terminates, the code $k_i = 0$ cannot
occur. An expression $P_1 \parallel P_2$ is considered a special case of a
process with $k_i = \bot$. The formal semantics is given by a reduction relation
on processes


$$\Sigma; \Pi \vdash P \xRightarrow{\mathbf{m}} \Sigma' \vdash_{k'} P' \qquad (2)$$


specified by the inductive rules in Figs. 4 and 5. The relation (2) determines an
instantaneous sequential reduction step of process $P$, called an sstep, that
follows a sequence of enabled method calls $\mathbf{m} \in M^*$ in sequential
program order in $P$. This does not include any context switches between
concurrent threads inside $P$. For thread communication, several ssteps must be
chained up, as described later. The sstep (2) results in an updated memory
$\Sigma'$ and residual process $P'$. The subscript $k'$ is a completion code,
described below. The reduction (2) is performed in a context consisting of a
global memory $\Sigma \in S$ (must context) containing the current state of all
CSM variables and an environment prediction $\Pi \subseteq M^*$ (can context).
The prediction records all potentially outstanding method sequences from threads
running concurrently with $P$.

We write $\pi_c(\mathbf{m}) \in M_c^*$ for the projection of a method sequence
$\mathbf{m} \in M^*$ to variable $c$ and write $\pi_c(\Pi)$ for its lifting to
sets of sequences. Prefixing is lifted, too, i.e.,
$c.m \odot \Pi = \{c.m\,\mathbf{m} \mid \mathbf{m} \in \Pi\}$ for any
$c.m \in M$. Performing a method call $c.m(v)$ in $\Sigma; \Pi$ advances the must
context to $\Sigma \odot c.m(v)$ but leaves $\Pi$ unchanged. The sequence of
methods $\mathbf{m} \in M^*$ in (2) is enabled in $\Sigma; \Pi$, written
$[\Sigma, \Pi] \Vdash {\downarrow}\, \mathbf{m}$, meaning that
$[(\Sigma.c)^\#, \pi_c(\Pi)] \Vdash_c {\downarrow}\, \pi_c(\mathbf{m})$ for all
$c \in O$. In this way, the context $[\Sigma, \Pi]$ forms a joint policy state
for all variables for the TUC $P$, in the sense of Sect. 2 (Definition 3).


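Sequence

$$\textsf{Seq}_1\ \dfrac{\Sigma; \Pi \vdash P \xRightarrow{\mathbf{m}} \Sigma' \vdash_{k'} P' \qquad k' \neq 0}{\Sigma; \Pi \vdash P ; Q \xRightarrow{\mathbf{m}} \Sigma' \vdash_{k'} P' ; Q}$$

$$\textsf{Seq}_2\ \dfrac{\Sigma; \Pi \vdash P \xRightarrow{\mathbf{m}_1} \Sigma' \vdash_{0} P' \qquad \Sigma'; \Pi \vdash Q \xRightarrow{\mathbf{m}_2} \Sigma'' \vdash_{k'} Q'}{\Sigma; \Pi \vdash P ; Q \xRightarrow{\mathbf{m}_1 \mathbf{m}_2} \Sigma'' \vdash_{k'} Q'}$$

Completion

$$\textsf{Cmp}_1\ \dfrac{}{\Sigma; \Pi \vdash \mathsf{skip} \xRightarrow{\varepsilon} \Sigma \vdash_{0} \mathsf{skip}} \qquad \textsf{Cmp}_2\ \dfrac{}{\Sigma; \Pi \vdash \mathsf{pause} \xRightarrow{\varepsilon} \Sigma \vdash_{1} \mathsf{pause}}$$

Recursion

$$\textsf{Rec}\ \dfrac{\Sigma; \Pi \vdash P\{\mathsf{rec}\ p.\ P / p\} \xRightarrow{\mathbf{m}} \Sigma' \vdash_{k'} P'}{\Sigma; \Pi \vdash \mathsf{rec}\ p.\ P \xRightarrow{\mathbf{m}} \Sigma' \vdash_{k'} P'}$$


Fig. 4. SStep reductions for sequence, completion and recursion.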


Most of the rules in Figs. 4 and 5 should be straightforward for the reader
familiar with structural operational semantics. $\textsf{Seq}_1$ is the case of a
sequential $P ; Q$ where $P$ pauses or waits ($k' \neq 0$) and $\textsf{Seq}_2$
is where $P$ terminates and control passes into $Q$. The statements skip and
pause are handled by rules $\textsf{Cmp}_1$ and $\textsf{Cmp}_2$. The rule
$\textsf{Rec}$ explains recursion $\mathsf{rec}\ p.\ P$ by syntactic unfolding of
the recursion body $P$. All interaction with the memory takes place in the method
calls let x = c.m(e) in P. Rule $\textsf{Let}_1$ is applicable when the method
call is enabled, i.e., $[\Sigma, \Pi] \Vdash {\downarrow}\, c.m$. Since processes
are closed, the argument expression $e$ must evaluate, $\mathit{eval}(e) = v$,
and we obtain the new memory $\Sigma \odot c.m(v)$ and return value
$\Sigma.c.m(v)$. The return value is substituted for the local (stack allocated)
identifier $x$, giving the continuation process $P\{\Sigma.c.m(v)/x\}$ which is
run in the updated context $\Sigma \odot c.m(v); \Pi$. The prediction $\Pi$
remains the same. The second rule $\textsf{Let}_2$ is used when the method call
is blocked or the thread wants to wait and yield to the scheduler. The rules for
conditionals $\textsf{Cnd}_1$, $\textsf{Cnd}_2$ are canonical. More interesting
are the rules $\textsf{Par}_1$–$\textsf{Par}_4$ for parallel composition, which
implement non-deterministic thread switching. It is here where we need to
generate predictions and pass them between the threads to exercise the policy
control.

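Method Call

$$\textsf{Let}_1\ \dfrac{[\Sigma, \Pi] \Vdash {\downarrow}\, c.m \qquad \mathit{eval}(e) = v \qquad \Sigma \odot c.m(v); \Pi \vdash P\{\Sigma.c.m(v)/x\} \xRightarrow{\mathbf{m}} \Sigma' \vdash_{k'} P'}{\Sigma; \Pi \vdash \mathsf{let}\ x = c.m(e)\ \mathsf{in}\ P \xRightarrow{c.m\,\mathbf{m}} \Sigma' \vdash_{k'} P'}$$

$$\textsf{Let}_2\ \dfrac{}{\Sigma; \Pi \vdash \mathsf{let}\ x = c.m(e)\ \mathsf{in}\ P \xRightarrow{\varepsilon} \Sigma \vdash_{\bot} \mathsf{let}\ x = c.m(e)\ \mathsf{in}\ P}$$

Conditional

$$\textsf{Cnd}_1\ \dfrac{\mathit{eval}(e) = \mathsf{true} \qquad \Sigma; \Pi \vdash P \xRightarrow{\mathbf{m}} \Sigma' \vdash_{k'} P'}{\Sigma; \Pi \vdash \mathsf{if}\ e\ \mathsf{then}\ P\ \mathsf{else}\ Q \xRightarrow{\mathbf{m}} \Sigma' \vdash_{k'} P'}$$

$$\textsf{Cnd}_2\ \dfrac{\mathit{eval}(e) = \mathsf{false} \qquad \Sigma; \Pi \vdash Q \xRightarrow{\mathbf{m}} \Sigma' \vdash_{k'} Q'}{\Sigma; \Pi \vdash \mathsf{if}\ e\ \mathsf{then}\ P\ \mathsf{else}\ Q \xRightarrow{\mathbf{m}} \Sigma' \vdash_{k'} Q'}$$

Parallel

$$\textsf{Par}_1\ \dfrac{\Sigma; \Pi \otimes \mathit{can}(Q) \vdash P \xRightarrow{\mathbf{m}} \Sigma' \vdash_{k'} P' \qquad k' \neq 0}{\Sigma; \Pi \vdash P \,{}_{k}\!\parallel_{k_Q} Q \xRightarrow{\mathbf{m}} \Sigma' \vdash_{k' \sqcap k_Q} P' \,{}_{k'}\!\parallel_{k_Q} Q}$$

$$\textsf{Par}_2\ \dfrac{\Sigma; \Pi \otimes \mathit{can}(Q) \vdash P \xRightarrow{\mathbf{m}} \Sigma' \vdash_{0} P'}{\Sigma; \Pi \vdash P \,{}_{k}\!\parallel_{k_Q} Q \xRightarrow{\mathbf{m}} \Sigma' \vdash_{k_Q} Q}$$

$$\textsf{Par}_3\ \dfrac{\Sigma; \Pi \otimes \mathit{can}(P) \vdash Q \xRightarrow{\mathbf{m}} \Sigma' \vdash_{k'} Q' \qquad k' \neq 0}{\Sigma; \Pi \vdash P \,{}_{k_P}\!\parallel_{k} Q \xRightarrow{\mathbf{m}} \Sigma' \vdash_{k_P \sqcap k'} P \,{}_{k_P}\!\parallel_{k'} Q'}$$

$$\textsf{Par}_4\ \dfrac{\Sigma; \Pi \otimes \mathit{can}(P) \vdash Q \xRightarrow{\mathbf{m}} \Sigma' \vdash_{0} Q'}{\Sigma; \Pi \vdash P \,{}_{k_P}\!\parallel_{k} Q \xRightarrow{\mathbf{m}} \Sigma' \vdash_{k_P} P}$$


Fig. 5. SStep reductions for method calls, conditional and parallel.


The key operation is the computation of the can-prediction of a process $P$ to
obtain an over-approximation of the set of possible method sequences potentially
executed by $P$. For compositionality we work with sequences
$\mathit{can}^s(P) \subseteq M^* \times \{0, 1\}$ stoppered with a completion
code 0 if the sequence ends in termination or 1 if it ends in pausing. The
symbols $\bot_0$, $\bot_1$ and $\top$ are the terminated, paused and fully
unconstrained can contexts, respectively, with $\bot_0 = \{(\varepsilon, 0)\}$,
$\bot_1 = \{(\varepsilon, 1)\}$ and $\top = M^* \times \{0, 1\}$. The set
$\mathit{can}^s(P)$, defined in Fig. 6, is extracted from the structure of $P$
using prefixing $c.m \odot \Pi'$, choice
$\Pi_1' \oplus \Pi_2' = \Pi_1' \cup \Pi_2'$, parallel $\Pi_1' \otimes \Pi_2'$
and sequential composition $\Pi_1' \cdot \Pi_2'$. Sequential composition is
obtained pairwise on stoppered sequences such that
$(\mathbf{m}, 0) \cdot (\mathbf{n}, c) = (\mathbf{m}\,\mathbf{n}, c)$ and
$(\mathbf{m}, 1) \cdot (\mathbf{n}, c) = (\mathbf{m}, 1)$. As a consequence,
$\bot_0 \cdot \Pi' = \Pi'$ and $\bot_1 \cdot \Pi' = \bot_1$. Parallel
composition is pairwise free interleaving with synchronisation on completion
codes. Specifically, a product $(\mathbf{m}, c) \otimes (\mathbf{n}, d)$
generates all interleavings of $\mathbf{m}$ and $\mathbf{n}$ with a completion
that models a parallel composition that terminates iff both threads terminate
and pauses if one pauses. Formally,
$(\mathbf{m}, c) \otimes (\mathbf{n}, d) = \{(\mathbf{c}, \max(c, d)) \mid \mathbf{c} \in \mathbf{m} \otimes \mathbf{n}\}$.
Thus, $\Pi_P' \otimes \Pi_Q' = \bot_0$ iff $\Pi_P' = \bot_0 = \Pi_Q'$ and
$\Pi_P' \otimes \Pi_Q' = \bot_1$ if $\Pi_P' = \bot_1 = \Pi_Q'$, or
$\Pi_P' = \bot_0$ and $\Pi_Q' = \bot_1$, or $\Pi_P' = \bot_1$ and
$\Pi_Q' = \bot_0$. From $\mathit{can}^s(P)$ we obtain
$\mathit{can}(P) \subseteq M^*$ by dropping all stopper codes, i.e.,
$\mathit{can}(P) = \{\mathbf{m} \mid \exists d.\ (\mathbf{m}, d) \in \mathit{can}^s(P)\}$.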

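The rule $\textsf{Par}_1$ exercises a parallel $P \,{}_{k}\!\parallel_{k_Q} Q$ by
performing an sstep in $P$. This sstep is taken in the extended context
$\Sigma; \Pi \otimes \mathit{can}(Q)$ in which the prediction of the sibling $Q$
is added to the method prediction $\Pi$ for the outer environment in which the
parent $P \parallel Q$ is running. In this way, $Q$ can block method calls of
$P$. When $P$ finally yields as $P'$ with a non-terminating completion code,
$0 \neq k' \in \{\bot, 1\}$, the parallel completes as
$P' \,{}_{k'}\!\parallel_{k_Q} Q$ with code $k' \sqcap k_Q$. This operation is
defined $k_1 \sqcap k_2 = 1$ if $k_1 = 1 = k_2$ and $k_1 \sqcap k_2 = \bot$,
otherwise. When $P$ terminates its sstep with code $k' = 0$ then we need rule
$\textsf{Par}_2$ which removes child $P'$ from the parallel composition. The
rules $\textsf{Par}_3$, $\textsf{Par}_4$ are symmetrical to $\textsf{Par}_1$,
$\textsf{Par}_2$. They run the right child $Q$ of a parallel
$P \,{}_{k_P}\!\parallel_{k} Q$.


$$\begin{array}{rcl}
\mathit{can}^s(\mathsf{skip}) = \mathit{can}^s(p) &=& \bot_0 \\
\mathit{can}^s(\mathsf{pause}) &=& \bot_1 \\
\mathit{can}^s(\mathsf{rec}\ p.\ P) &=& \mathit{can}^s(P) \\
\mathit{can}^s(P \parallel Q) &=& \mathit{can}^s(P) \otimes \mathit{can}^s(Q) \\
\mathit{can}^s(P ; Q) &=& \begin{cases} \mathit{can}^s(P) & \text{if } \mathit{can}^s(P) \subseteq M^* \times \{1\} \\ \mathit{can}^s(P) \cdot \mathit{can}^s(Q) & \text{otherwise} \end{cases} \\
\mathit{can}^s(\mathsf{let}\ x = c.m(e)\ \mathsf{in}\ P) &=& c.m \odot \mathit{can}^s(P) \\
\mathit{can}^s(\mathsf{if}\ e\ \mathsf{then}\ P\ \mathsf{else}\ Q) &=& \begin{cases} \mathit{can}^s(P) & \text{if } \mathit{eval}(e) = \mathsf{true} \\ \mathit{can}^s(Q) & \text{if } \mathit{eval}(e) = \mathsf{false} \\ \mathit{can}^s(P) \oplus \mathit{can}^s(Q) & \text{otherwise.} \end{cases}
\end{array}$$


Fig. 6. Computing the can prediction.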
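
The can prediction of Fig. 6 and the operations on stoppered sequences can be
sketched in Python as follows. This is a sketch under stated assumptions, not the
authors' implementation: processes are encoded as nested tuples (a hypothetical
encoding), method names are plain strings such as `"reg.get"`, and a condition
that cannot be evaluated yet is represented by `None`.

```python
from itertools import product

BOT0 = frozenset({((), 0)})   # terminated context {(epsilon, 0)}
BOT1 = frozenset({((), 1)})   # paused context {(epsilon, 1)}


def interleavings(m, n):
    """All free interleavings of two method sequences (tuples)."""
    if not m or not n:
        return {m + n}
    return ({(m[0],) + r for r in interleavings(m[1:], n)}
            | {(n[0],) + r for r in interleavings(m, n[1:])})


def seq(p, q):
    """Sequential composition: a paused prefix (code 1) cuts off the rest."""
    return frozenset({(m, 1) for m, c in p if c == 1}
                     | {(m + n, d) for (m, c), (n, d) in product(p, q) if c == 0})


def par(p, q):
    """Parallel composition: interleave and take the maximum completion code."""
    return frozenset((r, max(c, d))
                     for (m, c), (n, d) in product(p, q)
                     for r in interleavings(m, n))


def can_s(proc):
    """Stoppered can prediction following Fig. 6 on tuple-encoded processes."""
    tag = proc[0]
    if tag in ("skip", "var"):
        return BOT0
    if tag == "pause":
        return BOT1
    if tag == "rec":                      # ("rec", p, body)
        return can_s(proc[2])
    if tag == "par":                      # ("par", P, Q)
        return par(can_s(proc[1]), can_s(proc[2]))
    if tag == "seq":                      # ("seq", P, Q)
        first = can_s(proc[1])
        if all(c == 1 for _, c in first):
            return first
        return seq(first, can_s(proc[2]))
    if tag == "let":                      # ("let", "c.m", continuation)
        return frozenset(((proc[1],) + m, c) for m, c in can_s(proc[2]))
    if tag == "if":                       # ("if", True | False | None, P, Q)
        if proc[1] is None:
            return can_s(proc[2]) | can_s(proc[3])
        return can_s(proc[2] if proc[1] else proc[3])
    raise ValueError(f"unknown process {proc!r}")


def can(proc):
    """Drop the stopper codes."""
    return {m for m, _ in can_s(proc)}
```

For a loop body that calls `reg.get`, then `ws.put`, then pauses, `can` returns
the single sequence `("reg.get", "ws.put")`, since everything after the pause is
cut off by the sequential composition.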


Completion and Stability. A process $P'$ is 0-stable if $P' = \mathsf{skip}$ and
1-stable if $P' = \mathsf{pause}$, or $P' = P_1' ; P_2'$ and $P_1'$ is 1-stable,
or $P' = P_1' \,{}_{1}\!\parallel_{1} P_2'$ and the $P_i'$ are 1-stable. A
process is stable if it is 0-stable or 1-stable. A process expression is
well-formed if in each sub-expression $P_1 \,{}_{k_1}\!\parallel_{k_2} P_2$ of
$P$ the completion annotations match the processes, i.e., if $k_i \neq \bot$
then $P_i$ is $k_i$-stable. Stable processes are well-formed by definition. For
stable processes we define a (syntactic) tick function which steps a stable
process to the next tick. It is defined such that
$\sigma(\mathsf{skip}) = \mathsf{skip}$, $\sigma(\mathsf{pause}) = \mathsf{skip}$,
$\sigma(P_1' ; P_2') = \sigma(P_1') ; P_2'$ and
$\sigma(P_1' \,{}_{k_1}\!\parallel_{k_2} P_2') = \sigma(P_1') \parallel \sigma(P_2')$.
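
Stability and the tick function can be sketched in Python on a tuple encoding of
processes. The encoding and the function names `is_stable` and `tick` are
hypothetical; in particular, representing the waiting code $\bot$ by `None` and
resetting both completion codes to $\bot$ after a tick is an assumption of this
sketch, based on reading an unlabelled parallel as the special case $k_i = \bot$.

```python
def is_stable(proc, code):
    """Check k-stability for code 0 or 1 on a tuple-encoded process."""
    tag = proc[0]
    if code == 0:
        return tag == "skip"
    if tag == "pause":
        return True
    if tag == "seq":                       # ("seq", P1, P2)
        return is_stable(proc[1], 1)
    if tag == "par":                       # ("par", k1, k2, P1, P2)
        _, k1, k2, left, right = proc
        return k1 == 1 and k2 == 1 and is_stable(left, 1) and is_stable(right, 1)
    return False


def tick(proc):
    """Syntactic tick function sigma; only meaningful on stable processes."""
    tag = proc[0]
    if tag in ("skip", "pause"):
        return ("skip",)
    if tag == "seq":
        return ("seq", tick(proc[1]), proc[2])
    if tag == "par":
        # Both children restart as waiting threads (None stands for bottom).
        return ("par", None, None, tick(proc[3]), tick(proc[4]))
    raise ValueError("tick is only defined on stable processes")
```

For instance, a paused sequence `("seq", ("pause",), rest)` is 1-stable and
ticks to `("seq", ("skip",), rest)`, so that `rest` runs in the next instant.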


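Example 4. The data-flow cnt-cmp from Fig. 3c can be represented as a DCoL
process in the form $C = \mathsf{reg.init}(0); (M \,{}_{\bot}\!\parallel_{\bot} I)$
with


$$\begin{array}{rcl}
M &=_{df}& \mathsf{rec}\ p.\ v = \mathsf{xs.pres}();\ P(v);\ \mathsf{pause};\ p \\
P(v) &=_{df}& \mathsf{if}\ v\ \mathsf{then}\ \mathsf{reg.set}(0);\ \mathsf{else}\ Q \\
Q &=_{df}& u = \mathsf{ws.get}();\ \mathsf{reg.set}(u); \\
I &=_{df}& \mathsf{rec}\ q.\ v = \mathsf{reg.get}();\ \mathsf{ws.put}(v + 1);\ \mathsf{pause};\ q.
\end{array}$$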


Let us evaluate process $C$ from an initialised memory $\Sigma_0$ such that
$\Sigma_0.\mathsf{xs} = 0 = \Sigma_0.\mathsf{ws}$, and empty environment
prediction $\{\varepsilon\}$.

The first sstep is executed from the context $\Sigma_0; \{\varepsilon\}$ with
empty can prediction. Note that
$\mathsf{reg.init}(0); (M \,{}_{\bot}\!\parallel_{\bot} I)$ abbreviates
$\mathsf{let}\ \_ = \mathsf{reg.init}(0)\ \mathsf{in}\ (M \,{}_{\bot}\!\parallel_{\bot} I)$.
In context $\Sigma_0; \{\varepsilon\}$ the method call reg.init(0) is enabled,
i.e., $[\Sigma_0, \{\varepsilon\}] \Vdash {\downarrow}\, \mathsf{reg.init}$. Since
$\mathit{eval}(0) = 0$, we can execute the first method call of $C$ using rule
$\textsf{Let}_1$. This advances the memory to
$\Sigma_1 = \Sigma_0 \odot \mathsf{reg.init}(0)$. The continuation process
$M \,{}_{\bot}\!\parallel_{\bot} I$ is now executed in context
$\Sigma_1; \{\varepsilon\}$. The left child $M$ starts with method call
xs.pres() and the right child $I$ with reg.get(). The latter is admissible, since
$(\Sigma_1.\mathsf{reg})^\# = 1$. Moreover, get does not need to honour any
precedences, whence it is enabled,
$[\Sigma_1, \Pi] \Vdash {\downarrow}\, \mathsf{reg.get}$ for any $\Pi$. On the
other hand, xs.pres in $M$ is enabled only if $(\Sigma_1.\mathsf{xs})^\# = 1$ or
if there is no concurrent emit predicted for xs. Indeed, this is the case: The
concurrent context of $M$ is
$\Pi_I = \{\varepsilon\} \otimes \mathit{can}(I) = \mathit{can}(I) = \{\mathsf{reg.get} \cdot \mathsf{ws.put}\}$.
We project $\pi_{xs}(\Pi_I) = \{\varepsilon\}$ and find
$[\Sigma_1, \Pi_I] \Vdash {\downarrow}\, \mathsf{xs.pres}$. Hence, we have a
non-deterministic choice to take an sstep in $M$ or in $I$. Let us use rule
$\textsf{Par}_1$/$\textsf{Par}_2$ to run $M$ in context $\Sigma_1; \Pi_I$. By
loop unfolding $\textsf{Rec}$ and rule $\textsf{Let}_1$ we execute the present
test of $M$ which returns the value
$\Sigma_1.\mathsf{xs.pres}() = \mathsf{false}$. This leads to an updated memory
$\Sigma_2 = \Sigma_1 \odot \mathsf{xs.pres}() = \Sigma_1$ and continuation process
$P(\mathsf{false}); \mathsf{pause}; M$. In this (right associated) sequential
composition we first execute $P(\mathsf{false})$ where the conditional rule
$\textsf{Cnd}_2$ switches to the else branch $Q$ which is
u = ws.get(); reg.set(u);, still in the context $\Sigma_2; \Pi_I$. The reading of
the data-flow variable ws, however, is not enabled,
$[\Sigma_2, \Pi_I] \nVdash {\downarrow}\, \mathsf{ws.get}$, because
$(\Sigma_2.\mathsf{ws})^\# = 0$ and thus get is not admissible. The sstep blocks
with rule $\textsf{Let}_2$:


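$$
\begin{array}{ll}
\Sigma_2; \Pi_I \vdash Q \Longrightarrow \Sigma_2 \vdash_{\bot} Q & \text{Let}_2 \\
\Sigma_2; \Pi_I \vdash P(\mathit{false}) \Longrightarrow \Sigma_2 \vdash_{\bot} Q & \text{Cnd}_2 \\
\Sigma_2; \Pi_I \vdash P(\mathit{false}); \mathsf{pause}; M \Longrightarrow \Sigma_2 \vdash_{\bot} Q; \mathsf{pause}; M & \text{Seq}_1 \\
\Sigma_1; \Pi_I \vdash v = \mathsf{xs.pres}(); P(v); \mathsf{pause}; M \stackrel{m_2}{\Longrightarrow} \Sigma_2 \vdash_{\bot} Q; \mathsf{pause}; M & \text{Let}_1\ ([\Sigma_1, \Pi_I] \Vdash\, \downarrow \mathsf{xs.pres}) \\
\Sigma_1; \Pi_I \vdash M \stackrel{m_2}{\Longrightarrow} \Sigma_2 \vdash_{\bot} Q; \mathsf{pause}; M & \text{Rec} \\
\Sigma_1; \{\varepsilon\} \vdash M \,{}_{\bot}\!\|_{\bot}\, I \stackrel{m_2}{\Longrightarrow} \Sigma_2 \vdash_{\bot} (Q; \mathsf{pause}; M) \,{}_{\bot}\!\|_{\bot}\, I & \text{Par}_1 \\
\Sigma_0; \{\varepsilon\} \vdash C \stackrel{m_1 m_2}{\Longrightarrow} \Sigma_2 \vdash_{\bot} (Q; \mathsf{pause}; M) \,{}_{\bot}\!\|_{\bot}\, I & \text{Let}_1\ ([\Sigma_0, \bot_0] \Vdash\, \downarrow \mathsf{reg.init})
\end{array}
$$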

where $m_1 = \mathsf{reg.init}$ and $m_2 = \mathsf{xs.pres}$. In the next sstep, from $\Sigma_2; \Pi_Q$ with 
$\Pi_Q = \{\varepsilon\} \otimes can(Q; \mathsf{pause}; M) = can(Q; \mathsf{pause}; M) = \{\mathsf{ws.get} \cdot \mathsf{reg.set}\}$ we let 
the process $I$ execute its reg.get() with rules Rec and Let1. The return value is 
$v = \Sigma_2.\mathsf{reg.get}() = 0$. Then, from the updated memory $\Sigma_3 = \Sigma_2 \odot \mathsf{reg.get}()$ 
we run the continuation process $\mathsf{ws.put}(0 + 1); \mathsf{pause}; I$. The ws.put is enabled 
if the IVar is empty and there is no concurrent put on ws predicted from $M$. 
Both conditions hold since $(\Sigma_3.\mathsf{ws})^{\#} = (\Sigma_2.\mathsf{ws})^{\#} = 0$ and $\pi_{\mathsf{ws}}(\Pi_Q) = \{\mathsf{get}\}$. 
Therefore, $[\Sigma_3, \Pi_Q] \Vdash\, \downarrow \mathsf{ws.put}$. With the evaluation eval(0 + 1) = 1 the rule 
Let1 permits us to update the memory as $\Sigma_4 = \Sigma_3 \odot \mathsf{ws.put}(1)$ and continue 
with process $\mathsf{pause}; I$ which completes by pausing. Formally, this sstep is: 


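$$
\begin{array}{ll}
\Sigma_4; \Pi_Q \vdash \mathsf{pause} \Longrightarrow \Sigma_4 \vdash_{1} \mathsf{pause} & \text{Cmp}_1 \\
\Sigma_4; \Pi_Q \vdash \mathsf{pause}; I \Longrightarrow \Sigma_4 \vdash_{1} \mathsf{pause}; I & \text{Seq}_1 \\
\Sigma_3; \Pi_Q \vdash \mathsf{ws.put}(0 + 1); \mathsf{pause}; I \stackrel{m_4}{\Longrightarrow} \Sigma_4 \vdash_{1} \mathsf{pause}; I & \text{Let}_1 \\
\Sigma_2; \Pi_Q \vdash v = \mathsf{reg.get}(); \mathsf{ws.put}(v + 1); \mathsf{pause}; I \stackrel{m_3 m_4}{\Longrightarrow} \Sigma_4 \vdash_{1} \mathsf{pause}; I & \text{Let}_1 \\
\Sigma_2; \Pi_Q \vdash I \stackrel{m_3 m_4}{\Longrightarrow} \Sigma_4 \vdash_{1} \mathsf{pause}; I & \text{Rec} \\
\Sigma_2; \{\varepsilon\} \vdash (Q; \mathsf{pause}; M) \,{}_{\bot}\!\|_{\bot}\, I \stackrel{m_3 m_4}{\Longrightarrow} \Sigma_4 \vdash_{\bot} (Q; \mathsf{pause}; M) \,{}_{\bot}\!\|_{1}\, (\mathsf{pause}; I) & \text{Par}_3
\end{array}
$$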
where $m_3 = \mathsf{reg.get}$ and $m_4 = \mathsf{ws.put}$. In the next sstep the waiting method 
$u = \mathsf{ws.get}$ in $Q$ is now admissible and can proceed, since $(\Sigma_4.\mathsf{ws})^{\#} = ((\Sigma_3 \odot 
\mathsf{ws.put}(1)).\mathsf{ws})^{\#} = 1$ and thus $[\Sigma_4, \Pi] \Vdash\, \downarrow \mathsf{ws.get}$ for all $\Pi$. The return value 
is $u = \Sigma_4.\mathsf{ws.get}() = 1$, the updated memory is $\Sigma_5 = \Sigma_4 \odot \mathsf{ws.get}()$ and the 
continuation process is $\mathsf{reg.set}(1); \mathsf{pause}; M$. The register set method is admissible 
since $(\Sigma_5.\mathsf{reg})^{\#} = 1$ and also enabled because there is no get predicted in 
the concurrent environment $\bot_0$. Thus, $[\Sigma_5, \bot_0] \Vdash\, \downarrow \mathsf{reg.set}$. The execution of 
the method yields the memory $\Sigma_6 = \Sigma_5 \odot \mathsf{reg.set}(1)$ with continuation process 
$\mathsf{pause}; M$ which completes by pausing. This yields the derivation tree: 


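$$
\begin{array}{ll}
\Sigma_6; \{\varepsilon\} \vdash \mathsf{pause}; M \Longrightarrow \Sigma_6 \vdash_{1} \mathsf{pause}; M & \text{Cmp}_1 \\
\Sigma_5; \{\varepsilon\} \vdash \mathsf{reg.set}(1); \mathsf{pause}; M \stackrel{m_6}{\Longrightarrow} \Sigma_6 \vdash_{1} \mathsf{pause}; M & \text{Let}_1 \\
\Sigma_4; \{\varepsilon\} \vdash Q; \mathsf{pause}; M \stackrel{m_5 m_6}{\Longrightarrow} \Sigma_6 \vdash_{1} \mathsf{pause}; M & \text{Let}_1 \\
\Sigma_4; \{\varepsilon\} \vdash (Q; \mathsf{pause}; M) \,{}_{\bot}\!\|_{1}\, (\mathsf{pause}; I) \stackrel{m_5 m_6}{\Longrightarrow} \Sigma_6 \vdash_{1} (\mathsf{pause}; M) \,{}_{1}\!\|_{1}\, (\mathsf{pause}; I) & \text{Par}_2
\end{array}
$$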

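where $m_5 = \mathsf{ws.get}$ and $m_6 = \mathsf{reg.set}$. To justify the rule Par2 consider that 
$\{\varepsilon\} \otimes can(\mathsf{pause}; I) = \{\varepsilon\} \otimes \{\varepsilon\} = \{\varepsilon\}$. At this point we have reached a 1-stable 
process. With the tick function we advance to the next tick, $\sigma((\mathsf{pause}; M) \,{}_{1}\!\|_{1}\, 
(\mathsf{pause}; I)) = (\mathsf{skip}; M) \,{}_{\bot}\!\|_{\bot}\, (\mathsf{skip}; I)$ which behaves like $M \,{}_{\bot}\!\|_{\bot}\, I$. 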


3.1 Determinacy, Termination and Constructiveness 


Determinacy of DCoL is a result of two components, monotonicity of policy- 
conformant scheduling and CSM coherence. Monotonicity ensures that whenever 
a method is executable and policy-enabled, then it remains policy-enabled under 
arbitrary ssteps of the environment. Symmetrically, the environment cannot be 
blocked by a thread taking policy-enabled computation steps. 

The second building block for determinacy is CSM variable coherence. Consider 
a context $\Sigma; \Pi_Q$ in which we run an sstep of $P$ with prediction $\Pi_Q$ for 
concurrent process $Q$, resulting in a final memory $\Sigma'_P$, arising from executing 
a sequence $m_P$ of method calls from $P$. Because of the policy constraint, the 
sequence $m_P$ must be enabled under all predictions $n \in \Pi_Q$, i.e., $[\Sigma, n] \Vdash\, \downarrow m_P$. 
Suppose, on the other side, we sstep the process $Q$ in the same memory $\Sigma$ with 
prediction $\Pi_P$ for $P$, resulting in an action sequence $m_Q$ and final memory $\Sigma'_Q$. 
Then, by the same reasoning, $[\Sigma, n] \Vdash\, \downarrow m_Q$ for all $n \in \Pi_P$. But since $m_P$ is an 
actual execution of $P$ it must be in the prediction for $P$, i.e., $m_P \in \Pi_P$ and, 
symmetrically, $m_Q \in \Pi_Q$. But then we have $[\Sigma, m_Q] \Vdash\, \downarrow m_P$ and $[\Sigma, m_P] \Vdash\, \downarrow m_Q$, 
which means $\Sigma \Vdash m_P \diamond m_Q$. Now if the semantics of method calls is 
policy-coherent then Monotonicity can be exploited to derive a confluence property 
for processes which guarantees that $m_P$ can still be executed by $P$ in state $\Sigma'_Q$ 
and $m_Q$ by $Q$ in state $\Sigma'_P$, and both lead to the same final memory. 
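
Theorem 1 (Diamond Property). If all CSM variables are policy-coherent 
then the sstep semantics is confluent. Formally, given two derivations $\Sigma; \Pi \vdash 
P \stackrel{m_1}{\Longrightarrow} \Sigma_1 \vdash_{k_1} P_1$ and $\Sigma; \Pi \vdash P \stackrel{m_2}{\Longrightarrow} \Sigma_2 \vdash_{k_2} P_2$, there exist $\Sigma'$, $k'$ and $P'$ 
such that $\Sigma_1; \Pi \vdash P_1 \stackrel{m'_1}{\Longrightarrow} \Sigma' \vdash_{k'} P'$ and $\Sigma_2; \Pi \vdash P_2 \stackrel{m'_2}{\Longrightarrow} \Sigma' \vdash_{k'} P'$. 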


Theorem 1 shows that no matter how we schedule the ssteps of local threads 
to create an sstep of a parallel composition, the final result will not diverge. 
This does not guarantee completion of a process. However, it implies that the 
question of whether P blocks or makes progress does not depend on the order 
in which concurrent threads are scheduled. Either a process completes or it does 
not. All ssteps in a process can be scheduled with maximal parallelism without 
interference. 

A main program $P$ is run at the top level in an “environmentally closed” form 
of ssteps (2) where the prediction is empty, $\Pi = \{\varepsilon\}$, and thus acts neutrally. We 
iterate such ssteps to construct a macro-step reaction. Let us write 


$$\Sigma \vdash P \Longrightarrow \Sigma' \vdash P' \qquad (3)$$


if there exist $k'$, $m$ such that $\Sigma; \bot_0 \vdash P \stackrel{m}{\Longrightarrow} \Sigma' \vdash_{k'} P'$. The relation $\Longrightarrow$ is 
well-founded for clock-guarded processes in the sense that it has no infinite chains. 


Theorem 2 (Termination). Let $P_0, P_1, P_2, \ldots$ and $\Sigma_0, \Sigma_1, \Sigma_2, \ldots$ be infinite 
sequences of processes and memories, respectively, with $\Sigma_i \vdash P_i \Longrightarrow \Sigma_{i+1} \vdash P_{i+1}$. 
If $P_0$ is clock-guarded then there is $n \geq 0$ with $\Sigma_n = \Sigma_i$, $P_n = P_i$ for all $i \geq n$. 


The fixed point semantics will iterate (3) until it reaches a $P^*$ such that 
$\Sigma^* \vdash P^* \Longrightarrow \Sigma^* \vdash P^*$. By Termination Theorem 2 this must exist for 
clock-guarded processes. If $can^*(P^*) = \bot_0$ then $P^*$ is 0-stable and the program $P$ has 
terminated. If $can^*(P^*) = \bot_1$, the residual $P^*$ is pausing. 
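
The iteration of (3) up to a fixed point can be pictured with a small driver loop. The sketch below is a generic illustration in Python and not the DCoL semantics: `step` is a hypothetical function standing for one environmentally closed sstep on a pair (memory, process), and `toy_step` is an invented stand-in that merely executes queued method names until it meets a pause.

```python
def iterate_to_fixed_point(step, state, max_steps=10_000):
    """Apply `step` until the state no longer changes, as in the iteration of (3)."""
    for _ in range(max_steps):
        nxt = step(state)
        if nxt == state:  # fixed point reached
            return state
        state = nxt
    raise RuntimeError("no fixed point within max_steps")


def toy_step(state):
    """Hypothetical sstep: run the next queued method call unless the process pauses."""
    memory, todo = state
    if not todo or todo[0] == "pause":
        return state  # nothing left to run in this tick
    return (memory + (todo[0],), todo[1:])


# Memory is modelled as the tuple of calls executed so far (an assumption of this toy).
start = ((), ("reg.init", "xs.pres", "pause", "reg.get"))
final = iterate_to_fixed_point(toy_step, start)
```

The bound `max_steps` is only a safety net for the sketch; for clock-guarded processes Theorem 2 is what guarantees that the iteration stops.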


Definition 6 (Macro Step). A run $\Sigma \vdash P \Rrightarrow \Sigma' \vdash P'$ is a sequence of ssteps 
with processes $P_0, P_1, P_2, \ldots, P_n$ and sequences of method calls $m_1, m_2, \ldots, m_n$ 
such that for all $1 \leq i \leq n$, 


$$\Sigma_{i-1}; \bot_0 \vdash P_{i-1} \stackrel{m_i}{\Longrightarrow} \Sigma_i \vdash_{k_i} P_i,$$


where $P_0 = P$, $\Sigma_0 = \Sigma$, $\Sigma_n = \Sigma'$ and $P_n = P'$. A run is called a macro-step 
if it is maximal, i.e., if $\Sigma' \vdash P' \Longrightarrow \Sigma'' \vdash P''$ implies $\Sigma' = \Sigma''$ and $P' = P''$. 
The macro-step is called stabilising if the final $P'$ is stable, i.e., $k_n \neq \bot$, and the 
clock is admissible, i.e., if $(\Sigma'.c)^{\#} \odot \sigma$ is defined for all $c \in O$. The macro-step 
is pausing if $k_n = 1$ and terminating if $k_n = 0$. 


Given a pausing macro-step $\Sigma \vdash P \Rrightarrow \Sigma' \vdash P'$, the next tick starts 
with process $\sigma(P')$ in a memory $\Sigma''$ such that $(\pi_c(\Sigma'))^{\#} \xrightarrow{\sigma} (\pi_c(\Sigma''))^{\#}$ for all 
$c \in O$. This only constrains the abstract policy state of each variable in $\Sigma''$, not 
their memory content. In this way, CSM variables can introduce an arbitrary new 
memory $\Sigma''$ with every clock tick. 

Theorem 3 (Macro-step Determinism). If all CSM variables are 
policy-coherent then for two macro-steps $\Sigma \vdash P \Rrightarrow \Sigma_1 \vdash P_1$ and $\Sigma \vdash P \Rrightarrow \Sigma_2 \vdash P_2$ 
we have $\Sigma_1 = \Sigma_2$ and $P_1 = P_2$. 


Definition 7 (Constructiveness). A program P is policy-constructive, for 
a set of policy coherent CSM variables, if for arbitrary initial memory X all 
reachable macro-steps of P are stabilising. 


A non-constructive program will, after some tick, end up in a fixed point 
$P^*$ with $can^*(P^*) \notin \{\bot_0, \bot_1\}$. Then $P^*$ is stuck, involving a set of active child 
threads waiting for each other in a policy-induced cycle. 

Finally, we present two important results for DCoL showing that we are 
conservatively extending existing SP semantics. A DCoL program using only 
sequentially constructive variables [14] (see [17, Sec. 5.7]) is called a DCoL-SC 
program. DCoL programs using only pure signals subject to the policy of 
Example 1 (Fig. 1) are expressively complete for the pure instantaneous fragment 
of Esterel [4]. Esterel signal emissions emit s are syntactic sugar for s.emit();. 
A presence test pres s then P else Q abbreviates if s.pres() then P else Q. 
Sequential composition P ; Q in Esterel behaves like a parallel composition in 
which the schedule is forced to run P to termination before it can pass control 
to Q. In DCoL this is (P; s′.emit();) ∥ (if s′.pres() then Q else skip) with a fresh 
signal s′ not occurring in either P or Q. This suggests the following definition: 
A program P is a (pure instantaneous) DCoL-Esterel program if (i) P only uses 
pure signals, (ii) P does not use pause or rec, and (iii) P does not contain 
sequentially nested occurrences of signal accesses. 


Theorem 4 (Esterel and Sequential Constructiveness) 


1. A DCoL-Esterel program P is policy-constructive according to Definition 7 
iff it is Berry-constructive in the sense of [4]. 

2. If a DCoL-SC program P is policy-constructive according to Definition 7 then 
it is sequentially constructive in the sense of [14]. 


It is interesting to note that the second statement in Theorem 4 is not 
invertible (for a counter example see [17]). Hence, policy-constructiveness for 
SC-variables induced by our semantics is more restrictive than that given in [14]. 


4 Related Work 


Many languages have been proposed to offer determinism as a fundamental 
design principle. We consider these attempts under several categories. 


Fixed Protocol for Shared Data. These approaches introduce a unique protocol 
for data exchange between concurrent processes. SHIM [21] provides a 
model for combined hardware-software systems typical of embedded systems. 
Here, the concurrent processes communicate using point-to-point (restricted) 
Kahn channels with blocking reads and writes. SHIM programs are shown to be 
deterministic-by-construction as the states of each process are finite and 
deterministic and the data produced and consumed over any channel is also deterministic. 

Concurrent revisions [19] introduce a generic and deterministic programming 
model for parallel programming. This model supports fork-join parallelism and 
processes are allowed to make concurrent modifications to shared data by creat- 
ing local copies that are eventually merged using suitable (programmer specified) 
merge functions at join boundaries. 

However, like the deterministic SP model [2], both SHIM and concurrent revi- 
sions lack support for more expressive shared ADTs essential for programming 
complex systems. Caromel et al. [22], on the other hand, offer determinism with 
asynchronously communicating active objects (ADTs) equipped with a process 
calculus semantics. Here, an active object is a sequential thread. Active objects 
communicate using futures and synchronise via Kahn-MacQueen co-routines [23] 
for deterministic data exchange. Our approach subsumes Kahn buffers of SHIM 
and the local-copy-merge protocol of concurrent revisions by an appropriate 
choice of method interface and policy. None of these approaches [19,21,22] uses 
a clock as a central barrier mechanism like our approach does. 

In the Java-derived language X10, clocks are a form of synchronisation barrier 
for supporting deterministic and deadlock-free patterns of common parallel 
computations [24]. This allows multiple clocks, in contrast to our approach. These, 
however, are not abstracted in the objects, in contrast to our clocks, which are 
encapsulated inside the CSM types. Hence X10 clocks are invoked directly by the 
activities (i.e., concurrent threads) of programs and this manual synchronisation 
is as error-prone as other unsafe low-level primitives such as locks. 


Coherent Memory Models for Shared Data. Whether clocked or not, our approach 
depends on the availability of CSM types that are provably coherent for their 
policy. Besides the standard types of SP (data-flow, sequentially constructive 
variables, Kahn channels, signals) such CSM types can be obtained from exist- 
ing research on coherent memory models [25,26]. Unlike the protocol-oriented 
approaches above, some approaches have been developed based on coherency of 
the underlying memory models [26] especially for shared objects. 

Bocchino et al. [25] propose deterministic parallel Java (DPJ) which has a 
type and effect system to ensure that parallel heap accesses remain safe. Data 
structures such as arrays, trees, and sets can be accessed in parallel as long as 
accesses can be shown to use non-overlapping regions. 

Grace [27] promises a deterministic run-time through the adoption of fork-join 
parallelism combined with memory protection and a sequential commit protocol. 
However, there is no guarantee on the determinism of such custom synchronisation 
protocols. These must be verified using expensive proof systems. 

A powerful technique to generate coherent shared memory structures for 
functional programs has recently been proposed by Kuper et al. [28]. They introduce 
lattice-based data structures, called LVars, in which all write accesses produce 
a monotonic value increase in the lattice and all read accesses are blocked until 
the memory value has passed a read-specific threshold. Each variable’s domain 
is organised as a lattice of states with ⊥ and ⊤ representing an empty new 
location and an error, respectively. Because of monotonicity all writes are 
confluent with each other. Since reads are blocked, each LVar data type can thus be 
used in DCoL as a coherent CSM type of variables with a threshold-determined 
policy. Note that [25-28] do not consider CSM types and [28] also does not treat 
destructive sequential updates as we do. 
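
To make the threshold idea concrete, the following Python sketch models one very simple LVar-like cell. It is an illustration under stated assumptions and not the construction of [28]: the class name MaxLVar is hypothetical, the lattice is fixed to the integers ordered by ≤ with join max, the error state ⊤ is left out, and a read returns the threshold it waited for rather than the current content, so the result cannot depend on the order of concurrent writes.

```python
import threading


class MaxLVar:
    """Toy LVar-like cell over integers: writes only grow the value, reads block."""

    def __init__(self, bottom=0):
        self._value = bottom                 # "bottom" models the empty location
        self._cond = threading.Condition()

    def put(self, v):
        """Monotonic write: join the new value with the current one."""
        with self._cond:
            self._value = max(self._value, v)
            self._cond.notify_all()          # wake readers whose threshold may be passed

    def get(self, threshold):
        """Threshold read: block until the value has reached `threshold`."""
        with self._cond:
            self._cond.wait_for(lambda: self._value >= threshold)
            return threshold                 # expose only the threshold, not the content


cell = MaxLVar()
seen = []
reader = threading.Thread(target=lambda: seen.append(cell.get(3)))
reader.start()
for v in (1, 5, 2):                          # any order of these puts gives the same result
    cell.put(v)
reader.join(timeout=5)
```

Because max is commutative, associative and idempotent, the final content is the same for every interleaving of the writes, which is the confluence argument used in the paragraph above.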

Recently Haller et al. [29] have developed Reactive Async, a new event-based 
asynchronous concurrent programming model that improves on LVars. This 
approach extends futures and promises with lattice-based operations in order to 
support destructive updates (refinement of results) in a deterministic 
concurrent setting. The basic abstractions are (i) cells, which define interfaces for reading 
a value that is asynchronously computed, and (ii) cell completers, which allow 
multiple monotonic updates of values taken from a lattice type class. The model 
supports concurrent programming with cyclic data dependencies in contrast to 
LVars. The mechanism for resolving cycles combines the lattices with quiescence 
detection on a handler pool (execution context). The quiescence concept refers 
to a state where the cell values are not going to be changed anymore. The thread 
pool is able to detect this quiescent (synchronisation) phase and when this is the 
case the resolution of cyclic dependencies and reading of cells can take place. This 
is similar to our policies, where enabling of methods (e.g., read) is a state and 
prediction-dependent notion. Our developments may offer a theoretical back- 
ground for the cell interfaces of this model. In Reactive Async the concurrent 
code is guaranteed to be deterministic provided that the API is used appropri- 
ately but this is not checked statically. It would be interesting to investigate 
whether our theory can contribute on this front. In the other direction, Reac- 
tive Async manages inter-cell dependencies which might support global policies 
between different CSM variables in our setting. 


Clock-Driven Encapsulation. Encapsulation is not entirely unknown in reactive 
programming. The idea of reactive object model (ROM) [30] was first introduced 
by Boussinot et al. and subsequently refined [31] and combined with standards 
such as UML [32]. Here a program is a collection of reactive objects that operate 
synchronously relative to a global clock, similar to SP. Each object encapsulates 
a set of methods and data, where the methods share this data. ROM relied on a 
simplified assumption, where each method invocation is separated into instants. 

André et al. [33] generalised the ROM idea to that of synchronous objects, 
which behave like synchronous modules (in Esterel or Lustre). The program is 
divided into a collection of synchronous and standard objects. While the lat- 
ter interact using messages, the former use signals like in SP. Communication 
between standard and synchronous objects has to be managed using special 
interface objects. The framework supports features such as aggregation, encapsu- 
lation and inheritance yet communication is restricted to standard Esterel-style 
signals. However, the issue of determinism for the composition of synchronous 
objects with standard objects is not considered. 


8 A future can asynchronously be completed with a value of the appropriate type or 
it can fail with an exception. A promise allows completing a future at most once. 
A concrete implementation of synchronous objects in Java is proposed in [34]. 
Here, a run-time system is used to provide a cyclic schedule of the objects during 
an instant. This approach assumes that outputs from the objects can be read 
only in the next instant (similar to the SL programming language [35]) and so 
does not support instantaneous communication like we do. 

Synchronous objects arise naturally in modular compilation [15,36,37]. The 
first time these have been exposed at the language level is in [20]. That work 
has inspired our use of policies. While [20] offers a mechanism for deterministic 
management of shared variables through ADT-like interfaces it has three seri- 
ous limitations: (1) Modes express data-flow equations rather than imperative 
method procedures and so are not directly suitable for control-flow programming; 
(2) Policies do not distinguish between two modes being called sequentially by 
the same thread, which can be permitted, and two methods being called by dif- 
ferent threads in parallel, which may have to be prohibited. This makes policies 
too restrictive in the light of the recent more liberal notion of sequential con- 
structiveness [14] and, most importantly, (3) the notion of policy-soundness does 
not use policies prescriptively as a contract to be fulfilled by the scheduler but 
instead only descriptively as an invariant of the program code. Hence, policies 
in [20] cannot be used to generalise the semantics of SP signals to shared ADTs. 

The sequentially constructive model of synchronous computation [14] has 
shown how the constructive semantics of Esterel can be reconstructed from a 
scheduling view as standard destructive variables plus synchronisation protocol. 
SCL acts as an intermediate language for the graphical language SCCharts [38] 
and the textual language SCEst [18] which are proposed as sequentially con- 
structive extensions of the well-known control-flow languages SyncCharts [39] 
and Esterel [4]. By presenting our new analysis of sequential constructiveness 
for SCL our results become applicable both for SCCharts and SCEst. 

The term ‘constructive’ semantics has been coined by Berry [4]. In [40] it was 
shown how it can be recoded as a fixed-point in an interval domain which we 
generalise here to policy states $[\mu, \gamma]$. Talpin et al. [13] recently gave a 
constructive semantics of multi-clock synchronous programs. It is an open problem how 
our approach could be generalised to multiple clocks. 


5 Conclusion 


This work extends the SP theoretical foundations to allow communication at 
higher levels of abstraction. The paper explains deterministic concurrency of 
SP as a derived property from CSM types. Our results extend the SP-notion 
of constructiveness to general shared CSM types. We have made some simplify- 
ing assumptions that render the theory somewhat less general than it could be. 
A first limitation is our assumption that all method calls are atomic. We believe 
the theory can be generalised for non-atomic methods albeit at the price of 
a significant increase in the complexity of calculating can predictions. Second, 
method parameters are passed “by value” rather than “by reference”. This is 
necessary for having types as black boxes ready to use. Passing method 
parameters “by reference” would also introduce aliasing issues which we 
do not address. Third, in our present setting the policy update $\mu \odot m$ does not 
observe method parameters. This is an abstraction to facilitate static analyses. 
In principle, to increase expressiveness, the method parameters could be 
included, too, but this would again complicate the over-approximation of can information. 

Abstract. We present ELLORA, a sound and relatively complete 
assertion-based program logic, and demonstrate its expressivity by veri- 
fying several classical examples of randomized algorithms using an imple- 
mentation in the EASYCRYPT proof assistant. ELLORA features new proof 
rules for loops and adversarial code, and supports richer assertions than 
existing program logics. We also show that ELLORA allows convenient 
reasoning about complex probabilistic concepts by developing a new pro- 
gram logic for probabilistic independence and distribution law, and then 
smoothly embedding it into ELLORA. 


1 Introduction 


The most mature systems for deductive verification of randomized algorithms are 
expectation-based techniques; seminal examples include PPDL [28] and PGCL 
[34]. These approaches reason about expectations, functions E from states to real 
numbers, propagating them backwards through a program until they are trans- 
formed into a mathematical function of the input. Expectation-based systems 
are both theoretically elegant [16,23, 24,35] and practically useful; implementa- 
tions have verified numerous randomized algorithms [19,21]. However, proper- 
ties involving multiple probabilities or expected values can be cumbersome to 
verify—each expectation must be analyzed separately. 

This is the conference version of the paper. 
1 Treating a program as a function from input states s to output distributions μ(s), 
the expected value of E on μ(s) is an expectation. 

An alternative approach envisioned by Ramshaw [37] is to work with 
predicates over distributions. A direct comparison with expectation-based techniques 
is difficult, as the approaches are quite different. In broad strokes, 
assertion-based systems can verify richer properties in one shot and have specifications 
that are arguably more intuitive, especially for reasoning about loops, while 
expectation-based approaches can transform expectations mechanically and can 
reason about non-determinism. However, the comparison is not very meaningful 
for an even simpler reason: existing assertion-based systems such as [8, 18, 38] 
are not as well developed as their expectation-based counterparts. 


Restrictive Assertions. Existing probabilistic program logics do not support 
reasoning about expected values, only probabilities. As a result, many prop- 
erties about average-case behavior are not even expressible. 

Inconvenient Reasoning for Loops. The Hoare logic rule for determin- 
istic loops does not directly generalize to probabilistic programs. Existing 
assertion-based systems either forbid loops, or impose complex semantic side 
conditions to control which assertions can be used as loop invariants. Such 
side conditions are restrictive and difficult to establish. 

No Support for External or Adversarial Code. A strength of expectation- 
based techniques is reasoning about programs combining probabilities and 
non-determinism. In contrast, Morgan and McIver [30] argue that assertion- 
based techniques cannot support compositional reasoning for such a combi- 
nation. For many applications, including cryptography, we would still like to 
reason about a commonly-encountered special case: programs using external 
or adversarial code. Many security properties in cryptography boil down to 
analyzing such programs, but existing program logics do not support adver- 
sarial code. 

Few Concrete Implementations. There are by now several independent 
implementations of expectation-based techniques, capable of verifying inter- 
esting probabilistic programs. In contrast, there are only scattered implemen- 
tations of probabilistic program logics. 


These limitations raise two points. Compared to expectation-based approaches: 


1. Can assertion-based approaches achieve similar expressivity? 
2. Are there situations where assertion-based approaches are more suitable? 


In this paper, we give positive evidence for both of these points.² Towards the 
first point, we give a new assertion-based logic ELLORA for probabilistic pro- 
grams, overcoming limitations in existing probabilistic program logics. ELLORA 
supports a rich set of assertions that can express concepts like expected values 
and probabilistic independence, and novel proof rules for verifying loops and 
adversarial code. We prove that ELLORA is sound and relatively complete. 
Towards the second point, we evaluate ELLORA in two ways. First, we 
define a new logic for proving probabilistic independence and distribution law 


2 Note that we do not give mathematically precise formulations of these points; as 
we are interested in the practical verification of probabilistic programs, a purely 
theoretical answer would not address our concerns. 
properties (which are difficult to capture with expectation-based approaches) 
and then embed it into ELLORA. This sub-logic is more narrowly focused 
than ELLORA, but supports more concise reasoning for the target assertions. 
Our embedding demonstrates that the assertion-based approach can be flexi- 
bly integrated with intuitive, special-purpose reasoning principles. To further 
support this claim, we also provide an embedding of the Union Bound logic, 
a program logic for reasoning about accuracy bounds [4]. Then, we develop a 
full-featured implementation of ELLORA in the EASYCRYPT theorem prover and 
exercise the logic by mechanically verifying a series of complex randomized algo- 
rithms. Our results suggest that the assertion-based approach can indeed be 
practically viable. 


Abstract Logic. To ease the presentation, we present ELLORA in two stages. 
First, we consider an abstract version of the logic where assertions are general 
predicates over distributions, with no compact syntax. Our abstract logic makes 
two contributions: reasoning for loops, and for adversarial code. 


Reasoning About Loops. Proving a property of a probabilistic loop typically 
requires establishing a loop invariant, but the class of loop invariants that can 
be soundly used depends on the termination behavior: stronger termination 
assumptions allow richer loop invariants. We identify three classes of assertions 
that can be used for reasoning about probabilistic loops, and provide a proof 
rule for each one: 


— arbitrary assertions for certainly terminating loops, i.e. loops that terminate 
in a finite number of iterations; 

— topologically closed assertions for almost surely terminating loops, i.e. loops 
terminating with probability 1; 

— downwards closed assertions for arbitrary loops. 


The definition of topologically closed assertion is reminiscent of Ramshaw [37]; 
the stronger notion of downwards closed assertion appears to be new. 
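
The gap between the first two classes can be seen on the textbook loop that flips a coin until it shows heads. The Python sketch below is an illustration added here, with hypothetical function names: it computes exactly, using rationals, how much probability mass is still inside the loop after n iterations. That mass is positive for every n, so the loop is not certainly terminating, but it tends to 0, so the loop terminates with probability 1.

```python
from fractions import Fraction


def still_running(n, p_tails=Fraction(1, 2)):
    """Probability that `while flip() == TAILS` has not exited after n iterations."""
    return p_tails ** n


def terminated(n, p_tails=Fraction(1, 2)):
    """Probability that the loop has exited within the first n iterations."""
    return 1 - still_running(n, p_tails)


# No finite bound works (the running mass is never 0), yet the limit of terminated(n) is 1.
masses = [still_running(n) for n in range(6)]
```

By contrast, a loop of the form `for i in range(N)` exits after exactly N iterations on every run, which is the certainly terminating case.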

Besides broadening the class of loops that can be analyzed, our rules often 
enable simpler proofs. For instance, if the loop is certainly terminating, then 
there is no need to prove semantic side-conditions. Likewise, there is no need to 
consider the termination behavior of the loop when the invariant is downwards 
and topologically closed. For example, in many applications in cryptography, 
the target property is that a “bad” event has low probability: $\Pr[E] \leq k$. In our 
framework this assertion is downwards and topologically closed, so it can be a 
loop invariant regardless of the termination behavior. 


Reasoning About Adversaries. Existing assertion-based logics cannot reason 
about probabilistic programs with adversarial code. Adversaries are special 
probabilistic procedures consisting of an interface listing the concrete proce- 
dures that an adversary can call (oracles), along with restrictions like how many 
calls an adversary may make. Adversaries are useful in cryptography, where secu- 
rity notions are described using experiments in which adversaries interact with a 
challenger, and in game theory and mechanism design, where adversaries can rep- 
resent strategic agents. Adversaries can also model inputs to online algorithms. 
We provide proof rules for reasoning about adversary calls. Our rules are 
significantly more general than previously considered rules for reasoning about 
adversaries. For instance, the adversary rule used in [4] is restricted to 
adversaries that cannot make oracle calls. 


Metatheory. We show soundness and relative completeness of the core abstract 
logic, with mechanized proofs in the Coq proof assistant. 


Concrete Logic. While the abstract logic is conceptually clean, it is inconve- 
nient for practical formal verification—the assertions are too general and the 
rules involve semantic side-conditions. To address these issues, we flesh out a 
concrete version of ELLORA. Assertions are described by a grammar model- 
ing a two-level assertion language. The first level contains state predicates— 
deterministic assertions about a single memory—while the second layer contains 
probabilistic predicates constructed from probabilities and expected values over 
discrete distributions. While the concrete assertions are theoretically less expres- 
sive than their counterparts in the abstract logic, they can already encode com- 
mon properties and notions from existing proofs, like probabilities, expected val- 
ues, distribution laws and probabilistic independence. Our assertions can express 
theorems from probability theory, enabling sophisticated reasoning about prob- 
abilistic concepts. 
Furthermore, we leverage the concrete syntax to simplify verification. 


— We develop an automated procedure for generating pre-conditions of non- 
looping commands, inspired by expectation-based systems. 

— We give syntactic conditions for the closedness and termination properties 
required for soundness of the loop rules. 


Implementation and Case Studies. We implement ELLORA on top of 
EASYCRYPT, a general-purpose proof assistant for reasoning about probabilistic 
programs, and we mechanically verify a diverse collection of examples including 
textbook algorithms and a randomized routing procedure. We develop an 
EASYCRYPT formalization of probability theory from the ground up, including tools 
like concentration bounds (e.g., the Chernoff bound), Markov’s inequality, and 
theorems about probabilistic independence. 


Embeddings. We propose a simple program logic for proving probabilistic inde- 
pendence. This logic is designed to reason about independence in a lightweight 
way, as is common in paper proofs. We prove that the logic can be embedded 
into ELLORA, and is therefore sound. Furthermore, we prove an embedding of 
the Union Bound logic [4]. 
2 Mathematical Preliminaries 


As is standard, we will model randomized computations using sub-distributions. 


Definition 1. A sub-distribution over a set $A$ is defined by a mass function 
$\mu : A \to [0,1]$ that gives the probability of the unitary events $a \in A$. This 
mass function must be s.t. $\sum_{a \in A} \mu(a)$ is well-defined and $|\mu| = \sum_{a \in A} \mu(a) \le 1$. 


In particular, the support $\mathrm{supp}(\mu) = \{a \in A \mid \mu(a) \neq 0\}$ is discrete.³ The name 
"sub-distribution" emphasizes that the total probability may be strictly less than 
1. When the weight $|\mu|$ is equal to 1, we call $\mu$ a distribution. We let SDist($A$) 
denote the set of sub-distributions over $A$. The probability of an event $E(x)$ w.r.t. 
a sub-distribution $\mu$, written $\Pr_{x \sim \mu}[E(x)]$, is defined as $\sum_{x \in A \mid E(x)} \mu(x)$. 


Simple examples of sub-distributions include the null sub-distribution $0$, 
which maps each element of the underlying space to 0; and the Dirac distribution 
centered on $x$, written $\delta_x$, which maps $x$ to 1 and all other elements to 0. The 
following standard construction gives a monadic structure to sub-distributions. 


Definition 2. Let $\mu \in$ SDist($A$) and $f : A \to$ SDist($B$). Then $\mathbb{E}_{a \sim \mu}[f] \in$ 
SDist($B$) is defined by 

\[ \mathbb{E}_{a \sim \mu}[f](b) \triangleq \sum_{a \in A} \mu(a) \cdot f(a)(b). \]


We use notation reminiscent of expected values, as the definition is quite similar. 
We will need two constructions to model branching statements. 


Definition 3. Let $\mu_1, \mu_2 \in$ SDist($A$) such that $|\mu_1| + |\mu_2| \le 1$. Then $\mu_1 + \mu_2$ is 
the sub-distribution $\mu$ such that $\mu(a) = \mu_1(a) + \mu_2(a)$ for every $a \in A$. 


Definition 4. Let $E \subseteq A$ and $\mu \in$ SDist($A$). Then the restriction $\mu_{|E}$ of $\mu$ to 
$E$ is the sub-distribution such that $\mu_{|E}(a) = \mu(a)$ if $a \in E$ and 0 otherwise. 


Sub-distributions are partially ordered under the pointwise order. 


Definition 5. Let $\mu_1, \mu_2 \in$ SDist($A$). We say $\mu_1 \le \mu_2$ if $\mu_1(a) \le \mu_2(a)$ for 
every $a \in A$, and we say $\mu_1 = \mu_2$ if $\mu_1(a) = \mu_2(a)$ for every $a \in A$. 
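The constructions in Definitions 1 to 5 can be sketched on finite supports. The following is only an illustrative sketch, not part of the formal development: sub-distributions are represented as Python dictionaries from values to exact fractions, and the helper names (`weight`, `dirac`, `bind`, `add`, `restrict`, `leq`) are hypothetical.

```python
from fractions import Fraction

def weight(mu):
    """Total mass |mu| of a finitely supported sub-distribution."""
    return sum(mu.values(), Fraction(0))

def dirac(x):
    """Dirac distribution centered on x."""
    return {x: Fraction(1)}

def bind(mu, f):
    """Monadic bind of Definition 2: E_{a ~ mu}[f](b) = sum_a mu(a) * f(a)(b)."""
    out = {}
    for a, pa in mu.items():
        for b, pb in f(a).items():
            out[b] = out.get(b, Fraction(0)) + pa * pb
    return out

def add(mu1, mu2):
    """Sum of Definition 3; only defined when the weights sum to at most 1."""
    assert weight(mu1) + weight(mu2) <= 1
    out = dict(mu1)
    for a, p in mu2.items():
        out[a] = out.get(a, Fraction(0)) + p
    return out

def restrict(mu, event):
    """Restriction of Definition 4: keep only the mass on states satisfying event."""
    return {a: p for a, p in mu.items() if event(a)}

def leq(mu1, mu2):
    """Pointwise order of Definition 5."""
    return all(p <= mu2.get(a, Fraction(0)) for a, p in mu1.items())

# A fair coin, and the number of heads in two independent flips.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
two_flips = bind(coin, lambda a: bind(coin, lambda b: dirac(a + b)))
heads_part = restrict(coin, lambda a: a == 1)
tails_part = restrict(coin, lambda a: a == 0)
```

Restricting to an event and to its complement splits a sub-distribution into two pieces whose sum is the original, which is the pattern used later for conditionals.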


We use the following lemma when reasoning about the semantics of loops. 
Lemma 1. If $\mu_1 \le \mu_2$ and $|\mu_1| = 1$, then $\mu_1 = \mu_2$ and $|\mu_2| = 1$. 


Sub-distributions are stable under pointwise-limits. 


3 We work with discrete distributions to keep measure-theoretic technicalities to a 
minimum, though we do not see obstacles to generalizing to the continuous setting. 
Definition 6. A sequence $(\mu_n)_{n \in \mathbb{N}} \in$ SDist($A$) of sub-distributions converges if 
for every $a \in A$, the sequence $(\mu_n(a))_{n \in \mathbb{N}}$ of real numbers converges. The limit 
sub-distribution is defined as 

\[ \mu_\infty(a) \triangleq \lim_{n \to \infty} \mu_n(a) \]

for every $a \in A$. We write $\lim_{n \to \infty} \mu_n$ for $\mu_\infty$. 


Lemma 2. Let $(\mu_n)_{n \in \mathbb{N}}$ be a convergent sequence of sub-distributions. Then for 
any event $E(x)$, we have: 

\[ \forall n \in \mathbb{N}.\ \Pr_{x \sim \mu_\infty}[E(x)] = \lim_{n \to \infty} \Pr_{x \sim \mu_n}[E(x)]. \]


Any bounded increasing real sequence has a limit; the same is true of sub- 
distributions. 


Lemma 3. Let $(\mu_n)_{n \in \mathbb{N}} \in$ SDist($A$) be an increasing sequence of 
sub-distributions. Then, this sequence converges to $\mu_\infty$ and $\mu_n \le \mu_\infty$ for every 
$n \in \mathbb{N}$. In particular, for any event $E$, we have $\Pr_{x \sim \mu_n}[E] \le \Pr_{x \sim \mu_\infty}[E]$ for 
every $n \in \mathbb{N}$. 


3 Programs and Assertions 


Now, we introduce our core programming language and its denotational 
semantics. 


Programs. We base our development on PWHILE, a strongly-typed imperative 
language with deterministic assignments, probabilistic assignments, conditionals, 
loops, and an abort statement which halts the computation with no result. 
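Probabilistic assignments $x \xleftarrow{\$} g$ assign a value sampled from a distribution $g$ to 
a program variable $x$. The syntax of statements is defined by the grammar: 

\[
\begin{array}{rcl}
s & ::= & \mathsf{skip} \mid \mathsf{abort} \mid x \leftarrow e \mid x \xleftarrow{\$} g \mid s; s \\
  & \mid & \mathsf{if}\ e\ \mathsf{then}\ s\ \mathsf{else}\ s \mid \mathsf{while}\ e\ \mathsf{do}\ s \mid x \leftarrow \mathcal{I}(e) \mid x \leftarrow \mathcal{A}(e)
\end{array}
\]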


where $x$, $e$, and $g$ range over typed variables in $\mathcal{X}$, expressions in $\mathcal{E}$ and 
distribution expressions in $\mathcal{D}$ respectively. The set $\mathcal{E}$ of well-typed expressions is 
defined inductively from $\mathcal{X}$ and a set $\mathcal{F}$ of function symbols, while the set $\mathcal{D}$ 
of well-typed distribution expressions is defined by combining a set of 
distribution symbols $\mathcal{S}$ with expressions in $\mathcal{E}$. Programs may call a set $\mathcal{I}$ of internal 
procedures as well as a set $\mathcal{A}$ of external procedures. We assume that we have 
code for internal procedures but not for external procedures—we only know 
indirect information, like which internal procedures they may call. Borrowing a 
convention from cryptography, we call internal procedures oracles and external 
procedures adversaries. 


Semantics. The denotational semantics of programs is adapted from the seminal 
work of [27] and interprets programs as sub-distribution transformers. We view 
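\[
\begin{array}{rcl}
\llbracket \mathsf{skip} \rrbracket_m & = & \delta_m \\
\llbracket \mathsf{abort} \rrbracket_m & = & 0 \\
\llbracket x \leftarrow e \rrbracket_m & = & \delta_{m[x := \llbracket e \rrbracket_m]} \\
\llbracket x \xleftarrow{\$} g \rrbracket_m & = & \mathbb{E}_{v \sim \llbracket g \rrbracket_m}[\delta_{m[x := v]}] \\
\llbracket s_1; s_2 \rrbracket_m & = & \mathbb{E}_{m' \sim \llbracket s_1 \rrbracket_m}[\llbracket s_2 \rrbracket_{m'}] \\
\llbracket \mathsf{if}\ e\ \mathsf{then}\ s_1\ \mathsf{else}\ s_2 \rrbracket_m & = & \text{if } \llbracket e \rrbracket_m \text{ then } \llbracket s_1 \rrbracket_m \text{ else } \llbracket s_2 \rrbracket_m \\
\llbracket \mathsf{while}\ e\ \mathsf{do}\ s \rrbracket_m & = & \lim_{n \to \infty} \llbracket (\mathsf{if}\ e\ \mathsf{then}\ s)^n; \mathsf{if}\ e\ \mathsf{then}\ \mathsf{abort} \rrbracket_m \\
\llbracket x \leftarrow \mathcal{I}(e) \rrbracket_m & = & \llbracket f_{arg} \leftarrow e; f_{body}; x \leftarrow f_{res} \rrbracket_m \\
\llbracket x \leftarrow \mathcal{A}(e) \rrbracket_m & = & \llbracket a_{arg} \leftarrow e; a_{body}; x \leftarrow a_{res} \rrbracket_m \\
\llbracket s \rrbracket_\mu & = & \mathbb{E}_{m \sim \mu}[\llbracket s \rrbracket_m]
\end{array}
\]

Fig. 1. Denotational semantics of programs 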


states as type-preserving mappings from variables to values; we write State for 
the set of states and SDist(State) for the set of probabilistic states. For each 
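procedure name $f \in \mathcal{I} \cup \mathcal{A}$, we assume a set $\mathcal{X}_f^L \subseteq \mathcal{X}$ of local variables s.t. the sets $\mathcal{X}_f^L$ 
are pairwise disjoint. The other variables $\mathcal{X} \setminus \bigcup_f \mathcal{X}_f^L$ are global variables. 

To define the interpretation of expressions and distribution expressions, we 
let $\llbracket e \rrbracket_m$ denote the interpretation of expression $e$ with respect to state $m$, and 
$\llbracket e \rrbracket_\mu$ denote the interpretation of expression $e$ with respect to an initial 
sub-distribution $\mu$ over states defined by the clause $\llbracket e \rrbracket_\mu \triangleq \mathbb{E}_{m \sim \mu}[\llbracket e \rrbracket_m]$. Likewise, 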
we define the semantics of commands in two stages: first interpreted in a single 
input memory, then interpreted in an input sub-distribution over memories. 


Definition 7. The semantics of commands are given in Fig. 1. 


- The semantics $\llbracket s \rrbracket_m$ of a statement $s$ in initial state $m$ is a sub-distribution 
over states. 

- The (lifted) semantics $\llbracket s \rrbracket_\mu$ of a statement $s$ in initial sub-distribution $\mu$ over 
states is a sub-distribution over states. 


We briefly comment on loops. The semantics of a loop while $e$ do $s$ is defined 
as the limit of its lower approximations, where the $n$-th lower approximation 
of $\llbracket \mathsf{while}\ e\ \mathsf{do}\ s \rrbracket_\mu$ is $\llbracket (\mathsf{if}\ e\ \mathsf{then}\ s)^n; \mathsf{if}\ e\ \mathsf{then}\ \mathsf{abort} \rrbracket_\mu$, where if $e$ then $s$ is 
shorthand for if $e$ then $s$ else skip and $c^n$ is the $n$-fold composition $c; \cdots; c$. 
Since the sequence of lower approximations is increasing, the limit is well-defined by Lemma 3. In 
contrast, the $n$-th approximations of $\llbracket \mathsf{while}\ e\ \mathsf{do}\ s \rrbracket_\mu$, defined by $\llbracket (\mathsf{if}\ e\ \mathsf{then}\ s)^n \rrbracket_\mu$, 
may not converge, since they are not necessarily increasing. However, in the 
special case where the output distribution has weight 1, the $n$-th lower 
approximations and the $n$-th approximations have the same limit. 
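To make the two kinds of approximation concrete, here is a small sketch for one hypothetical loop, `while x == 0 do x <-$ Bernoulli(1/2)`, where a state is just the value of `x`. The function names are illustrative assumptions; the code only mirrors the unrolling described above on finite supports.

```python
from fractions import Fraction

def bind(mu, f):
    """Lift a state-to-sub-distribution map f to an input sub-distribution mu."""
    out = {}
    for a, pa in mu.items():
        for b, pb in f(a).items():
            out[b] = out.get(b, Fraction(0)) + pa * pb
    return out

def guarded_body(x):
    """[[if x == 0 then x <-$ Bernoulli(1/2)]] in state x."""
    if x == 0:
        return {0: Fraction(1, 2), 1: Fraction(1, 2)}
    return {x: Fraction(1)}

def cut(x):
    """[[if x == 0 then abort]] in state x: drop the mass still inside the loop."""
    return {} if x == 0 else {x: Fraction(1)}

def approx(mu, n):
    """n-th approximation: n guarded iterations, keeping unfinished runs."""
    for _ in range(n):
        mu = bind(mu, guarded_body)
    return mu

def lower_approx(mu, n):
    """n-th lower approximation: n guarded iterations, then discard unfinished runs."""
    return bind(approx(mu, n), cut)

start = {0: Fraction(1)}
lower_weights = [sum(lower_approx(start, n).values(), Fraction(0)) for n in range(4)]
```

In this example the lower approximations have weights 0, 1/2, 3/4, 7/8 and increase towards the Dirac distribution on `x = 1`, while each approximation keeps total weight 1 by retaining the mass 2^-n that is still inside the loop.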
Lemma 4. If the sub-distribution $\llbracket \mathsf{while}\ e\ \mathsf{do}\ s \rrbracket_\mu$ has weight 1, then the limit 
of $\llbracket (\mathsf{if}\ e\ \mathsf{then}\ s)^n \rrbracket_\mu$ is defined and 

\[ \lim_{n \to \infty} \llbracket (\mathsf{if}\ e\ \mathsf{then}\ s)^n; \mathsf{if}\ e\ \mathsf{then}\ \mathsf{abort} \rrbracket_\mu = \lim_{n \to \infty} \llbracket (\mathsf{if}\ e\ \mathsf{then}\ s)^n \rrbracket_\mu. \]

This follows by Lemma 1, since lower approximations are below approximations, 
so the limit of their weights (and the weight of their limit) is 1. It will be 
useful to identify programs that terminate with probability 1. 


Definition 8 (Lossless). A statement $s$ is lossless if for every sub-distribution 
$\mu$, $|\llbracket s \rrbracket_\mu| = |\mu|$, where $|\mu|$ is the total probability of $\mu$. Programs that are not 
lossless are called lossy. 


Informally, a program is lossless if all probabilistic assignments sample from 
full distributions rather than sub-distributions, there are no abort instructions, 
and the program is almost surely terminating, i.e. infinite traces have probability 
zero. Note that if we restrict the language to sample from full distributions, then 
losslessness coincides with almost sure termination. 

Another important class of loops are loops with a uniform upper bound on 
the number of iterations. Formally, we say that a loop while e do s is certainly 
terminating if there exists $k$ such that for every sub-distribution $\mu$, we have 
$|\llbracket \mathsf{while}\ e\ \mathsf{do}\ s \rrbracket_\mu| = |\llbracket (\mathsf{if}\ e\ \mathsf{then}\ s)^k \rrbracket_\mu|$. Note that certain termination of a loop 
does not entail losslessness—the output distribution of the loop may not have 
weight 1, for instance, if the loop samples from a sub-distribution or if the loop 
aborts with positive probability. 


Semantics of Procedure Calls and Adversaries. The semantics of internal 
procedure calls is straightforward. Associated to each procedure name $f \in \mathcal{I}$, we 
assume a designated input variable $f_{arg} \in \mathcal{X}_f^L$, a piece of code $f_{body}$ that 
executes the function call, and a result expression $f_{res}$. A function call $x \leftarrow \mathcal{I}(e)$ 
is then equivalent to $f_{arg} \leftarrow e; f_{body}; x \leftarrow f_{res}$. Procedures are subject to 
well-formedness criteria: procedures should only use local variables in their scope and 
after initializing them, and should not perform recursive calls. 

External procedure calls, also known as adversary calls, are a bit more 
involved. Each name $a \in \mathcal{A}$ is parametrized by a set $a_{ocl} \subseteq \mathcal{I}$ of internal 
procedures which the adversary may call, a designated input variable $a_{arg} \in \mathcal{X}_a^L$, 
an (unspecified) piece of code $a_{body}$ that executes the function call, and a result 
expression $a_{res}$. We assume that adversarial code can only access its local 
variables in $\mathcal{X}_a^L$ and can only make calls to procedures in $a_{ocl}$. It is possible to impose 
more restrictions on adversaries (say, that they are lossless) but for simplicity 
we do not impose additional assumptions on adversaries here. 


4 Proof System 


In this section we introduce a program logic for proving properties of proba- 
bilistic programs. The logic is abstract—assertions are arbitrary predicates on 
sub-distributions—but the meta-theoretic properties are clearest in this setting. 
In the following section, we will give a concrete version suitable for practical use. 
Assertions and Closedness Conditions. We use predicates on state distributions. 


Definition 9 (Assertions). The set Assn of assertions is defined as 
$\mathcal{P}(\mathrm{SDist}(\mathrm{State}))$. We write $\eta(\mu)$ for $\mu \in \eta$. 


Usual set operations are lifted to assertions using their logical counterparts, 
e.g., $\eta \wedge \eta' \triangleq \eta \cap \eta'$ and $\neg\eta \triangleq \overline{\eta}$. Our program logic uses a few additional 
constructions. Given a predicate $\phi$ over states, we define 

\[ \square\phi(\mu) \triangleq \forall m.\ m \in \mathrm{supp}(\mu) \Rightarrow \phi(m) \]

where $\mathrm{supp}(\mu)$ is the set of all states with non-zero probability under $\mu$. 
Intuitively, $\phi$ holds deterministically on all states that we may sample from the 
distribution. To reason about branching commands, given two assertions $\eta_1$ and 
$\eta_2$, we let 

\[ (\eta_1 \oplus \eta_2)(\mu) \triangleq \exists \mu_1, \mu_2.\ \mu = \mu_1 + \mu_2 \wedge \eta_1(\mu_1) \wedge \eta_2(\mu_2). \]

This assertion means that the sub-distribution is the sum of two 
sub-distributions such that $\eta_1$ holds on the first piece and $\eta_2$ holds on the second 
piece. 

Given an assertion $\eta$ and an event $E \subseteq$ State, we let $\eta_{|E}(\mu) \triangleq \eta(\mu_{|E})$. This 
assertion holds exactly when $\eta$ is true on the portion of the sub-distribution 
satisfying $E$. Finally, given an assertion $\eta$ and a function $F$ from SDist(State) 
to SDist(State), we define $\eta[F] \triangleq \lambda\mu.\ \eta(F(\mu))$. Intuitively, $\eta[F]$ is true in a 
sub-distribution $\mu$ exactly when $\eta$ holds on $F(\mu)$. 
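The assertion constructions above can be sketched as higher-order predicates on finitely supported sub-distributions. This is an illustration under assumptions, not the paper's formalization: states are plain integers, assertions are Python functions from dictionaries to booleans, and the existential in the sum operator is checked against an explicitly supplied witness rather than searched for.

```python
from fractions import Fraction

def box(phi):
    """Lift a state predicate phi to 'phi holds on every state in the support'."""
    return lambda mu: all(phi(m) for m, p in mu.items() if p != 0)

def restrict(mu, event):
    """Restriction of mu to the states satisfying event."""
    return {m: p for m, p in mu.items() if event(m)}

def restricted(eta, event):
    """The assertion eta restricted to event: holds on mu iff eta holds on mu restricted to event."""
    return lambda mu: eta(restrict(mu, event))

def is_sum(mu, mu1, mu2):
    """Check mu = mu1 + mu2 pointwise."""
    keys = set(mu) | set(mu1) | set(mu2)
    return all(mu.get(k, 0) == mu1.get(k, 0) + mu2.get(k, 0) for k in keys)

def oplus_with_witness(eta1, eta2, mu, mu1, mu2):
    """Check the sum assertion on mu for the given decomposition (mu1, mu2)."""
    return is_sum(mu, mu1, mu2) and eta1(mu1) and eta2(mu2)

# A sub-distribution of weight 3/4 over the value of a single variable x.
mu = {0: Fraction(1, 2), 1: Fraction(1, 4)}
guard = lambda x: x == 0
not_guard = lambda x: not guard(x)
mu_then = restrict(mu, guard)
mu_else = restrict(mu, not_guard)
```

Splitting on a guard always yields a valid witness for the sum of the two boxed assertions, and a boxed false predicate holds only on the null sub-distribution, since its support is empty.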

Now, we can define the closedness properties of assertions. These properties 
will be critical to our rules for while loops. 


Definition 10 (Closedness properties). A family of assertions $(\eta_n)_{n \in \mathbb{N}^\infty}$ is: 


- u-closed if for every increasing sequence of sub-distributions $(\mu_n)_{n \in \mathbb{N}}$ such 
that $\eta_n(\mu_n)$ for all $n \in \mathbb{N}$ then $\eta_\infty(\lim_{n \to \infty} \mu_n)$; 

- t-closed if for every converging sequence of sub-distributions $(\mu_n)_{n \in \mathbb{N}}$ such 
that $\eta_n(\mu_n)$ for all $n \in \mathbb{N}$ then $\eta_\infty(\lim_{n \to \infty} \mu_n)$; 

- d-closed if it is t-closed and downward closed, that is for every 
sub-distributions $\mu \le \mu'$, $\eta_\infty(\mu')$ implies $\eta_\infty(\mu)$. 


When $(\eta_n)_n$ is constant and equal to $\eta$, we say that $\eta$ is u-/t-/d-closed. 


Note that t-closedness implies u-closedness, but the converse does not hold. 
Moreover, u-closed, t-closed and d-closed assertions are closed under arbitrary 
intersections and finite unions, or in logical terms under finite boolean combina- 
tions, universal quantification over arbitrary sets and existential quantification 
over finite sets. 
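As a small worked illustration of Definition 10, consider the following constant families. These assertions are hypothetical examples chosen for this note, not taken from the development; $m$ is an arbitrary fixed state and $\delta_m$ the Dirac distribution on it.

```latex
% Not u-closed: an increasing sequence can reach weight 1 in the limit.
\eta_1(\mu) \triangleq |\mu| < 1, \qquad
\mu_n = (1 - 2^{-n})\,\delta_m, \qquad
\eta_1(\mu_n) \text{ for all } n, \quad
\lim_{n \to \infty} \mu_n = \delta_m, \quad \neg\,\eta_1(\delta_m).

% u-closed but not t-closed: mass at m cannot vanish along an increasing
% sequence, but it can vanish along a merely converging one.
\eta_2(\mu) \triangleq \mu(m) > 0, \qquad
\mu_n \le \mu_{n+1} \wedge \eta_2(\mu_0) \implies \mu_\infty(m) \ge \mu_0(m) > 0,
\qquad
\mu'_n = 2^{-n}\,\delta_m, \quad \lim_{n \to \infty} \mu'_n = 0, \quad \neg\,\eta_2(0).

% d-closed: non-strict upper bounds on a probability survive limits and
% are preserved when mass is removed.
\eta_3(\mu) \triangleq \mu(m) \le \tfrac{1}{2}, \qquad
\mu \le \mu' \wedge \eta_3(\mu') \implies \mu(m) \le \mu'(m) \le \tfrac{1}{2}.
```

The second example witnesses that u-closedness does not imply t-closedness, and the third is the typical shape of invariant accepted by rules that require d-closedness.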

Finally, we introduce the necessary machinery for the frame rule. The set 
mod(s) of modified variables of a statement s consists of all the variables on the 
left of a deterministic or probabilistic assignment. In this setting, we say that 
an assertion $\eta$ is separated from a set of variables $X$, written separated($\eta$, $X$), if 
$\eta(\mu_1) \iff \eta(\mu_2)$ for any distributions $\mu_1, \mu_2$ s.t. $|\mu_1| = |\mu_2|$ and ${\mu_1}_{|\overline{X}} = {\mu_2}_{|\overline{X}}$, 
where for a set of variables $X$, the restricted sub-distribution $\mu_{|X}$ is 

\[ \mu_{|X} : m \in \mathrm{State}_{|X} \mapsto \Pr_{m' \sim \mu}[m = m'_{|X}] \]

where $\mathrm{State}_{|X}$ and $m_{|X}$ restrict State and $m$ to the variables in $X$. 

Intuitively, an assertion is separated from a set of variables $X$ if every two 
sub-distributions that agree on the variables outside $X$ either both satisfy the 
assertion, or both refute the assertion. 


Judgments and Proof Rules. Judgments are of the form $\{\eta\}\ s\ \{\eta'\}$, where the 
assertions $\eta$ and $\eta'$ are drawn from Assn. 


Definition 11. A judgment $\{\eta\}\ s\ \{\eta'\}$ is valid, written $\models \{\eta\}\ s\ \{\eta'\}$, if $\eta'(\llbracket s \rrbracket_\mu)$ 
for every interpretation of adversarial procedures and every probabilistic state $\mu$ 
such that $\eta(\mu)$. 


Figure 2 describes the structural and basic rules of the proof system. Valid- 
ity of judgments is preserved under standard structural rules, like the rule of 
consequence [CONSEQ]. As usual, the rule of consequence allows us to weaken the 
post-condition and to strengthen the pre-condition; in our system, this rule 
serves as the interface between the program logic and mathematical theorems 
from probability theory. The [EXISTS] rule is helpful to deal with existentially 
quantified pre-conditions. 


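\[
\dfrac{\eta_0 \Rightarrow \eta_1 \quad \{\eta_1\}\ s\ \{\eta_2\} \quad \eta_2 \Rightarrow \eta_3}{\{\eta_0\}\ s\ \{\eta_3\}}\ [\text{CONSEQ}]
\qquad
\dfrac{\forall x : T.\ \{\eta\}\ s\ \{\eta'\}}{\{\exists x : T.\ \eta\}\ s\ \{\eta'\}}\ [\text{EXISTS}]
\]
\[
\dfrac{}{\{\eta\}\ \mathsf{abort}\ \{\square\bot\}}\ [\text{ABORT}]
\qquad
\dfrac{\eta' \triangleq \eta[\llbracket x \leftarrow e \rrbracket]}{\{\eta'\}\ x \leftarrow e\ \{\eta\}}\ [\text{ASSGN}]
\qquad
\dfrac{}{\{\eta\}\ \mathsf{skip}\ \{\eta\}}\ [\text{SKIP}]
\]
\[
\dfrac{\eta' \triangleq \eta[\llbracket x \xleftarrow{\$} g \rrbracket]}{\{\eta'\}\ x \xleftarrow{\$} g\ \{\eta\}}\ [\text{SAMPLE}]
\qquad
\dfrac{\{\eta_0\}\ s_1\ \{\eta_1\} \quad \{\eta_1\}\ s_2\ \{\eta_2\}}{\{\eta_0\}\ s_1; s_2\ \{\eta_2\}}\ [\text{SEQ}]
\]
\[
\dfrac{\{\eta_1 \wedge \square e\}\ s_1\ \{\eta_1'\} \quad \{\eta_2 \wedge \square\neg e\}\ s_2\ \{\eta_2'\}}{\{(\eta_1 \wedge \square e) \oplus (\eta_2 \wedge \square\neg e)\}\ \mathsf{if}\ e\ \mathsf{then}\ s_1\ \mathsf{else}\ s_2\ \{\eta_1' \oplus \eta_2'\}}\ [\text{COND}]
\]
\[
\dfrac{\{\eta_1\}\ s\ \{\eta_1'\} \quad \{\eta_2\}\ s\ \{\eta_2'\}}{\{\eta_1 \oplus \eta_2\}\ s\ \{\eta_1' \oplus \eta_2'\}}\ [\text{SPLIT}]
\qquad
\dfrac{\text{separated}(\eta, \mathrm{mod}(s)) \quad s \text{ is lossless}}{\{\eta\}\ s\ \{\eta\}}\ [\text{FRAME}]
\]
\[
\dfrac{\{\eta\}\ f_{arg} \leftarrow e; f_{body}\ \{\eta'[\llbracket x \leftarrow f_{res} \rrbracket]\}}{\{\eta\}\ x \leftarrow f(e)\ \{\eta'\}}\ [\text{CALL}]
\]

Fig. 2. Structural and basic rules 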
The rules for skip, assignments, random samplings and sequences are all 
straightforward. The rule for abort requires $\square\bot$ to hold after execution; this 
assertion uniquely characterizes the resulting null sub-distribution. The rules for 
assignments and random samplings are semantical. 

The rule [COND] for conditionals requires that the post-condition must be 
of the form $\eta_1' \oplus \eta_2'$; this reflects the semantics of conditionals, which splits 
the initial probabilistic state depending on the guard, runs both branches, and 
recombines the resulting two probabilistic states. 

The next two rules ([SPLIT] and [FRAME]) are useful for local reasoning. The 
[SPLIT] rule reflects the additivity of the semantics and combines the pre- and 
post-conditions using the $\oplus$ operator. The [FRAME] rule asserts that lossless 
statements preserve assertions that are not influenced by modified variables. 

The rule [CALL] for internal procedures is as expected, replacing the proce- 
dure call f with its definition. 

Figure 3 presents the rules for loops. We consider four rules specialized to 
the termination behavior. The [WHILE] rule is the most general rule, as it deals 
with arbitrary loops. For simplicity, we explain the rule in the special case where 
the family of assertions is constant, i.e. we have $\eta_n = \eta$ and $\eta'_n = \eta'$. Informally, 
$\eta$ is the loop invariant and $\eta'$ is an auxiliary assertion used to prove the 
invariant. We require that $\eta$ is u-closed, since the semantics of a loop is defined 
as the limit of its lower approximations. Moreover, the first premise ensures 
that starting from $\eta$, one guarded iteration of the loop establishes $\eta'$; the second 
premise ensures that restricting to $\neg e$ a probabilistic state $\mu'$ satisfying $\eta'$ yields a 
probabilistic state $\mu$ satisfying $\eta$. It is possible to give an alternative formulation 
where the second premise is substituted by the logical constraint $\eta'_{|\neg e} \Rightarrow \eta$. 
As usual, the post-condition of the loop is the conjunction of the invariant with 
the negation of the guard (more precisely in our setting, that the guard has 
probability 0). 

The [WHILE-AST] rule deals with lossless loops. For simplicity, we explain 
the rule in the special case where the family of assertions is constant, i.e. we have 
$\eta_n = \eta$. In this case, we know that lower approximations and approximations 
have the same limit, so we can directly prove an invariant that holds after one 
guarded iteration of the loop. On the other hand, we must now require that 
$\eta$ satisfies the stronger property of t-closedness. 

The [WHILE-D] rule handles arbitrary loops with a d-closed invariant; 
intuitively, restricting a sub-distribution that satisfies a downwards closed assertion 
$\eta$ yields a sub-distribution which also satisfies $\eta$. 

The [WHILE-CT] rule deals with certainly terminating loops. In this case, 
there is no requirement on the assertions. 

We briefly compare the rules from a verification perspective. If the assertion 
is d-closed, then the rule [WHILE-D] is easier to use, since there is no need 
to prove any termination requirement. Alternatively, if we can prove certain 
termination of the loop, then the rule [WHILE-CT] is the best to use since it 
does not impose any condition on assertions. When the loop is lossless, there is 
no need to introduce an auxiliary assertion $\eta'$, which simplifies the proof goal. 




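\[
\dfrac{\text{uclosed}((\eta'_n)_{n \in \mathbb{N}^\infty}) \quad \forall n.\ \{\eta_n\}\ \mathsf{if}\ e\ \mathsf{then}\ s\ \{\eta_{n+1}\} \quad \forall n.\ \{\eta_n\}\ \mathsf{if}\ e\ \mathsf{then}\ \mathsf{abort}\ \{\eta'_n\}}{\{\eta_0\}\ \mathsf{while}\ e\ \mathsf{do}\ s\ \{\eta'_\infty \wedge \square\neg e\}}\ [\text{WHILE}]
\]
\[
\dfrac{\text{tclosed}((\eta_n)_{n \in \mathbb{N}^\infty}) \quad \forall n.\ \{\eta_n\}\ \mathsf{if}\ e\ \mathsf{then}\ s\ \{\eta_{n+1}\} \quad \forall \mu.\ \eta_0(\mu) \Rightarrow |\llbracket \mathsf{while}\ e\ \mathsf{do}\ s \rrbracket_\mu| = 1}{\{\eta_0\}\ \mathsf{while}\ e\ \mathsf{do}\ s\ \{\eta_\infty \wedge \square\neg e\}}\ [\text{WHILE-AST}]
\]
\[
\dfrac{\text{dclosed}((\eta_n)_{n \in \mathbb{N}^\infty}) \quad \forall n.\ \{\eta_n\}\ \mathsf{if}\ e\ \mathsf{then}\ s\ \{\eta_{n+1}\}}{\{\eta_0\}\ \mathsf{while}\ e\ \mathsf{do}\ s\ \{\eta_\infty \wedge \square\neg e\}}\ [\text{WHILE-D}]
\]
\[
\dfrac{\forall n.\ \{\eta_n\}\ \mathsf{if}\ e\ \mathsf{then}\ s\ \{\eta_{n+1}\} \quad \forall \mu.\ \eta_0(\mu) \Rightarrow \llbracket (\mathsf{if}\ e\ \mathsf{then}\ s)^k \rrbracket_\mu = \llbracket \mathsf{while}\ e\ \mathsf{do}\ s \rrbracket_\mu}{\{\eta_0\}\ \mathsf{while}\ e\ \mathsf{do}\ s\ \{\eta_k \wedge \square\neg e\}}\ [\text{WHILE-CT}]
\]

Fig. 3. Rules for loops 
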
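\[
\dfrac{\forall n \in \mathbb{N}^\infty.\ \text{separated}(\eta_n, \{x, s\}) \quad \text{dclosed}((\eta_n)_{n \in \mathbb{N}^\infty}) \quad \forall f \in a_{ocl},\ x \in \mathcal{X}_a^L,\ e \in \mathcal{E},\ n \in \mathbb{N}.\ \{\eta_n\}\ x \leftarrow f(e)\ \{\eta_{n+1}\}}{\{\eta_0\}\ x \leftarrow a(e)\ \{\eta_\infty\}}\ [\text{ADV}]
\]

Fig. 4. Rules for adversaries 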


Note however that it might still be beneficial to use the [WHILE] rule, even for 
lossless loops, because of the weaker requirement that the invariant is u-closed 
rather than t-closed. 


Finally, Fig. 4 gives the adversary rule for general adversaries. It is highly 
similar to the general rule [WHILE-D] for loops since the adversary may make 
an arbitrary sequence of calls to the oracles in $a_{ocl}$ and may not be lossless. 
Intuitively, $\eta$ plays the role of the invariant: it must be d-closed and it must be 
preserved by every oracle call with arbitrary arguments. If this holds, then $\eta$ 
is also preserved by the adversary call. Some framing conditions are required, 
similar to the ones of the [FRAME] rule: the invariant must not be influenced by 
the state writable by the external procedures. 

It is possible to give other variants of the adversary rule with more gen- 
eral invariants by restricting the adversary, e.g., requiring losslessness or bound- 
ing the number of calls the external procedure can make to oracles, leading to 
rules akin to the almost surely terminating and certainly terminating loop rules, 
respectively. 

Soundness and Relative Completeness. Our proof system is sound with respect 
to the semantics. 
Theorem 1 (Soundness). Every judgment $\{\eta\}\ s\ \{\eta'\}$ provable using the rules 
of our logic is valid. 


Completeness of the logic follows from the next lemma, whose proof makes 
an essential use of the [WHILE] rule. In the sequel, we use $\mathbb{1}_\mu$ to denote the 
characteristic function of a probabilistic state $\mu$, an assertion stating that the 
current state is equal to $\mu$. 

Lemma 5. For every probabilistic state $\mu$, the following judgment is provable 
using the rules of the logic: 

\[ \{\mathbb{1}_\mu\}\ s\ \{\mathbb{1}_{\llbracket s \rrbracket_\mu}\}. \]


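Proof. By induction on the structure of $s$. 
- $s = \mathsf{abort}$, $s = \mathsf{skip}$, $x \leftarrow e$ and $x \xleftarrow{\$} g$ are trivial; 
- $s = s_1; s_2$, we have to prove 

\[ \{\mathbb{1}_\mu\}\ s_1; s_2\ \{\mathbb{1}_{\llbracket s_2 \rrbracket_{\llbracket s_1 \rrbracket_\mu}}\}. \]

We apply the [SEQ] rule with $\eta_1 = \mathbb{1}_{\llbracket s_1 \rrbracket_\mu}$; premises can be directly proved 
using the induction hypothesis; 
- $s = \mathsf{if}\ e\ \mathsf{then}\ s_1\ \mathsf{else}\ s_2$, we have to prove 

\[ \{\mathbb{1}_\mu\}\ \mathsf{if}\ e\ \mathsf{then}\ s_1\ \mathsf{else}\ s_2\ \{\mathbb{1}_{\llbracket s_1 \rrbracket_{\mu_{|e}}} \oplus \mathbb{1}_{\llbracket s_2 \rrbracket_{\mu_{|\neg e}}}\}. \]

We apply the [CONSEQ] rule to be able to apply the [COND] rule with post-conditions 
$\mathbb{1}_{\llbracket s_1 \rrbracket_{\mu_{|e}}}$ and $\mathbb{1}_{\llbracket s_2 \rrbracket_{\mu_{|\neg e}}}$. Both premises can be proved by an application of 
the [CONSEQ] rule followed by the application of the induction hypothesis. 
- $s = \mathsf{while}\ e\ \mathsf{do}\ s$, we have to prove 

\[ \{\mathbb{1}_\mu\}\ \mathsf{while}\ e\ \mathsf{do}\ s\ \{\mathbb{1}_{\lim_{n \to \infty} \llbracket (\mathsf{if}\ e\ \mathsf{then}\ s)^n; \mathsf{if}\ e\ \mathsf{then}\ \mathsf{abort} \rrbracket_\mu}\}. \]

We first apply the [WHILE] rule with $\eta_n = \mathbb{1}_{\llbracket (\mathsf{if}\ e\ \mathsf{then}\ s)^n \rrbracket_\mu}$ and 

\[ \eta'_n = \mathbb{1}_{\llbracket (\mathsf{if}\ e\ \mathsf{then}\ s)^n; \mathsf{if}\ e\ \mathsf{then}\ \mathsf{abort} \rrbracket_\mu}. \]

For the first premise we apply the same process as for the conditional case: we 
apply the [CONSEQ] and [COND] rules and we conclude using the induction 
hypothesis (and the [SKIP] rule). For the second premise we follow the same 
process but we conclude using the [ABORT] rule instead of the induction 
hypothesis. Finally we conclude since $\text{uclosed}((\eta'_n)_{n \in \mathbb{N}^\infty})$. 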


The abstract logic is also relatively complete. This property will be less 
important for our purposes, but it serves as a basic sanity check. 


Theorem 2 (Relative completeness). Every valid judgment is derivable. 
Proof. Consider a valid judgment $\{\eta\}\ s\ \{\eta'\}$. Let $\mu$ be a probabilistic state such 
that $\eta(\mu)$. By the above lemma, $\{\mathbb{1}_\mu\}\ s\ \{\mathbb{1}_{\llbracket s \rrbracket_\mu}\}$. Using the validity of the 
judgment and [CONSEQ], we have $\{\mathbb{1}_\mu \wedge \eta(\mu)\}\ s\ \{\eta'\}$. Using the [EXISTS] and 
[CONSEQ] rules, we conclude $\{\eta\}\ s\ \{\eta'\}$ as required. 


The side-conditions in the loop rules (e.g., uclosed/tclosed/dclosed and the 
weight conditions) are difficult to prove, since they are semantic properties. Next, 
we present a concrete version of the logic that gives easy-to-check, syntactic 
sufficient conditions. 
5 A Concrete Program Logic 


To give a more practical version of the logic, we begin by setting a concrete 
syntax for assertions. 


Assertions. We use a two-level assertion language, presented in Fig. 5. A 
probabilistic assertion $\eta$ is a formula built from comparison of probabilistic 
expressions, using first-order quantifiers and connectives, and the special connective $\oplus$. 
A probabilistic expression $p$ can be a logical variable $v$, an operator applied to 
probabilistic expressions $o(p)$ (constants are 0-ary operators), or the expectation 
$\mathbb{E}[\tilde{e}]$ of a state expression $\tilde{e}$. A state expression $\tilde{e}$ is either a program variable 
$x$, the characteristic function $\mathbb{1}_\phi$ of a state assertion $\phi$, an operator applied to 
state expressions $o(\tilde{e})$, or the expectation $\mathbb{E}_{v \sim g}[\tilde{e}]$ of state expression $\tilde{e}$ in a given 
distribution $g$. Finally, a state assertion $\phi$ is a first-order formula over program 
variables. Note that the set of operators is left unspecified but we assume that 
all the expressions in $\mathcal{E}$ and $\mathcal{D}$ can be encoded by operators. 

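The interpretation of the concrete syntax is as expected. The 
interpretation of probabilistic assertions is relative to a valuation $\rho$ 
which maps logical variables to values, and is an element of Assn. The 
definition of the interpretation is straightforward; the only interesting 
case is $\llbracket \mathbb{E}[\tilde{e}] \rrbracket^\rho_\mu$, which is defined by 
$\mathbb{E}_{m \sim \mu}[\llbracket \tilde{e} \rrbracket^\rho_m]$, where $\llbracket \tilde{e} \rrbracket^\rho_m$ is the interpretation of the state expression $\tilde{e}$ in the 
memory $m$ and valuation $\rho$. The interpretation of state expressions is a 
mapping from memories to values, which can be lifted to a mapping from 
distributions over memories to distributions over values. The definition of the 
interpretation is straightforward; the most interesting case is for expectation: 
$\llbracket \mathbb{E}_{v \sim g}[\tilde{e}] \rrbracket^\rho_m \triangleq \mathbb{E}_{w \sim \llbracket g \rrbracket^\rho_m}[\llbracket \tilde{e} \rrbracket^{\rho[v := w]}_m]$. We present the full interpretations in the 
supplemental materials. 

\[
\begin{array}{rcll}
\tilde{e} & ::= & x \mid v \mid \mathbb{1}_\phi \mid \mathbb{E}_{v \sim g}[\tilde{e}] \mid o(\tilde{e}) & \text{(S-expr.)} \\
\phi & ::= & \tilde{e} \bowtie \tilde{e} \mid FO(\phi) & \text{(S-assn.)} \\
p & ::= & v \mid o(p) \mid \mathbb{E}[\tilde{e}] & \text{(P-expr.)} \\
\eta & ::= & p \bowtie p \mid \eta \oplus \eta \mid FO(\eta) & \text{(P-assn.)} \\
 & & \bowtie\ \in \{=, <, \le\} \qquad o \in Ops & \text{(Ops.)}
\end{array}
\]

Fig. 5. Assertion syntax 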

Many standard concepts from probability theory have a natural representa- 
tion in our syntax. For example: 


- the probability that $\phi$ holds in some probabilistic state is represented by the 
probabilistic expression $\Pr[\phi] \triangleq \mathbb{E}[\mathbb{1}_\phi]$; 

- probabilistic independence of state expressions $\tilde{e}_1, \ldots, \tilde{e}_n$ is modeled by the 
probabilistic assertion $\#\{\tilde{e}_1, \ldots, \tilde{e}_n\}$, defined by the clause⁴ 

\[ \forall v_1 \ldots v_n.\ \Pr[\top]^{n-1} \cdot \Pr\Big[\bigwedge_{i=1..n} \tilde{e}_i = v_i\Big] = \prod_{i=1..n} \Pr[\tilde{e}_i = v_i]; \]

- the fact that a distribution is proper is modeled by the probabilistic assertion 
$\mathcal{L} \triangleq \Pr[\top] = 1$; 

- a state expression $\tilde{e}$ distributed according to a law $g$ is modeled by the 
probabilistic assertion 

\[ \tilde{e} \sim g \triangleq \forall w.\ \Pr[\tilde{e} = w] = \mathbb{E}[\mathbb{E}_{v \sim g}[\mathbb{1}_{v = w}]]. \]

The inner expectation computes the probability that $v$ drawn from $g$ is equal 
to a fixed $w$; the outer expectation weights the inner probability by the 
probability of each value of $w$. 


⁴ The term $\Pr[\top]^{n-1}$ is necessary since we work with sub-distributions. 


We can easily define the $\square$ operator from the previous section in our new syntax: 
$\square\phi \triangleq \Pr[\neg\phi] = 0$. 
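The encodings above can be checked numerically on small examples. The sketch below is illustrative only: memories are pairs `(x, y)`, sub-distributions are dictionaries with exact fractions, and the function names are hypothetical. The independence check quantifies only over values that occur in the support, which is enough because both sides of the clause are 0 for any other value.

```python
from fractions import Fraction
from itertools import product

def expect(mu, f):
    """E[f]: expected value of a state expression f over the sub-distribution mu."""
    return sum((p * f(m) for m, p in mu.items()), Fraction(0))

def pr(mu, phi):
    """Pr[phi] defined as the expectation of the characteristic function of phi."""
    return expect(mu, lambda m: 1 if phi(m) else 0)

def box(mu, phi):
    """The box operator encoded as Pr[not phi] = 0."""
    return pr(mu, lambda m: not phi(m)) == 0

def independent(mu, exprs):
    """Independence clause, including the Pr[true]^(n-1) correction factor."""
    total = pr(mu, lambda m: True)
    ranges = [sorted({e(m) for m in mu}) for e in exprs]
    for vals in product(*ranges):
        joint = pr(mu, lambda m: all(e(m) == v for e, v in zip(exprs, vals)))
        marginals = Fraction(1)
        for e, v in zip(exprs, vals):
            marginals *= pr(mu, lambda m, e=e, v=v: e(m) == v)
        if total ** (len(exprs) - 1) * joint != marginals:
            return False
    return True

get_x = lambda m: m[0]
get_y = lambda m: m[1]

# Two independent fair bits, scaled down to a sub-distribution of weight 1/2.
half_uniform = {(x, y): Fraction(1, 8) for x in (0, 1) for y in (0, 1)}
# Two perfectly correlated bits.
correlated = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}
```

On `half_uniform` the joint probability of any pair of values is 1/8 while the product of the marginals is 1/16, so the clause only holds thanks to the factor `Pr[true]`, which equals 1/2 here.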

Syntactic Proof Rules. Now that we have a concrete syntax for assertions, we can 
give syntactic versions of many of the existing proof rules. Such proof rules are 
often easier to use since they avoid reasoning about the semantics of commands 
and assertions. We tackle the non-looping rules first, beginning with the following 
syntactic rules for assignment and sampling: 


\[
\dfrac{}{\{\eta[x := e]\}\ x \leftarrow e\ \{\eta\}}\ [\text{ASSGN}]
\qquad
\dfrac{}{\{P_x^g(\eta)\}\ x \xleftarrow{\$} g\ \{\eta\}}\ [\text{SAMPLE}]
\]

The rule for assignment is the usual rule from Hoare logic, replacing the program 
variable $x$ by its corresponding expression $e$ in the pre-condition. The 
replacement $\eta[x := e]$ is done recursively on the probabilistic assertion $\eta$; for instance 
for expectations, it is defined by $\mathbb{E}[\tilde{e}][x := e] \triangleq \mathbb{E}[\tilde{e}[x := e]]$, where $\tilde{e}[x := e]$ is 
the syntactic substitution. 

The rule for sampling uses the probabilistic substitution operator $P_x^g(\eta)$, which 
replaces all occurrences of $x$ in $\eta$ by a new integration variable $t$ and records 
that $t$ is drawn from $g$; the operator is defined in Fig. 6. 


\[
\begin{array}{rcl}
P_x^g(v) & \triangleq & v \\
P_x^g(\mathbb{E}[\tilde{e}]) & \triangleq & \mathbb{E}[\mathbb{E}_{t \sim g}[\tilde{e}[x := t]]] \\
P_x^g(o(p_1, \ldots, p_n)) & \triangleq & o(P_x^g(p_1), \ldots, P_x^g(p_n)) \\
P_x^g(\eta_1 \bowtie \eta_2) & \triangleq & P_x^g(\eta_1) \bowtie P_x^g(\eta_2)
\end{array}
\]
for $o \in Ops$, $\bowtie\ \in \{\wedge, \vee, \Rightarrow\}$. 

Fig. 6. Syntactic op. P (main cases) 

Next, we turn to the loop rule. The side-conditions from Fig. 3 are 
purely semantic, while in practice it is more convenient to use a sufficient 
condition in the Hoare logic. We give sufficient conditions for ensuring 
certain and almost-sure termination in Fig. 7; $\tilde{e}$ is an integer-valued 
expression. The first side-condition $C_{CTerm}$ shows certain termination 
given a strictly decreasing variant $\tilde{e}$ that is bounded below, similar to 
how a decreasing variant shows termination for deterministic programs. 
The second side-condition $C_{ASTerm}$ shows 
almost-sure termination given a probabilistic variant $\tilde{e}$, which must be bounded 
both above and below. While $\tilde{e}$ may increase with some probability, it must 
decrease with strictly positive probability. This condition was previously 
considered by [17] for probabilistic transition systems and also used in 
expectation-based approaches [20,33]. Our framework can also support more refined 
conditions (e.g., based on super-martingales [9,31]), but the condition $C_{ASTerm}$ already 
suffices for most randomized algorithms. 
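\[
\begin{array}{rcl}
C_{CTerm} & \triangleq & \{\mathcal{L} \wedge \square(\tilde{e} = k \wedge 0 < k \wedge b)\}\ s\ \{\mathcal{L} \wedge \square(\tilde{e} < k)\} \\
 & & \models \eta \Rightarrow (\exists y.\ \square\,\tilde{e} \le y) \wedge \square(\tilde{e} = 0 \Rightarrow \neg b) \\[1ex]
C_{ASTerm} & \triangleq & \{\mathcal{L} \wedge \square(\tilde{e} = k \wedge 0 < k \le K \wedge b)\}\ s\ \{\mathcal{L} \wedge \square(0 \le \tilde{e} \le K) \wedge \Pr[\tilde{e} < k] \ge \epsilon\} \\
 & & \models \eta \Rightarrow \square(0 \le \tilde{e} \le K \wedge \tilde{e} = 0 \Rightarrow \neg b) \\
 & & \models \text{tclosed}(\eta)
\end{array}
\]

Fig. 7. Side-conditions for loop rules 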


While t-closedness is a semantic condition (cf. Definition 10), there are 
simple syntactic conditions to guarantee it. For instance, assertions that carry a 
non-strict comparison $\bowtie\ \in \{\le, \ge, =\}$ between two bounded probabilistic 
expressions are t-closed; the assertion stating probabilistic independence of a set of 
expressions is t-closed. 


Precondition Calculus. With a concrete syntax for assertions, we are also able 
to incorporate syntactic reasoning principles. One classic tool is Morgan and 
McIver's greatest pre-expectation, which we take as inspiration for a pre-condition 
calculus for the loop-free fragment of ELLORA. Given an assertion $\eta$ and a 
loop-free statement $s$, we mechanically construct an assertion $\eta^*$ that is the 
pre-condition of $s$ that implies $\eta$ as a post-condition. The basic idea is to replace 
each expectation expression $p$ inside $\eta$ by an expression $p^*$ that has the same 
denotation before running $s$ as $p$ after running $s$. This process yields an assertion 
$\eta^*$ that, interpreted before running $s$, is logically equivalent to $\eta$ interpreted after 
running $s$. 

The computation rules for pre-conditions are defined in Fig. 8. For a 
probabilistic assertion $\eta$, its pre-condition $pc(s, \eta)$ corresponds to $\eta$ where the 
expectation expressions of the form $\mathbb{E}[\tilde{e}]$ are replaced by their corresponding 
pre-term, $pc(s, \mathbb{E}[\tilde{e}])$. Pre-terms correspond loosely to Morgan and McIver's 
pre-expectations; we will make this correspondence more precise in the next section. 
The main interesting cases for computing pre-terms are for random sampling and 
conditionals. For random sampling the result is $P_x^g(\mathbb{E}[\tilde{e}])$, which corresponds to 
the [SAMPLE] rule. For conditionals, the expectation expression is split into a 
part where $e$ is true and a part where $e$ is not true. We restrict the expectation 
to a part satisfying $e$ with the operator $\mathbb{E}[\tilde{e}]_{|e} \triangleq \mathbb{E}[\tilde{e} \cdot \mathbb{1}_e]$. This corresponds to 
the expected value of $\tilde{e}$ on the portion of the distribution where $e$ is true. Then, 
we can build the pre-condition calculus into ELLORA. 


Theorem 3. Let $s$ be a non-looping command. Then, the following rule is 
derivable in the concrete version of ELLORA: 

\[ \dfrac{}{\{pc(s, \eta)\}\ s\ \{\eta\}}\ [\text{PC}] \]
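The idea behind pre-terms can be sketched semantically: instead of rewriting syntax, the hypothetical function `pre` below turns a state expression (a Python function on memories) into the state expression whose expectation before running `s` equals the original expectation after running `s`. The statement encoding as tuples, and the names `run`, `pre` and `expect`, are assumptions made for this illustration and cover only the loop-free fragment.

```python
from fractions import Fraction

def run(stmt, mu):
    """Forward semantics on a list of (memory, probability) pairs."""
    kind = stmt[0]
    if kind == "assign":
        _, x, e = stmt
        return [({**m, x: e(m)}, p) for m, p in mu]
    if kind == "sample":
        _, x, g = stmt
        return [({**m, x: t}, p * q) for m, p in mu for t, q in g.items()]
    if kind == "seq":
        _, s1, s2 = stmt
        return run(s2, run(s1, mu))
    if kind == "if":
        _, e, s1, s2 = stmt
        then_part = [(m, p) for m, p in mu if e(m)]
        else_part = [(m, p) for m, p in mu if not e(m)]
        return run(s1, then_part) + run(s2, else_part)
    raise ValueError(kind)

def pre(stmt, f):
    """Backward transformer: E[pre(stmt, f)] before stmt equals E[f] after stmt."""
    kind = stmt[0]
    if kind == "assign":  # substitute e for x
        _, x, e = stmt
        return lambda m: f({**m, x: e(m)})
    if kind == "sample":  # integrate x against g
        _, x, g = stmt
        return lambda m: sum((q * f({**m, x: t}) for t, q in g.items()), Fraction(0))
    if kind == "seq":  # go through s2 first, then s1
        _, s1, s2 = stmt
        return pre(s1, pre(s2, f))
    if kind == "if":  # split on the guard
        _, e, s1, s2 = stmt
        f1, f2 = pre(s1, f), pre(s2, f)
        return lambda m: f1(m) if e(m) else f2(m)
    raise ValueError(kind)

def expect(mu, f):
    return sum((p * f(m) for m, p in mu), Fraction(0))

coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
prog = ("seq", ("sample", "x", coin),
        ("if", lambda m: m["x"] == 1,
         ("sample", "y", coin),
         ("assign", "y", lambda m: 0)))
total = lambda m: m["x"] + m["y"]
mu0 = [({"x": 0, "y": 0}, Fraction(1))]
```

For this program, evaluating `expect(run(prog, mu0), total)` and `expect(mu0, pre(prog, total))` gives the same value, 3/4, once by pushing the distribution forward and once by pulling the expression backward.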


6 Case Studies: Embedding Lightweight Logics 


While ELLORA is suitable for general-purpose reasoning about probabilis- 
tic programs, in practice humans typically use more special-purpose proof 
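\[
\begin{array}{rcl}
pc(s_1; s_2, \mathbb{E}[\tilde{e}]) & \triangleq & pc(s_1, pc(s_2, \mathbb{E}[\tilde{e}])) \\
pc(x \leftarrow e, \mathbb{E}[\tilde{e}]) & \triangleq & \mathbb{E}[\tilde{e}][x := e] \\
pc(x \xleftarrow{\$} g, \mathbb{E}[\tilde{e}]) & \triangleq & P_x^g(\mathbb{E}[\tilde{e}]) \\
pc(\mathsf{if}\ e\ \mathsf{then}\ s_1\ \mathsf{else}\ s_2, \mathbb{E}[\tilde{e}]) & \triangleq & pc(s_1, \mathbb{E}[\tilde{e}])_{|e} + pc(s_2, \mathbb{E}[\tilde{e}])_{|\neg e} \\
pc(s, p_1 \bowtie p_2) & \triangleq & pc(s, p_1) \bowtie pc(s, p_2)
\end{array}
\]

Fig. 8. Precondition calculus (selected) 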


techniques—often targeting just a single, specific kind of property, like prob- 
abilistic independence—when proving probabilistic assertions. When these tech- 
niques apply, they can be a convenient and powerful tool. 

To capture this intuitive style of reasoning, researchers have considered 
lightweight program logics where the assertions and proof rules are tailored 
to a specific proof technique. We demonstrate how to integrate these tools in 
an assertion-based logic by introducing and embedding a new logic for reason- 
ing about independence and distribution laws, useful properties when analyzing 
randomized algorithms. We crucially rely on the rich assertions in ELLORA— 
it is not clear how to extend expectation-based approaches to support similar, 
lightweight reasoning. Then, we show how to embed the union bound logic [4] for 
proving accuracy bounds. 


6.1 Law and Independence Logic 


We begin by describing the law and independence logic IL, a proof system with 
intuitive rules that are easy to apply and amenable to automation. For simplicity, 
we only consider programs which sample from the binomial distribution, and 
have deterministic control flow—for lack of space, we also omit procedure calls. 


Definition 12 (Assertions). IL assertions have the grammar: 
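$$\xi ::= \mathsf{det}(e) \mid \# E \mid e \sim \mathrm{B}(e, p) \mid \top \mid \bot \mid \xi \wedge \xi$$
where $e \in \mathcal{E}$, $E \subseteq \mathcal{E}$, and $p \in [0,1]$.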


The assertion det(e) states that e is deterministic in the current distribution, 
i.e., there is at most one element in the support of its interpretation. The asser- 
tion #E states that the expressions in E are independent, as formalized in the 
previous section. The assertion e ~ B(m, p) states that e is distributed according 
to a binomial distribution with parameter m (where m can be an expression) 
and constant probability p, i.e. the probability that e = k is equal to the proba- 
bility that exactly k independent coin flips return heads using a biased coin that 
returns heads with probability p. 

Assertions can be seen as an instance of a logical abstract domain, where
the order between assertions is given by implication based on a small number of
axioms. Examples of such axioms include independence of singletons, irreflexivity
of independence, anti-monotonicity of independence, an axiom for the sum of
binomial distributions, and rules for deterministic expressions:

$$\#\{x\} \qquad \#\{x, x\} \iff \mathsf{det}(x) \qquad \#(E \cup E') \implies \# E$$

$$e \sim \mathrm{B}(m, p) \wedge e' \sim \mathrm{B}(m', p) \wedge \#\{e, e'\} \implies e + e' \sim \mathrm{B}(m + m', p)$$

$$\bigwedge_{1 \leq i \leq n} \mathsf{det}(e_i) \implies \mathsf{det}(f(e_1, \ldots, e_n))$$
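
The axiom for sums of binomial distributions, and the role of its independence premise, can be checked numerically on small instances. The following is a minimal sketch in Python, not part of the IL logic itself; the helper names `binomial_pmf` and `convolve` are introduced here purely for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(m, p):
    """Exact probability mass function of B(m, p), as a list indexed by k."""
    return [comb(m, k) * p**k * (1 - p)**(m - k) for k in range(m + 1)]

def convolve(pmf_a, pmf_b):
    """PMF of e + e' when e and e' are independent with the given PMFs."""
    out = [Fraction(0)] * (len(pmf_a) + len(pmf_b) - 1)
    for i, a in enumerate(pmf_a):
        for j, b in enumerate(pmf_b):
            out[i + j] += a * b
    return out

p = Fraction(1, 3)
# Independent case: B(4, p) + B(6, p) has the law B(10, p).
sum_is_binomial = convolve(binomial_pmf(4, p), binomial_pmf(6, p)) == binomial_pmf(10, p)

# Dependent case: e + e for e ~ B(1, 1/2) takes only the values 0 and 2,
# so it is not distributed as B(2, 1/2); the premise #{e, e'} is needed.
half = Fraction(1, 2)
doubled = [half, Fraction(0), half]
doubled_is_binomial = doubled == binomial_pmf(2, half)
```

Exact rational arithmetic is used so that the comparison is an equality of distributions rather than an approximate floating-point check.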


Definition 13. Judgments of the logic are of the form $\{\xi\}\; s\; \{\xi'\}$, where $\xi$
and $\xi'$ are IL-assertions. A judgment is valid if it is derivable from the rules of
Fig. 9; structural rules and the rule for sequential composition are similar to those
from Sect. 4 and omitted.


The rule [IL-ASSGN] for deterministic assignments is as in Sect. 4. The rule
[IL-SAMPLE] for random assignments yields as post-condition that the variable
x and a set of expressions E are independent assuming that E is independent
before the sampling, and moreover that x follows the law of the distribution that
it is sampled from. The rule [IL-COND] for conditionals requires that the guard
is deterministic, and that each of the branches satisfies the specification; if the
guard is not deterministic, there are simple examples where the rule is not sound.
The rule [IL-WHILE] for loops requires that the loop is certainly terminating
with a deterministic guard. Note that the requirement of certain termination
could be avoided by restricting the structural rules such that a statement s has
deterministic control flow whenever $\{\xi\}\; s\; \{\xi'\}$ is derivable.

We now turn to the embedding. The embedding of IL assertions into general
assertions is immediate, except for det(e) which is translated as $\Box e \vee \Box \neg e$. We
let $\overline{\xi}$ denote the translation of $\xi$.


Theorem 2 (Embedding and soundness of IL logic). If $\{\xi\}\; s\; \{\xi'\}$ is derivable
in the IL logic, then $\{\overline{\xi}\}\; s\; \{\overline{\xi'}\}$ is derivable in (the syntactic variant of) ELLORA.
As a consequence, every derivable judgment $\{\xi\}\; s\; \{\xi'\}$ is valid.


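Proof sketch. By induction on the derivation. The interesting cases are
conditionals and loops. For conditionals, the soundness follows from the soundness of
the rule:

$$\frac{\{\eta\}\; s_1\; \{\eta'\} \qquad \{\eta\}\; s_2\; \{\eta'\} \qquad \Box e \vee \Box \neg e}{\{\eta\}\; \mathbf{if}\ e\ \mathbf{then}\ s_1\ \mathbf{else}\ s_2\; \{\eta'\}}$$

To prove the soundness of this rule, we proceed by case analysis on $\Box e \vee \Box \neg e$.
We treat the case $\Box e$; the other case is similar. In this case, $\eta$ is equivalent to
$\eta_1 \wedge \Box e \oplus \eta_2 \wedge \Box \neg e$, where $\eta_1 = \eta$ and $\eta_2 = \bot$. Let $\eta_1' = \eta'$ and $\eta_2' = \Box \bot$; again,
$\eta_1' \oplus \eta_2'$ is logically equivalent to $\eta'$. The soundness of the rule thus follows from
the soundness of the [COND] and [CONSEQ] rules. For loops, there exists a natural
number n such that while b do s is semantically equivalent to $(\mathbf{if}\ b\ \mathbf{then}\ s)^n$.
By assumption $\{\xi\}\; s\; \{\xi\}$ holds, and thus by induction hypothesis $\{\overline{\xi}\}\; s\; \{\overline{\xi}\}$. We
also have $\xi \implies \mathsf{det}(b)$, and hence $\{\overline{\xi}\}\; \mathbf{if}\ b\ \mathbf{then}\ s\; \{\overline{\xi}\}$. We conclude by [SEQ].

$$\frac{}{\{\xi[x := e]\}\; x \leftarrow e\; \{\xi\}}\ [\text{IL-ASSGN}]$$

$$\frac{\{x\} \cap FV(E) \cap FV(e) = \emptyset}{\{\# E\}\; x \xleftarrow{\$} \mathrm{B}(e, p)\; \{\#(E \cup \{x\}) \wedge x \sim \mathrm{B}(e, p)\}}\ [\text{IL-SAMPLE}]$$

$$\frac{\{\xi\}\; s_1\; \{\xi'\} \qquad \{\xi'\}\; s_2\; \{\xi''\}}{\{\xi\}\; s_1; s_2\; \{\xi''\}}\ [\text{IL-SEQ}]$$

$$\frac{\{\xi\}\; s_1\; \{\xi'\} \qquad \{\xi\}\; s_2\; \{\xi'\} \qquad \xi \implies \mathsf{det}(b)}{\{\xi\}\; \mathbf{if}\ b\ \mathbf{then}\ s_1\ \mathbf{else}\ s_2\; \{\xi'\}}\ [\text{IL-COND}]$$

$$\frac{\{\xi\}\; s\; \{\xi\} \qquad \xi \implies \mathsf{det}(b) \qquad \mathsf{CTerm}}{\{\xi\}\; \mathbf{while}\ b\ \mathbf{do}\ s\; \{\xi\}}\ [\text{IL-WHILE}]$$

Fig. 9. IL proof rules (selected)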


To illustrate our system IL, consider the statement s in Fig. 10 which flips
a fair coin N times and counts the number of heads. Using the logic, we prove
that $c \sim \mathrm{B}(N \cdot (N + 1)/2, 1/2)$ is a post-condition for s. We take the invariant:

$$c \sim \mathrm{B}(j(j + 1)/2, 1/2)$$

The invariant holds initially, as $0 \sim \mathrm{B}(0, 1/2)$. For the inductive case, we show:

$$\{c \sim \mathrm{B}(j(j + 1)/2, 1/2)\}\; s_0\; \{c \sim \mathrm{B}((j + 1)(j + 2)/2, 1/2)\}$$

where $s_0$ represents the loop body, i.e. $x \xleftarrow{\$} \mathrm{B}(j, 1/2);\ c \leftarrow c + x$. First, we apply
the rule for sequence taking as intermediate assertion

$$c \sim \mathrm{B}(j(j + 1)/2, 1/2) \wedge x \sim \mathrm{B}(j, 1/2) \wedge \#\{x, c\}$$

The first premise follows from the rule for random assignment and structural
rules. The second premise follows from the rule for deterministic assignment and
the rule of consequence, applying axioms about sums of binomial distributions.

    proc sum () =
      c ← 0;
      for j ← 1 to N do
        x <-$ B(j, 1/2);
        c ← c + x;
      return c

Fig. 10. Sum of bin.

We briefly comment on several limitations of IL. First, IL is restricted to
programs with deterministic control flow, but this restriction could be partially
relaxed by enriching IL with assertions for conditional independence. Such
assertions are already expressible in the logic of ELLORA; adding conditional
independence would significantly broaden the scope of the IL proof system and
open the possibility to rely on axiomatizations of conditional independence
(e.g., based on graphoids [36]). Second, the logic only supports sampling from
binomial distributions. It is possible to enrich the language of assertions with
clauses $c \sim g$ where g can model other distributions, like the uniform
distribution or the Laplace distribution. The main design challenge is finding a
core set of useful facts about these distributions. Enriching the logic and
automating the analysis are interesting avenues for further work.
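
The post-condition claimed for the sum example can be cross-checked by computing the exact output distribution of the program for a small N. The sketch below is an illustration in Python rather than a proof in IL; it assumes that c starts at 0 and that the loop runs for j = 1, ..., N, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(m, p):
    """Exact PMF of B(m, p), as a list indexed by the number of successes."""
    return [comb(m, k) * p**k * (1 - p)**(m - k) for k in range(m + 1)]

def sum_program_pmf(N, p=Fraction(1, 2)):
    """Exact PMF of c after: c <- 0; for j in 1..N: x <-$ B(j, p); c <- c + x."""
    pmf_c = [Fraction(1)]                      # c = 0 with probability 1
    for j in range(1, N + 1):
        pmf_x = binomial_pmf(j, p)             # fresh, independent sample
        new = [Fraction(0)] * (len(pmf_c) + j)
        for c, pc in enumerate(pmf_c):
            for x, px in enumerate(pmf_x):
                new[c + x] += pc * px
        pmf_c = new
    return pmf_c

N = 4
matches = sum_program_pmf(N) == binomial_pmf(N * (N + 1) // 2, Fraction(1, 2))
```

Each loop iteration mirrors one application of the sum axiom: the fresh sample is independent of c, so the two binomial laws combine.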


6.2 Embedding the Union Bound Logic 


The program logic AHL [4] was recently introduced for estimating accuracy of
randomized computations. One main application of AHL is proving accuracy of
randomized algorithms, both in the offline and online settings (i.e. with
adversary calls). AHL is based on the union bound, a basic tool from probability theory,
and has judgments of the form $\vdash_\beta \{\Phi\}\; s\; \{\Psi\}$, where s is a statement, $\Phi$ and
$\Psi$ are first-order formulae over program variables, and $\beta$ is a probability, i.e.
$\beta \in [0,1]$. A judgment $\vdash_\beta \{\Phi\}\; s\; \{\Psi\}$ is valid if for every memory m such that
$\Phi(m)$, the probability of $\neg\Psi$ in $[\![s]\!]_m$ is upper bounded by $\beta$, i.e. $\Pr_{[\![s]\!]_m}[\neg\Psi] \leq \beta$.

Figure 11 presents some key rules of AHL, including a rule for sampling from
the Laplace distribution $\mathcal{L}_\epsilon$ centered around e. The predicate $\mathsf{CTerm}(k)$ indicates
that the loop terminates in at most k steps on any memory that satisfies the
pre-condition. Moreover, $\beta$ is a function of $\epsilon$.


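$$\frac{}{\vdash_\beta \{\top\}\; x \xleftarrow{\$} \mathcal{L}_\epsilon(e)\; \{|x - e| \leq \frac{1}{\epsilon} \log \frac{1}{\beta}\}}\ [\text{AHL-SAMPLE}]$$

$$\frac{\vdash_{\beta_1} \{\Phi\}\; s_1\; \{\Theta\} \qquad \vdash_{\beta_2} \{\Theta\}\; s_2\; \{\Psi\}}{\vdash_{\beta_1 + \beta_2} \{\Phi\}\; s_1; s_2\; \{\Psi\}}\ [\text{AHL-SEQ}]$$

$$\frac{\vdash_\beta \{\Phi\}\; c\; \{\Phi\} \qquad \mathsf{CTerm}(k)}{\vdash_{k \cdot \beta} \{\Phi\}\; \mathbf{while}\ e\ \mathbf{do}\ c\; \{\Phi \wedge \neg e\}}\ [\text{AHL-WHILE}]$$

Fig. 11. AHL proof rules (selected)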
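
The way failure probabilities accumulate under the union bound can be illustrated with a few lines of arithmetic. The sketch below is in Python and rests on two assumptions that are not spelled out in this section: that the Laplace distribution has scale 1/ε (so that the tail probability Pr[|x - e| > t] equals exp(-εt)), and that the accuracy radius of the sampling rule is (1/ε) log(1/β). The function names are hypothetical.

```python
import math

def laplace_tail(eps, t):
    """Pr[|x - e| > t] for x drawn from a Laplace distribution with scale 1/eps
    centred at e (assumed parametrisation)."""
    return math.exp(-eps * t)

def accuracy_radius(eps, beta):
    """Radius t such that a single sample misses [e - t, e + t] with probability beta."""
    return math.log(1.0 / beta) / eps

def sequence_bound(beta1, beta2):
    """Union bound for s1; s2: the failure probabilities add up."""
    return beta1 + beta2

def while_bound(k, beta):
    """Union bound for a loop that terminates in at most k iterations."""
    return k * beta

eps, beta = 0.5, 0.01
radius = accuracy_radius(eps, beta)
single_failure = laplace_tail(eps, radius)   # equals beta up to rounding
loop_failure = while_bound(10, beta)         # ten noisy samples in a loop
```

The index of a judgment is therefore only an upper bound: it may exceed 1 for long loops, in which case the judgment carries no information.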


AHL has a simple embedding into ELLORA. 


Theorem 3 (Embedding of AHL). If $\vdash_\beta \{\Phi\}\; s\; \{\Psi\}$ is derivable in AHL, then
$\{\Box\Phi\}\; s\; \{\mathbb{E}[\mathbb{1}_{\neg\Psi}] \leq \beta\}$ is derivable in ELLORA.


7 Case Studies: Verifying Randomized Algorithms 


In this section, we will demonstrate ELLORA on a selection of examples; we
present further examples in the supplemental material. Together, they exhibit
a wide variety of different proof techniques and reasoning principles which are
available in ELLORA's implementation.


Hypercube Routing. We begin with the hypercube routing algorithm [41,42].
Consider a network topology (the hypercube) where each node is labeled by a
bitstring of length D and two nodes are connected by an edge if and only if the two
corresponding labels differ in exactly one bit position.

In the network, there is initially one packet at each node, and each packet
has a unique destination. The algorithm implements a routing strategy based
on bit fixing: if the current position has bitstring i, and the target node has
bitstring j, we compare the bits in i and j from left to right, moving along the
edge that corrects the first differing bit. Valiant's algorithm uses randomization
to guarantee that the total number of steps grows logarithmically in the number
of packets. In the first phase, each packet i selects an intermediate destination
p(i) uniformly at random, and uses bit fixing to reach p(i). In the second phase,
each packet uses bit fixing to go from p(i) to the destination j. We will focus on
the first phase since the reasoning for the second phase is nearly identical. We
can model the strategy with the code in Fig. 12, using some syntactic sugar for
the for loops.⁵


We assume that initially, the position of the packet i is at node i (see
Map.init). Then, we initialize the random intermediate destinations p. The
remaining loop encodes the evaluation of the routing strategy iterated T times.
The variable usedBy is a map that logs if an edge is already used by a
packet; it is empty at the beginning of each iteration. For each packet, we try
to move it across one edge along the path to its intermediate destination.
The function getEdge returns the next edge to follow, following the bit-fixing
scheme. If the packet can progress (its edge is not used), then its current
position is updated and the edge is marked as used.

    proc route (D T : int) :
      var p, pos, usedBy : node map;
      var nextE : edge;
      pos ← Map.init id 2^D; p ← Map.empty;
      for i ← 1 to 2^D do
        p[i] <-$ [1, 2^D]
      for t ← 1 to T do
        usedBy ← Map.empty;
        for i ← 1 to 2^D do
          if pos[i] ≠ p[i] then
            nextE ← getEdge pos[i] p[i];
            if usedBy[nextE] = ⊥ then
              // Mark edge used
              usedBy[nextE] ← i;
              // Move packet
              pos[i] ← dest nextE
      return (pos, p)

Fig. 12. Hypercube Routing

We show that if the number of timesteps T is 4D + 1, then all packets reach
their intermediate destination in at most T steps, except with a small probability
$2^{-2D}$ of failure. That is, the number of timesteps grows linearly in D, and hence
logarithmically in the number of packets. This is formalized in our system as:

$$\{T = 4D + 1\}\; \mathsf{route}\; \{\Pr[\exists i.\ \mathit{pos}[i] \neq p[i]] \leq 2^{-2D}\}$$
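
To make the routing strategy concrete, the first phase can be simulated directly. The following Python sketch is an illustration under stated assumptions, not the verified program: nodes are numbered 0 to 2^D - 1 rather than 1 to 2^D, an edge is treated as a directed pair (source, target), and the names `next_hop`, `route` and `late_packets` are hypothetical.

```python
import random

def next_hop(cur, dst, D):
    """Bit fixing: flip the leftmost (most significant) bit where cur and dst differ."""
    for bit in reversed(range(D)):
        mask = 1 << bit
        if (cur ^ dst) & mask:
            return cur ^ mask
    return cur

def route(D, T, dest):
    """Run T timesteps of bit-fixing routing and return the final packet positions."""
    n = 1 << D
    pos = list(range(n))                 # packet i starts at node i
    for _ in range(T):
        used_by = {}                     # directed edge -> packet that used it this step
        for i in range(n):
            if pos[i] != dest[i]:
                nxt = next_hop(pos[i], dest[i], D)
                edge = (pos[i], nxt)
                if edge not in used_by:  # the edge is still free in this timestep
                    used_by[edge] = i    # mark edge used
                    pos[i] = nxt         # move packet
    return pos

def late_packets(D, seed=0):
    """Number of packets that miss their random destination after T = 4D + 1 steps."""
    rng = random.Random(seed)
    n = 1 << D
    dest = [rng.randrange(n) for _ in range(n)]
    pos = route(D, 4 * D + 1, dest)
    return sum(1 for i in range(n) if pos[i] != dest[i])

# Deterministic sanity check: sending every packet to the bitwise complement of
# its start node causes no contention, so every packet needs exactly D steps.
D = 3
complement = [i ^ ((1 << D) - 1) for i in range(1 << D)]
arrived_in_D_steps = route(D, D, complement) == complement
```

Calling `late_packets(D)` for a few seeds gives an empirical feel for the bound; the simulation does not replace the formal argument.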


5 Recall that the number of nodes in a hypercube of dimension D is $2^D$, so each node
can be identified by a number in $[1, 2^D]$.

Modeling Infinite Processes. Our second example is the coupon collector
process. The algorithm draws a uniformly random coupon (we have N coupons) on
each day, terminating when it has drawn at least one of each kind of coupon.
The code of the algorithm is displayed in Fig. 13; the array cp records the
coupons seen so far, t holds the number of steps taken before seeing a new
coupon, and X tracks the total number of steps. Our goal is to bound the
average number of iterations. This is formalized in our logic as:

$$\{\mathcal{L}\}\; \mathsf{coupon}\; \left\{\mathbb{E}[X] = \sum_{i \in [1, N]} \left(\frac{N}{N - i + 1}\right)\right\}$$

    proc coupon (N : int) :
      var int cp[N], t[N];
      var int X ← 0;
      for p ← 1 to N do
        ct ← 0;
        cur <-$ [1, N];
        while cp[cur] = 1 do
          ct ← ct + 1;
          cur <-$ [1, N];
        t[p] ← ct;
        cp[cur] ← 1;
        X ← X + t[p];
      return X

Fig. 13. Coupon collector
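
For reference, the classical coupon-collector expectation can be evaluated exactly and compared with a simulation. The Python sketch below counts every draw, including the one that yields a new coupon in each phase; this counting convention and the function names are assumptions made for the illustration.

```python
import random
from fractions import Fraction

def expected_draws(N):
    """Classical expectation: sum over phases i = 1..N of N / (N - i + 1)."""
    return sum(Fraction(N, N - i + 1) for i in range(1, N + 1))

def simulate_draws(N, rng):
    """Draw uniformly random coupons until all N kinds have been seen."""
    seen = set()
    draws = 0
    while len(seen) < N:
        seen.add(rng.randrange(N))
        draws += 1
    return draws

def average_draws(N, runs, seed=0):
    """Empirical mean number of draws over the given number of runs."""
    rng = random.Random(seed)
    return sum(simulate_draws(N, rng) for _ in range(runs)) / runs
```

For instance, `expected_draws(3)` evaluates to 1 + 3/2 + 3 = 11/2, and `average_draws(3, 10000)` should be close to that value.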


Limited Randomness. Pairwise independence says that if we see the result of
$X_i$, we do not gain information about all other variables $X_k$. However, if we
see the result of two variables $X_i, X_j$, we may gain information about $X_k$.
There are many constructions in the algorithms literature that grow a small
number of independent bits into more pairwise independent bits. Figure 14 gives
one procedure, where $\oplus$ is exclusive-or, and bits(j) is the set of positions
set to 1 in the binary expansion of j.

    proc pwInd (N : int) :
      var bool X[2^N], B[N];
      for i ← 1 to N do
        B[i] <-$ Ber(1/2);
      for j ← 1 to 2^N do
        X[j] ← 0;
        for k ← 1 to N do
          if k ∈ bits(j) then
            X[j] ← X[j] ⊕ B[k]
      return X

Fig. 14. Pairwise Independence

The proof uses the following fact, which we fully verify: for a uniformly
distributed Boolean random variable Y, and a random variable Z of any type,

$$Y \mathrel{\#} Z \implies Y \oplus f(Z) \mathrel{\#} g(Z) \qquad (1)$$

for any two Boolean functions f, g. Then, note that $X[i] = \bigoplus_{\{j \in \mathsf{bits}(i)\}} B[j]$ where
the big XOR operator ranges over the indices j where the bit representation of
i has bit j set. For any two distinct $i, k \in [1, \ldots, 2^N]$, there is a bit position in
$[1, \ldots, N]$ where i and k differ; call this position r and suppose it is set in i but
not in k. By rewriting,

$$X[i] = B[r] \oplus \bigoplus_{\{j \in \mathsf{bits}(i) \setminus r\}} B[j] \quad \text{and} \quad X[k] = \bigoplus_{\{j \in \mathsf{bits}(k) \setminus r\}} B[j]$$

Since the B[j] are all independent, $X[i] \mathrel{\#} X[k]$ follows from Eq. (1) taking Z to be
the distribution on tuples (B[1], ..., B[N]) excluding B[r]. This verifies pairwise
independence:

$$\{\mathcal{L}\}\; \mathsf{pwInd}(N)\; \{\mathcal{L} \wedge \forall i, k \in [2^N].\ i \neq k \implies X[i] \mathrel{\#} X[k]\}$$
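
For small N, the pairwise-independence property can also be confirmed by exhaustive enumeration of the 2^N seed values. The Python sketch below is an illustration only: it uses 0-based bit positions, restricts the index j to 1, ..., 2^N - 1 (the non-empty subsets of seed bits), and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def xor_of_subset(bits, j):
    """X[j]: XOR of bits[k] over every position k set in the binary expansion of j."""
    x = 0
    for k, b in enumerate(bits):
        if (j >> k) & 1:
            x ^= b
    return x

def joint_distribution(N, indices):
    """Exact joint law of (X[j] for j in indices) when the N seed bits are uniform."""
    weight = Fraction(1, 2 ** N)
    dist = {}
    for bits in product((0, 1), repeat=N):
        key = tuple(xor_of_subset(bits, j) for j in indices)
        dist[key] = dist.get(key, 0) + weight
    return dist

def independent(N, indices):
    """True iff the joint law equals the product of the one-variable marginals."""
    joint = joint_distribution(N, indices)
    marginals = [joint_distribution(N, [j]) for j in indices]
    for values in product((0, 1), repeat=len(indices)):
        expected = Fraction(1)
        for marginal, v in zip(marginals, values):
            expected *= marginal.get((v,), 0)
        if joint.get(values, 0) != expected:
            return False
    return True

N = 3
pairwise = all(independent(N, [i, k])
               for i in range(1, 2 ** N) for k in range(i + 1, 2 ** N))
mutual_123 = independent(N, [1, 2, 3])   # X[3] = X[1] xor X[2], so this fails
```

The second check shows why the guarantee is only pairwise: three of the outputs can determine one another even though any two are independent.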

Adversarial Programs. Pseudorandom functions (PRF) and pseudorandom
permutations (PRP) are two idealized primitives that play a central role in the
design of symmetric-key systems. Although the most natural assumption to make
about a blockcipher is that it behaves as a pseudorandom permutation, most
commonly the security of such a system is analyzed by replacing the
blockcipher with a perfectly random function. The PRP/PRF Switching Lemma [6,22]
fills the gap: given a bound for the security of a blockcipher as a pseudorandom
function, it gives a bound for its security as a pseudorandom permutation.


Lemma 4 (PRP/PRF switching lemma). Let A be an adversary with blackbox
access to an oracle O implementing either a random permutation on $\{0,1\}^l$ or a
random function from $\{0,1\}^l$ to $\{0,1\}^l$. Then the probability that the adversary
A distinguishes between the two oracles in at most q calls is bounded by

$$\left| \Pr_{\mathrm{PRP}}[b \wedge |H| \leq q] - \Pr_{\mathrm{PRF}}[b \wedge |H| \leq q] \right| \leq \frac{q(q-1)}{2^{l+1}},$$

where H is a map storing each adversary call and |H| is its size.


Proving this lemma can be done using the Fundamental Lemma of
Game-Playing, and bounding the probability of bad in the program from Fig. 15. We
focus on the latter. Here we apply the [ADV] rule of ELLORA with the invariant
$\forall k.\ \Pr[\mathit{bad} \wedge |H| \leq k] \leq \frac{k(k-1)}{2^{l+1}}$, where |H| is the size of the map H, i.e. the number
of adversary calls. Intuitively, the invariant says that at each call to the oracle
the probability that bad has been set before and that the number of adversary
calls is at most k is bounded by a polynomial in k.

The invariant is d-closed and true before the adversary call, since at that
point $\Pr[\mathit{bad}] = 0$. Then we need to prove that the oracle preserves the invariant,
which can be done easily using the precondition calculus ([PC] rule).
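
The polynomial bound in the invariant can be compared against the exact probability that bad is set. The Python sketch below assumes that the adversary makes q distinct queries to the random-function oracle, so that before the flag is raised the i-th fresh answer collides with an earlier one with probability i / 2^l; the function names are hypothetical.

```python
from fractions import Fraction

def prob_bad(l, q):
    """Exact Pr[bad] after q distinct queries with l-bit answers (assumed model)."""
    size = 2 ** l
    no_collision = Fraction(1)
    for i in range(q):
        # While no collision has happened, i distinct answers are already stored.
        no_collision *= Fraction(max(size - i, 0), size)
    return 1 - no_collision

def switching_bound(l, q):
    """The bound q(q - 1) / 2^(l + 1) from the switching lemma."""
    return Fraction(q * (q - 1), 2 ** (l + 1))

l = 4
bound_holds = all(prob_bad(l, q) <= switching_bound(l, q) for q in range(20))
```

The bound is the union bound over the q individual collision events, which is why it is tight for q = 2 and increasingly loose for larger q.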


    var H: ({0,1}^l, {0,1}^l) map;

    proc orcl (q:{0,1}^l):
      var a: {0,1}^l;
      if q ∉ H then
        a <-$ {0,1}^l;
        bad ← bad || a ∈ codom(H);
        H[q] ← a;
      return H[q];

    proc main():
      var b: bool;
      bad ← false;
      H ← [];
      b ← A();
      return b;


Fig. 15. PRP/PRF game 


8 Implementation and Mechanization 


We have built a prototype implementation of ELLORA within EASYCRYPT [2,5],
a theorem prover originally designed for verifying cryptographic protocols.
EASYCRYPT provides a convenient environment for constructing proofs in various
Hoare logics, supporting interactive, tactic-based proofs for manipulating
assertions and allowing users to invoke external tools, like SMT-solvers, to discharge
proof obligations. EASYCRYPT provides a mature set of libraries for both data
structures (sets, maps, lists, arrays, etc.) and mathematical theorems (algebra,
real analysis, etc.), which we extended with theorems from probability theory.

We used the implementation for verifying many examples from the literature,
including the programs presented in Sect. 7 as well as some additional examples
in Table 1 (such as polynomial identity test, private running sums, properties
about random walks, etc.). The verified proofs bear a strong resemblance to the
existing, paper proofs. Independently of this work, ELLORA has been used to
formalize the main theorem about a randomized gossip-based protocol for
distributed systems [26, Theorem 2.1]. Some libraries developed in the scope of
ELLORA have been incorporated into the main branch of EASYCRYPT, including a
general library on probabilistic independence.

Table 1. Benchmarks

    Example            LC   FPLC
    hypercube         100   1140
    coupon             27    184
    vertex-cover       30     61
    pairwise-indep     30    231
    [illegible]         …      …
    [illegible]        22     32
    random-walk        16     42
    dice-sampling      10     64
    matrix-prod-test   20     75


A New Library for Probabilistic Independence. In order to support assertions of 
the concrete program logic, we enhanced the standard libraries of EASYCRYPT, 
notably the ones dealing with big operators and sub-distributions. Like all
EASYCRYPT libraries, they are written in a foundational style, i.e. they are defined
instead of axiomatized. A large part of our libraries are proved formally from first 
principles. However, some results, such as concentration bounds, are currently 
declared as axioms. 

Our formalization of probabilistic independence deserves special mention. We 
formalized two different (but logically equivalent) notions of independence. The 
first is in terms of products of probabilities, and is based on heterogeneous lists.
Since ELLORA (like EASYCRYPT) has no support for heterogeneous lists, we 
use a smart encoding based on second-order predicates. The second definition 
is more abstract, in terms of product and marginal distributions. While the 
first definition is easier to use when reasoning about randomized algorithms, the 
second definition is more suited for proving mathematical facts. We prove the 
two definitions equivalent, and formalize a collection of related theorems. 


Mechanized Meta-Theory. The proofs of soundness and relative completeness 
of the abstract logic, without adversary calls, and the syntactical termination 
arguments have been mechanized in the Coq proof assistant. The development 
is available in supplemental material. 


9 Related Work 


More on Assertion-Based Techniques. The earliest assertion-based system is due 
to Ramshaw [37], who proposes a program logic where assertions can be formulas 
involving frequencies, essentially probabilities on sub-distributions. Ramshaw’s 
logic allows assertions to be combined with operators like ⊕, similar to our
approach. [18] presents a Hoare-style logic with general assertions on the distri- 
bution, allowing expected values and probabilities. However, his while rule is 
based on a semantic condition on the guarded loop body, which is less desirable 
for verification because it requires reasoning about the semantics of programs. 
[8] give decidability results for a probabilistic Hoare logic without while loops. 
We are not aware of any existing system that supports assertions about general 
expected values; existing works also restrict to Boolean distributions. [38] formal- 
ize a Hoare logic for probabilistic programs but unlike our work, their assertions 
are interpreted on distributions rather than sub-distributions. For conditionals, 
their semantics rescales the distribution of states that enter each branch. How- 
ever, their assertion language is limited and they impose strong restrictions on 
loops. 


Other Approaches. Researchers have proposed many other approaches to verify
probabilistic programs. For instance, verification of Markov transition systems
goes back to at least [17,40]; our condition for ensuring almost-sure termination 
in loops is directly inspired by their work. Automated methods include model 
checking (see e.g., [1,25,29]) and abstract interpretation (see e.g., [12,32]). Tech- 
niques for reasoning about higher-order (functional) probabilistic languages are 
an active subject of research (see e.g., [7,13,14]). For analyzing probabilistic 
loops, in particular, there are tools for reasoning about running time. There are 
also automated systems for synthesizing invariants [3,11]. [9,10] use a martin- 
gale method to compute the expected time of the coupon collector process for 
N = 5—fixing N lets them focus on a program where the outer while loop 
is fully unrolled. Martingales are also used by [15] for analyzing probabilistic 
termination. Finally, there are approaches involving symbolic execution; [39] use 
a mix of static and dynamic analysis to check probabilistic programs from the 
approximate computing literature. 


10 Conclusion and Perspectives 


We introduced an expressive program logic for probabilistic programs, and 
showed that assertion-based systems are suited for practical verification of prob- 
abilistic programs. Owing to their richer assertions, program logics are a more 
suitable foundation for specialized reasoning principles than expectation-based 
systems. As evidence, our program logic can be smoothly extended with cus- 
tom reasoning for probabilistic independence and union bounds. Future work 
includes proving better accuracy bounds for differentially private algorithms, 
and exploring further integration of ELLORA into EASYCRYPT. 
Abstract. Probabilistic programming is an emerging technique for
modeling processes involving uncertainty. Thus, it is important to ensure 
these programs are assigned precise formal semantics that also cleanly 
handle typical exceptions such as non-termination or division by zero. 
However, existing semantics of probabilistic programs do not fully accom- 
modate different exceptions and their interaction, often ignoring some or 
conflating multiple ones into a single exception state, making it impos- 
sible to distinguish exceptions or to study their interaction. 

In this paper, we provide an expressive probabilistic programming 
language together with a fine-grained measure-theoretic denotational 
semantics that handles and distinguishes non-termination, observation 
failures and error states. We then investigate the properties of this seman- 
tics, focusing on the interaction of different kinds of exceptions. Our work 
helps to better understand the intricacies of probabilistic programs and 
ensures their behavior matches the intended semantics. 


1 Introduction 


A probabilistic programming language allows probabilistic models to be speci- 
fied independently of the particular inference algorithms that make predictions 
using the model. Probabilistic programs are formed using standard language 
primitives as well as constructs for drawing random values and conditioning. 
The overall approach is general and applicable to many different settings (e.g., 
building cognitive models). In recent years, the interest in probabilistic pro- 
gramming systems has grown rapidly with various languages and probabilistic 
inference algorithms (ranging from approximate to exact). Examples include 
[10,11,13,14,25-27,29,36]; for a recent survey, please see [15]. An important 
branch of recent probabilistic programming research is concerned with provid- 
ing a suitable semantics for these programs enabling one to formally reason about 
the program's behaviors [2-4, 33-35].

Often, probabilistic programs require access to primitives that may result 
in unwanted behavior. For example, the standard deviation o of a Gaussian 
distribution must be positive (sampling from a Gaussian distribution with neg- 
ative standard deviation should result in an error). If a program samples from 
a Gaussian distribution with a non-constant standard deviation, it is in general 
© The Author(s) 2018 
undecidable if that standard deviation is guaranteed to be positive. A similar sit-
uation occurs for while loops: except in some trivial cases, it is hard to decide if 
a program terminates with probability one (even harder than checking termina- 
tion of deterministic programs [20]). However, general while loops are important 
for many probabilistic programs. As an example, a Markov Chain Monte Carlo 
sampler is essentially a special probabilistic program, which in practice requires 
a non-trivial stopping criterion (see e.g. [6] for such a stopping criterion). In 
addition to offering primitives that may result in such unwanted behavior, many
probabilistic programming languages also provide an observe primitive that
intuitively allows one to filter out executions violating some constraint.


Motivation. Measure-theoretic denotational semantics for probabilistic programs 
is desirable as it enables reasoning about probabilistic programs within the rig- 
orous and general framework of measure theory. While existing research has 
made substantial progress towards a rigorous semantic foundation of proba- 
bilistic programming, existing denotational semantics based on measure theory 
usually conflate failing observe statements (i.e., conditioning), error states and 
non-termination, often modeling at least some of these as missing weight in a 
sub-probability measure (we show why this is practically problematic in later 
examples). This means that even semantically, it is impossible to distinguish 
these types of exceptions¹. However, distinguishing exceptions is essential for a
solid understanding of probabilistic programs: it is insufficient if the semantics 
of a probabilistic programming language can only express that something went 
wrong during the execution of the program, lacking the capability to distin- 
guish for example non-termination and errors. Concretely, programmers often 
want to avoid non-termination and assertion failure, while observation failure 
is acceptable (or even desirable). When a program runs into an exception, the 
programmer should be able to determine the type of exception from the semantics.


This Work. This paper presents a clean denotational semantics for a Turing 
complete first-order probabilistic programming language that supports mixing 
continuous and discrete distributions, arrays, observations, partial functions and 
loops. This semantics distinguishes observation failures, error states and non- 
termination by tracking them as explicit program states. Our semantics allows 
for fine-grained reasoning, such as determining the termination probability of a 
probabilistic program making observations from a sequence of concrete values. 
In addition, we explain the consequences of our treatment of exceptions by 
providing interesting examples and properties of our semantics, such as commu- 
tativity in the absence of exceptions, or associativity regardless of the presence of 
exceptions. We also investigate the interaction between exceptions and the score 
primitive, concluding in particular that the probability of non-termination cannot
be defined in this case. Intuitively, score allows one to increase or decrease the
probability of specific runs of a program (for more details, see Sect. 5.3).


1 In this paper, we refer to errors, non-termination and observation failures collectively
as exceptions. For example, a division by zero is an error (and hence an exception),
while non-termination is an exception but not an error.
2 Overview


In this section we demonstrate several important features of our probabilistic 
programming language (PPL) using examples, followed by a discussion involving 
different kinds of exception interactions. 


2.1 Features of Probabilistic Programs 


In the following, we informally discuss the most important features of our PPL. 


Discrete and Continuous Primitive Distributions. Listing 1 illustrates a simple
Gaussian mixture model (the figure only shows the function body). Depending on
the outcome of a fair coin flip x (resulting in 0 or 1), y is sampled from a
Gaussian distribution with mean 0 or mean 2 (and standard deviation 1). Note that
in our PPL, we represent gauss(·,·) by the more general construct
sampleFrom_f(·,·), with $f : \mathbb{R} \times [0, \infty) \to \mathbb{R} \to \mathbb{R}$ being the
probability density function of the Gaussian distribution

$$f(\mu, \sigma)(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

    y:=0;
    if flip(1/2) {
      y=gauss(0,1);
    } else {
      y=gauss(2,1);
    }
    return y;

Listing 1. Simple Gaussian mixture
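
The Gaussian density and the resulting mixture can be written down directly. The following Python sketch mirrors the structure described for Listing 1 (a fair coin choosing between means 0 and 2 with standard deviation 1); the function names are hypothetical, and raising `ValueError` for a non-positive standard deviation is merely this sketch's way of signalling the error case, not the semantics defined in the paper.

```python
import math
import random

def gauss_pdf(mu, sigma, x):
    """Density f(mu, sigma)(x) of the Gaussian distribution."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def mixture_pdf(x):
    """Density of the mixture: fair coin between gauss(0, 1) and gauss(2, 1)."""
    return 0.5 * gauss_pdf(0, 1, x) + 0.5 * gauss_pdf(2, 1, x)

def sample_mixture(rng):
    """Draw one value following the same branching structure as the program."""
    if rng.random() < 0.5:
        return rng.gauss(0, 1)
    return rng.gauss(2, 1)

rng = random.Random(0)
samples = [sample_mixture(rng) for _ in range(5)]
```

Because both components have the same standard deviation, the mixture density is symmetric around 1, the midpoint of the two means.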


Conditioning. Listing 2 samples two independent values from the uniform
distribution on the interval [0,1] and conditions the possible values of x and y
on the observation x + y > 1 before returning x. Intuitively, the first two lines
express a-priori knowledge about the uncertain values of x and y. Then, a
measurement determines that x + y is greater than 1. We combine this new
information with the existing knowledge. Because x + y > 1 is more likely for
larger values of x, the return value has larger weight on larger values.
Formally, our semantics handles observe by introducing an extra program state
for observation failure ↯. Hence, the probability distribution after the third
line of Listing 2 will put weight 1/2 on ↯ and weight 1/2 on those x and y
satisfying x + y > 1.

    x:=uniform(0,1);
    y:=uniform(0,1);
    observe(x+y>1);
    return x;

Listing 2. Conditioning on a continuous distribution

In practice, one will usually condition the output distribution on there being
no observation failure (↯). For discrete distributions, this amounts to computing:

$$\Pr[X = x \mid X \neq ↯] = \frac{\Pr[X = x \wedge X \neq ↯]}{\Pr[X \neq ↯]} = \frac{\Pr[X = x]}{1 - \Pr[X = ↯]}$$

where x is the outcome of the program (a value, non-termination or an error)
and Pr[X = x] is the probability that the program results in x. Of course, this
conditioning only works when the probability of ↯ is not 1. Note that tracking
the probability of ↯ has the practical benefit of rendering the (often expensive)
marginalization $\Pr[X \neq ↯] = \sum_{x \neq ↯} \Pr[X = x]$ unnecessary.

Other semantics often use sub-probability measures to express failed
observations [4,34,35]. These semantics would say that Listing 2 results in a return
value between 0 and 1 with probability 1/2 (and infer that the missing weight
of 1/2 is due to failed observations). We believe one should improve upon this
approach as the semantics only implicitly states that the program sometimes
fails an observation. Further, this strategy only allows tracking a single kind of
exception (in this case, failed observations). This has led some works to conflate
observation failure and non-termination [18,34]. We believe there is an
important distinction between the two: observation failure means that the program
behavior is inconsistent with observed facts, whereas non-termination means that
the program did not return a result.
Listing 3 illustrates that it is not possible to condition parts of the program
on there being no observation failure. In Listing 3, conditioning the first
branch x := 0; observe(flip(1/2)) on there being no observation failure yields
Pr[x = 0] = 1, rendering the observation irrelevant. The same situation arises
for the second branch. Hence, conditioning the two branches in isolation yields
Pr[x = 0] = 1/2 instead of Pr[x = 0] = 2/3.

    if flip(1/2) {
      x:=0;
      observe(flip(1/2));
    } else {
      x:=1;
      observe(flip(1/4));
    }
    return x;

Listing 3. The need for tracking ↯
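
The difference between conditioning globally and conditioning each branch in isolation can be computed exactly for a program of this shape. The Python sketch below tracks observation failure as an explicit outcome; the concrete probabilities (a fair outer coin, and observations succeeding with probability 1/2 and 1/4) are illustrative assumptions, and the names `FAIL`, `two_branch_program` and `condition_on_success` are hypothetical.

```python
from fractions import Fraction

FAIL = "observation_failure"   # explicit state for failed observations

def two_branch_program(p_branch, p_obs0, p_obs1):
    """Distribution over {0, 1, FAIL}: take branch x=0 with probability p_branch,
    then pass the branch's observation with probability p_obs0 or p_obs1."""
    return {
        0: p_branch * p_obs0,
        1: (1 - p_branch) * p_obs1,
        FAIL: p_branch * (1 - p_obs0) + (1 - p_branch) * (1 - p_obs1),
    }

def condition_on_success(dist):
    """Pr[X = x | X != FAIL] = Pr[X = x] / (1 - Pr[X = FAIL])."""
    ok = 1 - dist[FAIL]
    if ok == 0:
        raise ValueError("cannot condition: observation failure has probability 1")
    return {x: pr / ok for x, pr in dist.items() if x != FAIL}

half, quarter = Fraction(1, 2), Fraction(1, 4)
dist = two_branch_program(half, half, quarter)
globally_conditioned = condition_on_success(dist)[0]   # conditions the whole program
branchwise_conditioned = half                          # each branch renormalised to 1
```

Under these assumed probabilities the global answer is (1/4) / (3/8) = 2/3, whereas renormalising each branch first discards the observations and leaves only the outer coin.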


Loops. Listing 4 shows a probabilistic program with a while loop. It samples from the
geometric(1/2) distribution, which counts the number of failures (flip returns 0) until
the first success occurs (flip returns 1). This program terminates with probability 1,
but it is of course possible that a probabilistic program fails to terminate with
positive probability. Listing 5 demonstrates this possibility.

    n:=0;
    while !flip(1/2) {
      n=n+1;
    }
    return n;

Listing 4. Geometric distribution

Listing 5 modifies x until either x = 0 or x = 10. In each iteration, x is either
increased or decreased, each with probability 1/2. If x reaches 0, the loop terminates.
If x reaches 10, the loop never terminates. By symmetry, both termination and
non-termination are equally likely. Hence, the program either returns 0 or does not
terminate, each with probability 1/2.

Other semantics often use sub-probability measures to express non-termination [4,23].
Thus, these semantics would say that Listing 5 results in 0 with probability 1/2 (and
nothing else). We propose to track the probability of non-termination explicitly by an
additional state $\circlearrowleft$, just as we track the probability of observation
failure ($\lightning$).

    x := 5;
    while x>0 {
      if x<10 {
        x+=2*flip(1/2)-1;
      }
    }
    return x;

Listing 5. Program that may not terminate

Partial Functions. Many functions that are practically useful are only partial (meaning
they are not defined for some inputs). Examples include uniform(a, b) (undefined for
b < a) and $\sqrt{x}$ (undefined for x < 0). Listing 6 shows an example program using
$\sqrt{x}$. Usually, semantics do not explicitly address partial functions [23,24,28,33]
or use partial functions without dealing with failure (e.g. [19] use Bernoulli(p) without
stating what happens if $p \notin [0,1]$). Most of these languages could use a
sub-probability distribution that misses weight in the presence of errors (in these
languages, this results in conflating errors with non-termination and observation
failures).

    x:=uniform(-1,1);
    x=√x;
    return x;

Listing 6. Using partial functions

[Figure 1 panels: This work, [23], [24], [35], [4,34], [28]]

Fig. 1. Visual comparison of the exception handling capabilities of different semantics.
For example, $\circlearrowleft$ is filled in [34] because its semantics can handle
non-termination. However, the intersection between $\circlearrowleft$ and $\lightning$ is
not filled because [34] cannot distinguish non-termination from observation failure.

We introduce a third exception state $\bot$ that can be produced when partial
functions are evaluated outside of their domain. Thus, Listing 6 results in $\bot$
with probability 1/2 and returns a value from [0,1] with probability 1/2 (larger values
are more likely). Some previous work uses an error state to capture failing
computations, but does not propagate this failure implicitly [34,35]. In particular,
if an early expression in a long program may fail evaluating $\sqrt{-4}$, every
expression in the program that depends on this failing computation has to check
whether an exception has occurred. While it may seem possible to skip the
rest of the function in case of a failing computation (by applying the pattern
if (x = $\bot$) {return $\bot$} else {rest of function}), this is non-modular and does
not address the result of the function being used in other parts of a program.

Although our semantics treat $\bot$ and $\lightning$ similarly, there is an important
distinction between the two: $\bot$ means the program terminated due to an error, while
$\lightning$ means that according to observed evidence, the program did not actually run.


2.2 Interaction of Exception States 


Next, we illustrate the interaction of different exception states. We explain how 
our semantics handles these interactions when compared to existing semantics. 
Fig. 1 gives an overview of which existing semantics can handle which (interac- 
tions of) exceptions. We note that our semantics could easily distinguish more 
kinds of exceptions, such as division by zero or out of bounds accesses to arrays. 


Non-termination and Observation Failure. Listing 7 shows a program that has been
investigated in [22]. Based on the observations, it only admits a single behavior,
namely always sampling x = 0 in the third line. This behavior results in
non-termination, but it occurs with probability 0. Hence, the program fails an
observation (ending up in state $\lightning$) with probability 1. If we try to condition
on not failing any observation (by rescaling appropriately), this results in a division
by 0, because the probability of not failing any observation is 0.
The semantics of Listing 7 thus only has weight on $\lightning$, and does not allow
conditioning on not failing any observation. This is also the solution that [22]
proposes, but in our case, we can formally back up this claim with our semantics.
Other languages handle both non-termination and observation failure by
sub-probability distributions, which makes it impossible to conclude that the missing
weight is due to observation failure (and not due to non-termination) [4,24,34].
The semantics in [28] cannot directly express that the missing weight is due
to observation failure (rather, the semantics are undefined due to a division by
zero). However, the semantics enables a careful reader to determine that the
missing weight is due to observation failure (by investigating the conditional
weakest precondition and the conditional weakest liberal precondition). Some
other languages can express neither while loops nor observations [23,33,35].

    x:=0;
    while x=0 {
      x=flip(1/2);
      observe(x=0);
    }

Listing 7. Mixing loops and observations
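To see why all weight of Listing 7 ends up on observation failure, one can unroll the loop a bounded number of times. The Python sketch below is a hedged illustration: `listing7_after` is a hypothetical helper, and the flip parameter 1/2 is an assumption (any parameter strictly between 0 and 1 leads to the same limit).

```python
from fractions import Fraction

def listing7_after(iterations, p_one=Fraction(1, 2)):
    """Exact weights for Listing 7 after a bounded number of loop iterations.

    Returns (p_observation_failure, p_still_looping). `p_one` is the assumed
    probability that flip returns 1.
    """
    failed = Fraction(0)
    looping = Fraction(1)              # x = 0 initially, so the loop is entered
    for _ in range(iterations):
        failed += looping * p_one      # x = 1 violates observe(x=0)
        looping *= 1 - p_one           # x = 0 passes the observation and loops again
    return failed, looping
```

The weight that is still looping halves with every iteration, so in the limit nothing is left to rescale by, which is the division by 0 mentioned above.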


Assertions and Non-termination. For some programs, it is useful to check assumptions
explicitly. For example, the implementation of the factorial function in Listing 8
explicitly checks whether x is a valid argument to the factorial function. If
$x \notin \mathbb{N}$, the program should run into an error (i.e. only have weight on
$\bot$). If $x \in \mathbb{N}$, the program should return $x!$ (i.e. only have weight on
$x!$). This example illustrates that earlier exceptions (like failing an assertion)
should bypass later exceptions (like non-termination, which occurs for
$x \notin \mathbb{N}$ if the programmer forgets the first two assertions). This is not
surprising, given that this is also the semantics of exceptions in most deterministic
languages. Most existing semantics either cannot express Listing 8 ([23,34] have no
assertions, [35] has no iteration) or cannot distinguish failing an assertion from
non-termination [24,28,33]. The consequence of the latter is that removing the first two
assertions from Listing 8 does not affect the semantics. Handling assertion failure by
sum types (as e.g. in [34]) could be a solution, but would force the programmer to deal
with assertion failure explicitly. Only the semantics in [4] has the expressiveness to
implicitly handle assertion errors in Listing 8 without conflating those errors
with non-termination.

    assert(x≥0);
    assert(x=⌊x⌋);
    fac:=1;
    while x≠0 {
      fac=fac*x;
      x=x-1;
    }
    return fac;

Listing 8. Explicitly checking assumptions

Listing 9 shows a different interaction between non-termination and failing assertions.
Here, even though the loop condition is always true, the first iteration of the loop
will run into an exception. Thus, Listing 9 results in $\bot$ with probability 1. Again,
this behavior should not be surprising given the behavior of deterministic languages.
For Listing 9, conflating errors with non-termination means the program semantics
cannot express that the missing weight is due to an error and not due to
non-termination.

    x:=0;
    while 1 {
      x=x/x;
    }

Listing 9. Guaranteed failure
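The way an early assertion failure bypasses a later non-terminating loop can be mimicked in a deterministic setting. The following Python sketch is a rough analogue of Listing 8 under loud assumptions: the strings `BOT` and `NONTERM` are hypothetical encodings of the error and non-termination states, and a step budget (`fuel`) merely stands in for non-termination, which cannot be detected in general.

```python
BOT, NONTERM = "bot", "nonterm"        # hypothetical encodings of the two states

def fac(x, with_assertions=True, fuel=1000):
    """Listing 8 with a step budget standing in for non-termination."""
    if with_assertions:
        if not x >= 0:                 # assert(x >= 0)
            return BOT
        if x != int(x):                # assert(x = floor(x)); x >= 0 here, so int() rounds down
            return BOT
    result = 1
    while x != 0:
        if fuel == 0:
            return NONTERM             # budget exhausted: treated as non-termination
        fuel -= 1
        result = result * x
        x = x - 1
    return result
```

With the assertions in place, an invalid argument yields the error state; without them, the same argument exhausts the budget, so the two behaviors are distinguishable only if both states are tracked separately.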
Observation Failure and Assertion Failure. In our PPL, earlier exceptions bypass later
exceptions, as illustrated in Listing 8. However, because we are operating in a
probabilistic language, exceptions can occur probabilistically. Listing 10 shows a
program that may run into an observation failure, or into an assertion failure, or
neither. If it runs into an observation failure (with probability 1/2), it bypasses the
rest of the program, resulting in $\lightning$ with probability 1/2 and in $\bot$ with
probability 1/4. Conditioning on the absence of observation failures, the probability of
$\bot$ is 1/2.

    observe(flip(1/2));
    assert(flip(1/2));

Listing 10. Observation or assertion failure

An important observation is that reordering the two statements of Listing 10 
will result in a different behavior. This is the case, even though there is no obvious 
data-flow between the two statements. This is in sharp contrast to the semantics 
in [34], which guarantee (in the absence of exceptions) that only data flow is 
relevant and that expressions can be reordered. Our semantics illustrate that 
even if there is no explicit data-dependency, some seemingly obvious properties 
(like commutativity) may not hold in the presence of exceptions. Some languages 
either cannot express Listing 10 ([23,33] lack observations), cannot distinguish 
observation failure from assertion failure [24] or cannot handle exceptions implic- 
itly [34,35]. 
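The effect of reordering the two statements of Listing 10 can be computed exactly. The Python sketch below is illustrative only: it assumes that both flips use probability 1/2, and `run` and `condition_on_no_obs_failure` are hypothetical helpers that model a straight-line program in which the first failing statement determines the outcome.

```python
from fractions import Fraction

OBS_FAIL, ERROR, OK = "obs_fail", "error", "ok"   # hypothetical outcome labels

def run(statements):
    """Exact outcome distribution of a straight-line program.

    `statements` is a list of (exception_state, success_probability) pairs,
    e.g. (OBS_FAIL, 1/2) models observe(flip(1/2)).
    """
    dist = {OBS_FAIL: Fraction(0), ERROR: Fraction(0)}
    alive = Fraction(1)                # probability that no exception occurred so far
    for state, p_success in statements:
        dist[state] += alive * (1 - p_success)
        alive *= p_success
    dist[OK] = alive
    return dist

def condition_on_no_obs_failure(dist):
    """Rescale by the probability of not failing any observation."""
    z = 1 - dist[OBS_FAIL]
    return {k: v / z for k, v in dist.items() if k != OBS_FAIL}

half = Fraction(1, 2)
listing10 = run([(OBS_FAIL, half), (ERROR, half)])
reordered = run([(ERROR, half), (OBS_FAIL, half)])
```

Under these assumptions, the conditional probability of an assertion failure is 1/2 for the original order and 2/3 for the reordered program, even though no data flows between the two statements.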


Summary. In this section, we showed examples of probabilistic programs that 
exhibit non-termination, observation failures and errors. Then, we provided 
examples that show how these exceptions can interact, and explained how exist- 
ing semantics handle these interactions. 


3 Preliminaries 


In this section, we provide the necessary theory. Most of the material is standard;
however, our treatment of exception states is new and is essential for providing
semantics to probabilistic programs in the presence of exceptions.
All key lemmas (together with additional definitions and examples) are proven
in Appendix A.


Natural Numbers, [n], Iverson Brackets, Restriction of Functions. We include 0
in the natural numbers, so that $\mathbb{N} := \{0, 1, \dots\}$. For $n \in \mathbb{N}$,
$[n] := \{1, \dots, n\}$. The Iverson brackets $[\cdot]$ are defined by $[b] = 1$ if $b$
is true and $[b] = 0$ if $b$ is false. A particular application of the Iverson brackets
is to characterize the indicator function of a specific set $S$ by $[x \in S]$. For a
function $f: X \to Y$ and a subset of the domain $S \subseteq X$, $f$ restricted to $S$ is
denoted by $f|_S: S \to Y$.


Set of Variables, Generating Tuples, Preservation of Properties, Singleton Set.
Let Vars be a set of admissible variable names. We refer to the elements of Vars by
$x, y, z$ and $x_i, y_i, z_i, v_i, w_i$, for $i \in \mathbb{N}$. For $v \in A$ and
$n \in \mathbb{N}$, $v!n := (v, \dots, v) \in A^n$ denotes the tuple containing $n$ copies
of $v$. A function $f: A^n \to A$ preserves a property if whenever
$a_1, \dots, a_n \in A$ have that property, $f(a_1, \dots, a_n) \in A$ has that property.
Let $\mathbb{1}$ denote the set which only contains the empty tuple $()$, i.e.
$\mathbb{1} := \{()\}$. For sets of tuples $S \subseteq \prod_{i=1}^n A_i$, there is an
isomorphism $S \times \mathbb{1} \simeq \mathbb{1} \times S \simeq S$. This isomorphism is
intuitive and we sometimes silently apply it.


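Exception States, Lifting Functions to Exception States. We allow the extension
of sets with some symbols that stand for the occurrence of special events in
a program. This is important because it allows us to capture the event that a
given program runs into specific exceptions. Let
$\mathcal{X} := \{\bot, \lightning, \circlearrowleft\}$ be a (countable) set of exception
states. We denote by $\overline{A} := A \cup \mathcal{X}$ the set $A$ extended with
$\mathcal{X}$ (we require that $A \cap \mathcal{X} = \emptyset$). Intuitively, $\bot$
corresponds to assertion failures, $\lightning$ corresponds to observation failures and
$\circlearrowleft$ corresponds to non-termination.
For a function $f: A \to B$, $f$ lifted to exception states, denoted by
$\overline{f}: \overline{A} \to \overline{B}$, is defined by $\overline{f}(a) = a$ if
$a \in \mathcal{X}$ and $\overline{f}(a) = f(a)$ if $a \notin \mathcal{X}$. For a function
$f: \prod_{i=1}^n A_i \to B$, $f$ lifted to exception states, denoted by
$\overline{f}: \prod_{i=1}^n \overline{A_i} \to \overline{B}$, propagates the first
exception in its arguments, or evaluates $f$ if none of its arguments are exceptions.
Formally, it is defined by $\overline{f}(a_1, \dots, a_n) = a_1$ if $a_1 \in \mathcal{X}$,
$\overline{f}(a_1, \dots, a_n) = a_2$ if $a_1 \notin \mathcal{X}$ and
$a_2 \in \mathcal{X}$, and so on. Only if $a_1, \dots, a_n \notin \mathcal{X}$, we have
$\overline{f}(a_1, \dots, a_n) = f(a_1, \dots, a_n)$. Thus,
$\overline{f}(\circlearrowleft, a, \bot) = \circlearrowleft$. In particular, we write
$\overline{(a, b)}$ for lifting the tupling function, resulting in for example
$\overline{(\lightning, \circlearrowleft)} = \lightning$. To remove notation clutter, we
do not distinguish the two different liftings
$\overline{f}: \overline{A} \to \overline{B}$ and
$\overline{f}: \prod_{i=1}^n \overline{A_i} \to \overline{B}$ notationally. Whenever we
write $\overline{f}$, it will be clear from the context which lifting we mean. We write
$S \mathbin{\overline{\times}} T$ for $\{\overline{(s, t)} \mid s \in S, t \in T\}$.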
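The lifting of a multi-argument function can be sketched in a few lines of Python. This is an illustration under assumptions, not part of the formal development: the strings in `EXCEPTIONS` are hypothetical encodings of the three exception states, and `lift` is a hypothetical helper name.

```python
BOT, OBS_FAIL, NONTERM = "bot", "obs_fail", "nonterm"   # hypothetical encodings
EXCEPTIONS = (BOT, OBS_FAIL, NONTERM)

def lift(f):
    """Lift f to exception states: the first exceptional argument is returned as is."""
    def lifted(*args):
        for a in args:
            if a in EXCEPTIONS:
                return a               # propagate the first exception, ignore the rest
        return f(*args)                # no exception: evaluate f normally
    return lifted

lifted_add = lift(lambda a, b, c: a + b + c)
lifted_pair = lift(lambda a, b: (a, b))
```

For example, `lifted_add(NONTERM, 1, BOT)` returns the non-termination state because it occurs first, and `lifted_pair(OBS_FAIL, NONTERM)` returns the observation failure state, mirroring the two examples in the text.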


Records. A record is a special type of tuple indexed by variable names. For sets
$(S_i)_{i \in [n]}$, a record $r \in \prod_{i=1}^n (x_i: S_i)$ has the form
$r = \{x_1 \mapsto v_1, \dots, x_n \mapsto v_n\}$, where $v_i \in S_i$, with the
convenient shorthand $r = \{x_i \mapsto v_i\}_{i \in [n]}$. We can access the elements of
a record by their name: $r[x_i] = v_i$.

In what follows, we provide the measure theoretic background necessary to 
express our semantics. 


σ-algebra, Measurable Set, σ-algebra Generated by a Set, Measurable Space, Measurable
Functions. Let $A$ be some set. A set $\Sigma_A \subseteq \mathcal{P}(A)$ is called a
σ-algebra on $A$ if it satisfies three conditions: $A \in \Sigma_A$, $\Sigma_A$ is closed
under complements ($S \in \Sigma_A$ implies $A \setminus S \in \Sigma_A$) and $\Sigma_A$
is closed under countable unions (for any collection $\{S_i\}_{i \in \mathbb{N}}$ with
$S_i \in \Sigma_A$, we have $\bigcup_{i \in \mathbb{N}} S_i \in \Sigma_A$). The elements
of $\Sigma_A$ are called measurable sets. For any set $A$, a trivial σ-algebra on $A$ is
its power set $\mathcal{P}(A)$. Unfortunately, the power set often contains sets that do
not behave well. To come up with a σ-algebra on $A$ whose sets do behave well, we often
start with a set $S \subseteq \mathcal{P}(A)$ that is not a σ-algebra and extend it until
we get a σ-algebra. For this purpose, let $A$ be some set and
$S \subseteq \mathcal{P}(A)$ a collection of subsets of $A$. The σ-algebra generated by
$S$, denoted by $\sigma(S)$, is the smallest σ-algebra that contains $S$. Formally,
$\sigma(S)$ is the intersection of all σ-algebras on $A$ containing $S$. For a set $A$
and a σ-algebra $\Sigma_A$ on $A$, $(A, \Sigma_A)$ is called a measurable space.
We often leave $\Sigma_A$ implicit; whenever it is not mentioned explicitly, it is clear
from the context. Table 1 provides the implicit σ-algebras for some common sets.
As an example, some elements of $\Sigma_{\overline{\mathbb{R}}}$ include
$[0,1] \cup \{\bot\}$ and $\{1, 3, 7\}$. For measurable spaces $(A, \Sigma_A)$ and
$(B, \Sigma_B)$, a function $f: A \to B$ is called measurable if
$\forall S \in \Sigma_B: f^{-1}(S) \in \Sigma_A$. Here,
$f^{-1}(S) := \{a \in A : f(a) \in S\}$. If one is familiar with the notion of Lebesgue
measurable functions, note that our definition does not include all Lebesgue measurable
functions. As a motivation to why we need measurable functions, consider the following
scenario. We know the distribution of some variable $x$, and want to know the
distribution of $y = f(x)$. To figure out how likely it is that $y \in S$ for a
measurable set $S$, we can determine how likely it is that $x \in f^{-1}(S)$, because
$f^{-1}(S)$ is guaranteed to be a measurable set.

Table 1. Implicit σ-algebras on common sets, for measurable spaces $(A, \Sigma_A)$,
$(A_i, \Sigma_{A_i})$

Set | σ-algebra on this set
$\mathbb{R}$ | $\Sigma_{\mathbb{R}} = \mathcal{B} := \sigma(\{[a,b] \subseteq \mathbb{R} \mid a < b, a \in \mathbb{R}, b \in \mathbb{R}\})$, the Borel σ-algebra on $\mathbb{R}$ generated by all intervals
$S$ for $S \in \mathcal{B}$ | $\Sigma_S = \{T \in \mathcal{B} \mid T \subseteq S\}$
$\prod_{i=1}^n A_i$ | $\Sigma_{\prod_{i=1}^n A_i} = \sigma(\{\prod_{i=1}^n S_i \mid S_i \in \Sigma_{A_i}\})$
$\prod_{i=1}^n (x_i: A_i)$ | $\Sigma_{\prod_{i=1}^n (x_i: A_i)} = \sigma(\{\prod_{i=1}^n (x_i: S_i) \mid S_i \in \Sigma_{A_i}\})$
$\overline{A}$ | $\Sigma_{\overline{A}} = \{S \cup S' \mid S \in \Sigma_A, S' \in \mathcal{P}(\mathcal{X})\}$


Measures, Examples of Measures. For a measurable space $(A, \Sigma_A)$, a function
$\mu: \Sigma_A \to [0, \infty]$ is called a measure on $A$ if it satisfies two
properties: null empty set ($\mu(\emptyset) = 0$) and countable additivity (for any
countable collection $\{S_i\}_{i \in \mathcal{I}}$ of pairwise disjoint sets
$S_i \in \Sigma_A$, we have
$\mu(\bigcup_{i \in \mathcal{I}} S_i) = \sum_{i \in \mathcal{I}} \mu(S_i)$). Measures
allow us to quantify the probability that a certain result lies in a measurable set.
For example, $\mu([1,2])$ can be interpreted as the probability that the outcome of
a process is between 1 and 2.

The Lebesgue measure $\lambda: \mathcal{B} \to [0, \infty]$ is the (unique) measure that
satisfies $\lambda([a,b]) = b - a$ for all $a, b \in \mathbb{R}$ with $a < b$. The zero
measure $0: \Sigma_A \to [0, \infty]$ is defined by $0(S) = 0$ for all $S \in \Sigma_A$.
For a measurable space $(A, \Sigma_A)$ and some $a \in A$, the Dirac measure
$\delta_a: \Sigma_A \to [0, \infty]$ is defined by $\delta_a(S) = [a \in S]$.

Unfortunately, there are measures that do not satisfy some important properties (for
example, they may not satisfy Fubini's theorem, which we discuss later on). The usual
way to deal with this is to restrict our attention to σ-finite measures, which are
well-known and were studied in great detail. However, σ-finite measures are too
restrictive for our purposes. In particular, the s-finite kernels that we introduce
later on can induce measures that are not σ-finite. This is why in the following, we
work with s-finite measures. Table 2 gives an overview of the different kinds of
measures that are important for understanding our work. The expression
$1/2 \cdot \delta_a$ stands for the pointwise multiplication of the measure $\delta_a$ by
1/2: $1/2 \cdot \delta_a = \lambda S.\, 1/2 \cdot \delta_a(S)$. Here, the $\lambda$ refers
to λ-abstraction and not to the Lebesgue measure. To distinguish the two $\lambda$s, we
always write "$\lambda x.$" (with a dot) when we refer to λ-abstraction. For more details
on the definitions and for proofs about the provided examples, see Appendix A.1.
Table 2. Definition and comparison of different measures $\mu: \Sigma_A \to [0, \infty]$
on measurable spaces $(A, \Sigma_A)$. Reading the table top-down, we get from the most
restrictive definition to the most permissive definition. For example, any
sub-probability measure is also a σ-finite measure. We also provide an example for each
type of measure that is not an example of the more restrictive type of measure. For
example, the Lebesgue measure $\lambda$ is σ-finite but not a sub-probability measure.

Type of measure | Characterization | Examples
Probability measure | $\mu$ is a measure and $\mu(A) = 1$ | $\mu = \delta_a$
Sub-probability measure | $\mu$ is a measure and $\mu(A) \le 1$ | $\mu = 0$ or $\mu = 1/2 \cdot \delta_a$
σ-finite measure | $\mu$ is a measure and $A = \bigcup_{i \in \mathbb{N}} A_i$ for $A_i \in \Sigma_A$ with $\mu(A_i) < \infty$ | $\mu = \lambda$
s-finite measure | $\mu = \sum_{i \in \mathbb{N}} \mu_i$ for sub-probability measures $\mu_i$ | $\mu(S) = 0$ if $\lambda(S) = 0$, $\mu(S) = \infty$ if $\lambda(S) > 0$
Measure | $\mu(\emptyset) = 0$, countable additivity | $\mu(S) = |S|$ if $S$ finite, $\mu(S) = \infty$ otherwise


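Product of Measures, Product of Measures in the Presence of Exception States.
For s-finite measures $\mu: \Sigma_A \to [0, \infty]$ and
$\mu': \Sigma_B \to [0, \infty]$, we denote the product of measures by
$\mu \times \mu': \Sigma_{A \times B} \to [0, \infty]$, and define it by

$(\mu \times \mu')(S) = \int_{a \in A} \int_{b \in B} [(a, b) \in S] \, \mu'(db) \, \mu(da)$

For s-finite measures $\mu: \Sigma_{\overline{A}} \to [0, \infty]$ and
$\mu': \Sigma_{\overline{B}} \to [0, \infty]$, we denote the lifted product of measures by
$\mu \mathbin{\overline{\times}} \mu': \Sigma_{\overline{A \times B}} \to [0, \infty]$ and
define it using the lifted tupling function:
$(\mu \mathbin{\overline{\times}} \mu')(S) = \int_{a \in \overline{A}} \int_{b \in \overline{B}} [\overline{(a, b)} \in S] \, \mu'(db) \, \mu(da)$.
While the product of measures $\mu \times \mu'$ is well known for combining two measures
to a joint measure, the concept of a lifted product of measures
$\mu \mathbin{\overline{\times}} \mu'$ is required to do the same for combining measures
that have weight on exception states. Because the formal semantics of our probabilistic
programming language makes use of exception states, we always use $\overline{\times}$ to
combine measures, appropriately handling exception states implicitly.


Lemma 1. For measures $\mu: \Sigma_A \to [0, \infty]$, $\mu': \Sigma_B \to [0, \infty]$,
let $S \in \Sigma_A$ and $T \in \Sigma_B$. Then,
$(\mu \times \mu')(S \times T) = \mu(S) \cdot \mu'(T)$.


For $\mu: \Sigma_{\overline{A}} \to [0, \infty]$,
$\mu': \Sigma_{\overline{B}} \to [0, \infty]$ and $S \in \Sigma_{\overline{A}}$,
$T \in \Sigma_{\overline{B}}$, in general we have
$(\mu \mathbin{\overline{\times}} \mu')(S \mathbin{\overline{\times}} T) \neq \mu(S) \cdot \mu'(T)$,
due to interactions of exception states.


Lemma 2. $\times$ and $\overline{\times}$ for s-finite measures are associative, left-
and right-distributive and preserve (sub-)probability and s-finite measures.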
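For measures with finite support, the lifted product can be computed directly, which makes the failure of the product rule in the presence of exception states concrete. The Python sketch below is a discrete illustration only: measures are represented as dictionaries from outcomes to weights, the strings in `EXCEPTIONS` are hypothetical encodings of the exception states, and `lifted_pair`, `lifted_product` and `mass` are hypothetical helper names.

```python
from fractions import Fraction

EXCEPTIONS = {"bot", "obs_fail", "nonterm"}    # hypothetical encodings

def lifted_pair(a, b):
    """Lifted tupling: the first exception wins, otherwise build the pair."""
    if a in EXCEPTIONS:
        return a
    if b in EXCEPTIONS:
        return b
    return (a, b)

def lifted_product(mu, nu):
    """Lifted product of two discrete measures given as {outcome: weight} dicts."""
    out = {}
    for a, p in mu.items():
        for b, q in nu.items():
            key = lifted_pair(a, b)
            out[key] = out.get(key, Fraction(0)) + p * q
    return out

def mass(mu, outcomes):
    """Weight that mu assigns to a set of outcomes."""
    return sum((w for x, w in mu.items() if x in outcomes), Fraction(0))

half = Fraction(1, 2)
mu = {0: half, "obs_fail": half}
nu = {1: half, "bot": half}
joint = lifted_product(mu, nu)
swapped = lifted_product(nu, mu)
```

In this example, the lifted product puts weight 1/2 on the observation failure state, whereas the product of the corresponding marginal weights is only 1/4. Swapping the arguments moves weight from observation failure to the error state, which is the discrete counterpart of the missing commutativity discussed for the lifted product.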


Lebesgue Integrals, Fubini's Theorem for s-finite Measures. Our definition of the
Lebesgue integral is based on [31]. It allows integrating functions that sometimes
evaluate to $\infty$, and Lebesgue integrals evaluating to $\infty$.
Here, $(A, \Sigma_A)$ and $(B, \Sigma_B)$ are measurable spaces and
$\mu: \Sigma_A \to [0, \infty]$ and $\mu': \Sigma_B \to [0, \infty]$ are measures on $A$
and $B$, respectively. Also, $E \in \Sigma_A$ and $F \in \Sigma_B$. Let
$s: A \to [0, \infty)$ be a measurable function. $s$ is a simple function if
$s(x) = \sum_{i=1}^n a_i [x \in A_i]$ for $A_i \in \Sigma_A$ and $a_i \in \mathbb{R}$. For
any simple function $s$, the Lebesgue integral of $s$ over $E$ with respect to $\mu$,
denoted by $\int_{a \in E} s(a) \, \mu(da)$, is defined by
$\sum_{i=1}^n a_i \cdot \mu(A_i \cap E)$, making use of the convention
$0 \cdot \infty = 0$. Let $f: A \to [0, \infty]$ be measurable but not necessarily
simple. Then, the Lebesgue integral of $f$ over $E$ with respect to $\mu$ is defined by

$\int_{a \in E} f(a) \, \mu(da) := \sup \{ \int_{a \in E} s(a) \, \mu(da) \mid s: A \to [0, \infty) \text{ is simple}, 0 \le s \le f \}$

Here, the inequalities on functions are pointwise. Appendix A.2 lists some
useful properties of the Lebesgue integral. Here, we only mention Fubini's theorem,
which is important because it entails a commutativity-like property of the
product of measures: $(\mu \times \mu')(S) = (\mu' \times \mu)(\mathrm{swap}(S))$, where
swap switches the dimensions of $S$: $\mathrm{swap}(S) = \{(b, a) \mid (a, b) \in S\}$.
The proof of this property is straightforward, by expanding the definition of the
product of measures and applying Fubini's theorem. As we show in Sect. 5, this property
is crucial for the commutativity of expressions. In the presence of exceptions, it does
not hold:

$(\mu \mathbin{\overline{\times}} \mu')(S) \neq (\mu' \mathbin{\overline{\times}} \mu)(\overline{\mathrm{swap}}(S))$ in general.


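Theorem 1 (Fubini's theorem). For s-finite measures $\mu: \Sigma_A \to [0, \infty]$ and
$\mu': \Sigma_B \to [0, \infty]$ and any measurable function
$f: A \times B \to [0, \infty]$,

$\int_{a \in A} \int_{b \in B} f(a, b) \, \mu'(db) \, \mu(da) = \int_{b \in B} \int_{a \in A} f(a, b) \, \mu(da) \, \mu'(db)$

For s-finite measures $\mu: \Sigma_{\overline{A}} \to [0, \infty]$ and
$\mu': \Sigma_{\overline{B}} \to [0, \infty]$ and any measurable function
$f: \overline{A} \times \overline{B} \to [0, \infty]$,

$\int_{a \in \overline{A}} \int_{b \in \overline{B}} f(a, b) \, \mu'(db) \, \mu(da) = \int_{b \in \overline{B}} \int_{a \in \overline{A}} f(a, b) \, \mu(da) \, \mu'(db)$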


(Sub-)probability Kernels, s-finite Kernels, Dirac Delta, Lebesgue Kernel, Motivation
for s-finite Kernels. In the following, let $(A, \Sigma_A)$ and $(B, \Sigma_B)$ be
measurable spaces. A (sub-)probability kernel with source $A$ and target $B$ is a
function $\kappa: A \times \Sigma_B \to [0, \infty)$ such that for all $a \in A$:
$\kappa(a, \cdot): \Sigma_B \to [0, \infty)$ is a (sub-)probability measure, and
$\forall S \in \Sigma_B$: $\kappa(\cdot, S): A \to [0, \infty)$ is measurable.
$\kappa: A \times \Sigma_B \to [0, \infty]$ is an s-finite kernel with source $A$ and
target $B$ if $\kappa$ is a pointwise sum of sub-probability kernels
$\kappa_i: A \times \Sigma_B \to [0, \infty)$, meaning
$\kappa = \sum_{i \in \mathbb{N}} \kappa_i$. We denote the set of s-finite kernels with
source $A$ and target $B$ by $A \mapsto B \subseteq A \times \Sigma_B \to [0, \infty]$.
Because we only ever deal with s-finite kernels, we often refer to them simply as
kernels.

We can understand the Dirac measure as a probability kernel. For a measurable space
$(A, \Sigma_A)$, the Dirac delta $\delta: A \mapsto A$ is defined by
$\delta(a, S) = [a \in S]$. Note that for any $a$,
$\delta(a, \cdot): \Sigma_A \to [0, \infty]$ is the Dirac measure. We often write
$\delta(a)(S)$ or $\delta_a(S)$ for $\delta(a, S)$. Note that we can also interpret
$\delta: A \mapsto A$ as an s-finite kernel from $A \mapsto B$ for $A \subseteq B$. The
Lebesgue kernel $\lambda^*: A \mapsto \mathbb{R}$ is defined by
$\lambda^*(a)(S) = \lambda(S)$, where $\lambda$ is the Lebesgue measure. The definition
of s-finite kernels is a lifting of the notion of s-finite measures. Note that for an
s-finite kernel $\kappa$, $\kappa(a, \cdot)$ is an s-finite measure for all $a \in A$. In
the context of probabilistic programming, s-finite kernels have been used before [34].

Working in the space of sub-probability kernels is inconvenient, because, for
example, $\lambda^*: \mathbb{R} \mapsto \mathbb{R}$ is not a sub-probability kernel. Even
though $\lambda^*(x)$ is a σ-finite measure for all $x \in \mathbb{R}$, not all s-finite
kernels induce σ-finite measures in this sense. As an example,
$(\lambda^*; \lambda^*)(x)$ is not a σ-finite measure for any $x \in \mathbb{R}$
(see Lemma 15 in Appendix A.1). We introduce $(;)$ shortly in Definition 1.

Working in the space of s-finite kernels is convenient because s-finite kernels
have many nice properties. In particular, the set of s-finite kernels $A \mapsto B$ is
the smallest set that contains all sub-probability kernels with source $A$ and target
$B$ and is closed under countable sums.


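Lifting Kernels to Exception States, Removing Weight from Exception States.
For kernels $\kappa: A \mapsto B$ or kernels $\kappa: A \mapsto \overline{B}$, $\kappa$
lifted to exception states $\overline{\kappa}: \overline{A} \mapsto \overline{B}$
is defined by $\overline{\kappa}(a) = \kappa(a)$ if $a \in A$ and
$\overline{\kappa}(a) = \delta(a)$ if $a \notin A$. When transforming $\kappa$ into
$\overline{\kappa}$, we preserve (sub-)probability and s-finite kernels.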


Composing Kernels, Composing Kernels in the Presence of Exception States.

Definition 1. Let $(;): (A \mapsto B) \to (B \mapsto C) \to (A \mapsto C)$ be defined by
$(f; g)(a)(S) = \int_{b \in B} g(b)(S) \, f(a)(db)$.

Note that $f; g$ intuitively corresponds to first applying $f$ and then $g$. Throughout
this paper, we mostly use >=> instead of $(;)$, but we introduce $(;)$ because it is
well-known and it is instructive to show how our definition of >=> relates to $(;)$.


Lemma 3. $(;)$ is associative, left- and right-distributive, has neutral element²
$\delta$ and preserves (sub-)probability and s-finite kernels.


Definition 2. Let (>=>): $(A \mapsto \overline{B}) \to (B \mapsto \overline{C}) \to (A \mapsto \overline{C})$
be defined by
$(f >=> g)(a)(S) = \int_{b \in \overline{B}} \overline{g}(b)(S) \, f(a)(db)$.


We sometimes write $f(a)$ >>= $g$ for $(f >=> g)(a)$.


Lemma 4. For $f: A \mapsto \overline{B}$ and $g: B \mapsto \overline{C}$, $a \in A$ and
$S \in \Sigma_{\overline{C}}$,

$(f >=> g)(a)(S) = (f; g)(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S) \, f(a)(\{x\})$


Lemma 4 shows how >=> relates to $(;)$, by splitting $f >=> g$ into non-exceptional
behavior of $f$ (handled by $(;)$) and exceptional behavior of $f$ (handled by a sum).
Intuitively, if $f$ produces an exception state $x \in \mathcal{X}$, then $g$ is not even
evaluated. Instead, this exception is directly passed on, as indicated by
$\delta(x)(S)$. If $f(a)(\mathcal{X}) = 0$ for all $a \in A$, or if
$S \cap \mathcal{X} = \emptyset$, then the definitions are equivalent in the sense that
$(f; g)(a)(S) = (f >=> g)(a)(S)$. The difference between >=> and $(;)$ is the treatment
of exception states produced by $f$. Note that technically, the target $\overline{B}$ of
$f: A \mapsto \overline{B}$ does not match the source $B$ of
$g: B \mapsto \overline{C}$. Therefore, to formally interpret $f; g$, we silently
restrict the domain of $f$ to $A \times \Sigma_B$.

² $\delta$ is a neutral element of $(;)$ if $(\delta; \kappa) = (\kappa; \delta) = \kappa$
for all kernels $\kappa$.
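For kernels with finite support, the composition >=> reduces to a weighted sum, which makes the short-circuiting on exception states easy to see. The Python sketch below is a discrete illustration under assumptions: kernels are functions returning dictionaries from outcomes to weights, the strings in `EXCEPTIONS` are hypothetical encodings of the exception states, and `dirac`, `lift_kernel` and `kleisli` are hypothetical helper names.

```python
from fractions import Fraction

EXCEPTIONS = {"bot", "obs_fail", "nonterm"}    # hypothetical encodings

def dirac(x):
    """Discrete Dirac delta: all weight on x."""
    return {x: Fraction(1)}

def lift_kernel(g):
    """Lift g to exception states: exceptions are passed on without evaluating g."""
    return lambda b: dirac(b) if b in EXCEPTIONS else g(b)

def kleisli(f, g):
    """Discrete analogue of f >=> g: sum the lifted g(b) over b, weighted by f(a)."""
    g_bar = lift_kernel(g)
    def composed(a):
        out = {}
        for b, p in f(a).items():
            for c, q in g_bar(b).items():
                out[c] = out.get(c, Fraction(0)) + p * q
        return out
    return composed

half = Fraction(1, 2)
f = lambda a: {a: half, "obs_fail": half}      # keeps a or fails an observation
g = lambda b: {b + 1: half, "bot": half}       # increments b or runs into an error
result = kleisli(f, g)(0)
```

In `result`, the weight 1/2 on the observation failure state comes entirely from `f` and is passed on unchanged, while the remaining weight is what the ordinary composition would produce on the non-exceptional outcome of `f`. This is the split described by Lemma 4, restricted to this finite example.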


Lemma 5. >=> is associative, left-distributive (but not right-distributive), has
neutral element $\delta$ and preserves (sub-)probability and s-finite kernels.


Product of Kernels, Product of Kernels in the Presence of Exception States. For
s-finite kernels $\kappa: A \mapsto B$, $\kappa': A \mapsto C$, we define the product of
kernels, denoted by $\kappa \times \kappa': A \mapsto B \times C$, as
$(\kappa \times \kappa')(a)(S) = (\kappa(a) \times \kappa'(a))(S)$. For s-finite kernels
$\kappa: A \mapsto \overline{B}$ and $\kappa': A \mapsto \overline{C}$, we define the
lifted product of kernels, denoted by
$\kappa \mathbin{\overline{\times}} \kappa': A \mapsto \overline{B \times C}$, as
$(\kappa \mathbin{\overline{\times}} \kappa')(a)(S) = (\kappa(a) \mathbin{\overline{\times}} \kappa'(a))(S)$.
$\times$ and $\overline{\times}$ allow us to combine kernels to a joint kernel.
Essentially, this definition reduces the product of kernels to the product of measures.


Lemma 6. $\times$ and $\overline{\times}$ for kernels preserve (sub-)probability and
s-finite kernels, are associative, left- and right-distributive.


Binding Conventions. To avoid too many parentheses, we make use of some
binding conventions, ordering (in decreasing binding strength) $\overline{\times}$,
$\times$, $;$, >=>, $+$.


Summary. The most important concepts introduced in this section are exception 
states, records, Lebesgue integration, Fubini’s theorem and (s-finite) kernels. 


4 A Probabilistic Language and Its Semantics 


We now describe our probabilistic programming language, the typing rules and 
the denotational semantics of our language. 


4.1 Syntax 


Let $Y := \mathbb{Q} \cup \{\pi, e\} \subseteq \mathbb{R}$ be a (countable) set of
constants expressible in our programs. Let $i, n \in \mathbb{N}$, $r \in Y$,
$x \in$ Vars, $\ominus$ a generic unary operator (e.g., $-$ inverts the sign of a value,
$!$ is logical negation mapping 0 to 1 and all other numbers to 0, $\lfloor\cdot\rfloor$
and $\lceil\cdot\rceil$ round down and up respectively), $\oplus$ a generic binary
operator (e.g., $+$, $-$, $*$, $/$, ^ for addition, subtraction, multiplication, division
and exponentiation, &&, || for logical conjunction and disjunction, $=$, $\neq$, $<$,
$\le$, $>$, $\ge$ to compare values). Let $f: A \to \mathbb{R} \to [0, \infty)$ be a
measurable function that maps $a \in A$ to a probability density function. We check if
$f$ is measurable by uncurrying $f$ to $f: A \times \mathbb{R} \to [0, \infty)$. Fig. 2
shows the syntax of our language.

$e ::= () \mid r \mid x \mid (e_1, \dots, e_n) \mid e[i] \mid \ominus e \mid e_1 \oplus e_2 \mid e_1[e_2] \mid \mathrm{array}(e_1, e_2) \mid e_1[e_2 \mapsto e_3] \mid F(e)$   (Expressions)

$F ::= \lambda x.\{P; \mathrm{return}\ e;\} \mid \mathrm{flip} \mid \mathrm{uniform} \mid \mathrm{sampleFrom}_f$   (Functions)

$P ::= \mathrm{skip} \mid x := e \mid x = e \mid P_1; P_2 \mid \mathrm{if}\ e\ \{P_1\}\ \mathrm{else}\ \{P_2\} \mid \{P\} \mid \mathrm{assert}(e) \mid \mathrm{observe}(e) \mid \mathrm{while}\ e\ \{P\}$   (Statements)

Fig. 2. The syntax of our probabilistic language.

Our expressions capture $()$ (the only element of $\mathbb{1}$), $r$ (real numbers), $x$
(variables), $(e_1, \dots, e_n)$ (tuples), $e[i]$ (accessing elements of tuples for
$i \in \mathbb{N}$), $\ominus e$ (unary operators), $e_1 \oplus e_2$ (binary operators),
$e_1[e_2]$ (accessing array elements), $e_1[e_2 \mapsto e_3]$ (updating array elements),
array($e_1$, $e_2$) (creating array of length $e_1$ containing $e_2$ at every index) and
$F(e)$ (evaluating function $F$ on argument $e$). To handle functions
$F(e_1, \dots, e_n)$ with multiple arguments, we interpret $(e_1, \dots, e_n)$ as a tuple
and apply $F$ to that tuple.

Our functions express $\lambda x.\{P; \mathrm{return}\ e;\}$ (function taking argument
$x$ running $P$ on $x$ and returning $e$), flip($e$) (random choice from $\{0, 1\}$, 1
with probability $e$), uniform($e_1$, $e_2$) (continuous uniform distribution between
$e_1$ and $e_2$) and sampleFrom$_f$($e$) (sample value distributed according to
probability density function $f(e)$). An example for $f$ is the density of the
exponential distribution, indexed with rate $\lambda$. Formally,
$f: (0, \infty) \to \mathbb{R} \to [0, \infty)$ is defined by
$f(\lambda)(x) = \lambda e^{-\lambda x}$ if $x \ge 0$ and $f(\lambda)(x) = 0$ otherwise.
Often, $f$ is partial (e.g., $\lambda \le 0$ is not allowed). Intuitively, arguments
outside the allowed range of $f$ produce the error state $\bot$.
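The exponential density used as an example for $f$ can be written as a curried, partial function. The Python sketch below is only an illustration: `BOT` is a hypothetical encoding of the error state, and returning it for rates outside the domain is an assumption about how a partial $f$ might be modeled, not the definition of sampleFrom.

```python
import math

BOT = "bot"                            # hypothetical encoding of the error state

def exp_density(rate):
    """Curried density of the exponential distribution, partial in `rate`.

    Returns a density function for rate > 0 and the error state otherwise.
    """
    if rate <= 0:
        return BOT                     # rate outside the domain (0, infinity)
    def density(x):
        return rate * math.exp(-rate * x) if x >= 0 else 0.0
    return density
```

For a valid rate such as 2.0, `exp_density(2.0)` is an ordinary density function. For an invalid rate, the error marker is returned before any density is evaluated, which mirrors the intuition that arguments outside the allowed range produce the error state.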

Our statements express skip (no operation), $x := e$ (assigning to a fresh
variable), $x = e$ (assigning to an existing variable), $P_1; P_2$ (sequential
composition of programs), if $e$ $\{P_1\}$ else $\{P_2\}$ (if-then-else), $\{P\}$ (static
scoping), assert($e$) (asserting that an expression evaluates to true, assertion
failure results in $\bot$), observe($e$) (observing that an expression evaluates to true,
observation failure results in $\lightning$) and while $e$ $\{P\}$ (while loops,
non-termination results in $\circlearrowleft$). We additionally introduce syntactic sugar
$e_1[e_2] = e_3$ for $e_1 = e_1[e_2 \mapsto e_3]$, if ($e$) $\{P\}$ for
if $e$ $\{P\}$ else $\{$skip$\}$ and func($e_2$) for
$\lambda x.\{P; \mathrm{return}\ e_1;\}(e_2)$ (using the name func for the function with
argument $x$ and body $\{P; \mathrm{return}\ e_1;\}$).


4.2 Typing Judgments 


Let $n \in \mathbb{N}$. We define types by the following grammar in BNF, where $\tau[]$
denotes arrays over type $\tau$. We sometimes write $\prod_{i=1}^n \tau_i$ for the
product type $\tau_1 \times \cdots \times \tau_n$.


$\tau ::= \mathbb{1} \mid \mathbb{R} \mid \tau[] \mid \tau_1 \times \cdots \times \tau_n$


Note that we also use the type $\tau_1 \mapsto \tau_2$ of kernels with source $\tau_1$
and target $\tau_2$, but we do not list it here to avoid higher-order functions
(discussed in Sect. 4.5).

Formally, a context $\Gamma$ is a set $\{x_i: \tau_i\}_{i \in [n]}$ that assigns a type
$\tau_i$ to each variable $x_i \in$ Vars. In slight abuse of notation, we sometimes write
$x \in \Gamma$ if there is a type $\tau$ with $x: \tau \in \Gamma$. We also write
$\Gamma, x: \tau$ for $\Gamma \cup \{x: \tau\}$ (where $x \notin \Gamma$) and
$\Gamma, \Gamma'$ for $\Gamma \cup \Gamma'$ (where $\Gamma$ and $\Gamma'$ have no common
variables).
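$\dfrac{}{\Gamma \vdash (): \mathbb{1}}$   $\dfrac{}{\Gamma \vdash r: \mathbb{R}}$   $\dfrac{x: \tau \in \Gamma}{\Gamma \vdash x: \tau}$   $\dfrac{\Gamma \vdash e_1: \tau_1 \quad \cdots \quad \Gamma \vdash e_n: \tau_n}{\Gamma \vdash (e_1, \dots, e_n): \tau_1 \times \cdots \times \tau_n}$

$\dfrac{\Gamma \vdash e: \tau_0 \times \cdots \times \tau_{n-1} \quad i \in \{0, \dots, n-1\}}{\Gamma \vdash e[i]: \tau_i}$   $\dfrac{\Gamma \vdash e: \mathbb{R}}{\Gamma \vdash \ominus e: \mathbb{R}}$   $\dfrac{\Gamma \vdash e_1: \mathbb{R} \quad \Gamma \vdash e_2: \mathbb{R}}{\Gamma \vdash e_1 \oplus e_2: \mathbb{R}}$

$\dfrac{\Gamma \vdash e_1: \tau[] \quad \Gamma \vdash e_2: \mathbb{R}}{\Gamma \vdash e_1[e_2]: \tau}$   $\dfrac{\Gamma \vdash e_1: \mathbb{R} \quad \Gamma \vdash e_2: \tau}{\Gamma \vdash \mathrm{array}(e_1, e_2): \tau[]}$

$\dfrac{\Gamma \vdash e_1: \tau[] \quad \Gamma \vdash e_2: \mathbb{R} \quad \Gamma \vdash e_3: \tau}{\Gamma \vdash e_1[e_2 \mapsto e_3]: \tau[]}$   $\dfrac{\Gamma \vdash e: \tau_1 \quad \vdash F: \tau_1 \mapsto \tau_2}{\Gamma \vdash F(e): \tau_2}$

$\dfrac{x: \tau_1 \xrightarrow{P} \Gamma \quad \Gamma \vdash e: \tau_2}{\vdash \lambda x.\{P; \mathrm{return}\ e;\}: \tau_1 \mapsto \tau_2}$   $\dfrac{}{\vdash \mathrm{flip}: \mathbb{R} \mapsto \mathbb{R}}$   $\dfrac{}{\vdash \mathrm{uniform}: \mathbb{R} \times \mathbb{R} \mapsto \mathbb{R}}$

$\dfrac{}{\vdash \mathrm{sampleFrom}_f: \tau \mapsto \mathbb{R}}$   $f: A \to \mathbb{R} \to [0, \infty)$, $A \in \Sigma_\tau$


Fig. 3. The typing rules for expressions and functions in our language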


[Figure 4: inference rules deriving $\Gamma \xrightarrow{P} \Gamma'$ for the statements skip, $x := e$, $x = e$, $P; Q$, $\{P\}$, if $e$ $\{P_1\}$ else $\{P_2\}$, assert($e$), observe($e$) and while $e$ $\{P\}$]


Fig. 4. The typing rules for statements


The rules in Figs. 3 and 4 allow deriving the type of expressions, functions
and statements. To state that an expression $e$ is of type $\tau$ under a context
$\Gamma$, we write $\Gamma \vdash e: \tau$. Likewise, $\vdash F: \tau \mapsto \tau'$
indicates that $F$ is a kernel from $\tau$ to $\tau'$. Finally,
$\Gamma \xrightarrow{P} \Gamma'$ states that a context $\Gamma$ is transformed to
$\Gamma'$ by a statement $P$. For sampleFrom$_f$, we intuitively want $f$ to map values
from $\tau$ to probability density functions. To allow $f$ to be partial, i.e., to be
undefined for some values from $\tau$, we use $A \in \Sigma_\tau$ (and hence
$A \subseteq \llbracket \tau \rrbracket$) as the domain of $f$ (see Sect. 4.3).


4.3 Semantics 


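Semantic Domains. We assign to each type $\tau$ a set $\llbracket \tau \rrbracket$
together with an implicit σ-algebra $\Sigma_\tau$ on that set. Additionally, we assign a
set $\llbracket \Gamma \rrbracket$ to each context
$\Gamma = \{x_i: \tau_i\}_{i \in [n]}$. Concretely, we have
$\llbracket \mathbb{1} \rrbracket = \mathbb{1} := \{()\}$ with
$\Sigma_{\mathbb{1}} = \{\emptyset, \{()\}\}$,
$\llbracket \mathbb{R} \rrbracket = \mathbb{R}$ and $\Sigma_{\mathbb{R}} = \mathcal{B}$.
The remaining semantic domains are outlined in Fig. 5.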
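$\llbracket \tau[] \rrbracket = \bigcup_{i \in \mathbb{N}} \llbracket \tau \rrbracket^i$, where $\Sigma_{\tau[]}$ is generated by $\bigcup_{i \in \mathbb{N}} \{\prod_{j=1}^{i} S_j \mid S_j \in \Sigma_\tau\}$

$\llbracket \tau_1 \times \cdots \times \tau_n \rrbracket = \llbracket \tau_1 \rrbracket \times \cdots \times \llbracket \tau_n \rrbracket$, where $\Sigma_{\tau_1 \times \cdots \times \tau_n}$ is generated by $\{\prod_{i=1}^{n} S_i \mid S_i \in \Sigma_{\tau_i}\}$

$\llbracket \Gamma \rrbracket = \prod_{i=1}^{n} (x_i: \llbracket \tau_i \rrbracket)$, where $\Sigma_\Gamma$ is generated by $\{\prod_{i=1}^{n} (x_i: S_i) \mid S_i \in \Sigma_{\tau_i}\}$

Fig. 5. Semantic domains for types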


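\begin{align*}
\llbracket()\rrbracket_{1}(\sigma)(S) &= [() \in S] \qquad
\llbracket r\rrbracket_{\mathbb{R}}(\sigma)(S) = [r \in S] \qquad
\llbracket x\rrbracket_{\tau}(\sigma)(S) = [\sigma(x) \in S] \\
\llbracket(e_1, \dots, e_n)\rrbracket_{\tau_1 \times \cdots \times \tau_n} &= \llbracket e_1\rrbracket_{\tau_1} \overline{\times} \cdots \overline{\times} \llbracket e_n\rrbracket_{\tau_n} \\
\llbracket e[i]\rrbracket_{\tau_i} &= \llbracket e\rrbracket_{\tau_0 \times \cdots \times \tau_{n-1}} >=> \lambda t.\, \delta(t[i]) \\
\llbracket e_1 / e_2\rrbracket_{\mathbb{R}} &= \llbracket e_1\rrbracket_{\mathbb{R}} \overline{\times} \llbracket e_2\rrbracket_{\mathbb{R}} >=> \lambda(x, y).
  \begin{cases} \delta(x / y) & y \neq 0 \\ \delta(\bot) & \text{otherwise} \end{cases} \\
\llbracket e_1[e_2]\rrbracket_{\tau} &= \llbracket e_1\rrbracket_{\tau[]} \overline{\times} \llbracket e_2\rrbracket_{\mathbb{R}} >=> \lambda(t, i).
  \begin{cases} \delta(t[i]) & i \in \mathbb{N},\ i < |t| \\ \delta(\bot) & \text{otherwise} \end{cases} \\
\llbracket e_1[e_2 \mapsto e_3]\rrbracket_{\tau[]} &= \llbracket e_1\rrbracket_{\tau[]} \overline{\times} \llbracket e_2\rrbracket_{\mathbb{R}} \overline{\times} \llbracket e_3\rrbracket_{\tau} >=> \lambda(t, i, v).
  \begin{cases} \delta(t[i \mapsto v]) & i \in \mathbb{N},\ i < |t| \\ \delta(\bot) & \text{otherwise} \end{cases} \\
\llbracket \mathrm{array}(e_1, e_2)\rrbracket_{\tau[]} &= \llbracket e_1\rrbracket_{\mathbb{R}} \overline{\times} \llbracket e_2\rrbracket_{\tau} >=> \lambda(n, v).
  \begin{cases} \delta(v!n) & n \in \mathbb{N} \\ \delta(\bot) & \text{otherwise} \end{cases}
\end{align*}

Fig. 6. The semantics of expressions. $v!n$ stands for the $n$-tuple $(v, \dots, v)$. $t[i]$ stands
for the $i$-th element (0-indexed) of the tuple $t$ and $t[i \mapsto v]$ is the tuple $t$, where the
$i$-th element is replaced by $v$. $|t|$ is the length of a tuple $t$. $\sigma$ stands for a program state
over all variables in some $\Gamma$, with $\sigma \in \llbracket\Gamma\rrbracket$.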


Expressions. Fig. 6 assigns to each expression $e$ typed by $\Gamma \vdash e\colon \tau$ a probability
kernel $\llbracket e\rrbracket_\tau\colon \llbracket\Gamma\rrbracket \mapsto \llbracket\tau\rrbracket$. When $\tau$ is irrelevant or clear from the
context, we may drop it and write $\llbracket e\rrbracket$. The formal interpretation of
$\llbracket\Gamma\rrbracket \mapsto \llbracket\tau\rrbracket$ is explained in Sect. 3.³ Note that Fig. 6 is incomplete, but
extending it is straightforward. When we need to evaluate multiple terms (as in
$(e_1, \dots, e_n)$), we combine the results using $\overline{\times}$. This makes sure that in the
presence of exceptions, the first exception that occurs will have priority over later
exceptions. In addition, deterministic functions (like $x + y$) are lifted to probabilistic
functions by the Dirac delta (e.g. $\delta(x + y)$) and incomplete functions (like $x / y$) are
lifted to complete functions via the explicit error state $\bot$.
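
To make the role of $\overline{\times}$ and of the lifting via $\delta$ and $\bot$ more concrete, the following
Python sketch models kernels over finite discrete distributions only. It is an
illustration under simplifying assumptions, not part of the formal development:
distributions are dictionaries from outcomes to exact fractions, and the names
`delta`, `compose`, `product`, `flip_half` and `divide` are hypothetical helpers
introduced here.

```python
from fractions import Fraction

BOT, OBS, LOOP = "⊥", "↯", "⟲"    # error, observation failure, non-termination
EXCEPTIONS = {BOT, OBS, LOOP}

def delta(value):
    """Dirac delta: all weight on a single outcome."""
    return {value: Fraction(1)}

def compose(k1, k2):
    """Kleisli composition k1 >=> k2; exceptions skip k2 and propagate."""
    def kernel(a):
        out = {}
        for b, p in k1(a).items():
            step = delta(b) if b in EXCEPTIONS else k2(b)
            for c, q in step.items():
                out[c] = out.get(c, Fraction(0)) + p * q
        return out
    return kernel

def product(k1, k2):
    """Exception-aware product: the first exception wins, otherwise pair up."""
    def kernel(a):
        out = {}
        for b, p in k1(a).items():
            if b in EXCEPTIONS:
                out[b] = out.get(b, Fraction(0)) + p
                continue
            for c, q in k2(a).items():
                key = c if c in EXCEPTIONS else (b, c)
                out[key] = out.get(key, Fraction(0)) + p * q
        return out
    return kernel

def flip_half(_state):
    """Stand-in for flip(1/2): 1 or 0, each with probability 1/2."""
    return {1: Fraction(1, 2), 0: Fraction(1, 2)}

def divide(pair):
    """Partial function x / y lifted to a total kernel via the error state."""
    x, y = pair
    return delta(Fraction(x, y)) if y != 0 else delta(BOT)

def one(_state):
    """Constant expression 1."""
    return delta(1)

# Semantics of the expression 1 / flip(1/2), evaluated in an empty state.
result = compose(product(one, flip_half), divide)(())
```

In this sketch, `result` puts weight 1/2 on the value 1 and weight 1/2 on $\bot$, since
the division fails exactly when the flip returns 0.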


³ As a quick and intuitive reminder, $\kappa\colon A \mapsto B$ means that for every $a \in A$, $\kappa(a)$ will
be a distribution over $\overline{B}$, where $\overline{B}$ is $B$ enriched with exception states. Hence, $\kappa(a)$
may have weight on elements of $B$, on exception states, or on both.




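Fig. 7 assigns to each function $F$ typed by $\vdash F\colon \tau_1 \mapsto \tau_2$ a probability kernel
$\llbracket F\rrbracket_{\tau_1 \mapsto \tau_2}\colon \llbracket\tau_1\rrbracket \mapsto \llbracket\tau_2\rrbracket$. In the semantics of flip,
$\delta(1)\colon \Sigma_{\mathbb{R}} \to [0, \infty]$ is a measure on $\mathbb{R}$, and $p \cdot \delta(1)$ rescales this
measure pointwise. Similarly, the sum $p \cdot \delta(1) + (1 - p) \cdot \delta(0)$ is also meant
pointwise, resulting in a measure on $\mathbb{R}$. Finally, $\lambda p.\, p \cdot \delta(1) + (1 - p) \cdot \delta(0)$
is a kernel with source $[0, 1]$ and target $\mathbb{R}$. For sampleFrom$_f(e)$,
remember that $f(p)(\cdot)$ is a probability density function.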


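\begin{align*}
\llbracket \mathrm{flip}\rrbracket_{\mathbb{R} \mapsto \mathbb{R}} &= \lambda p.
  \begin{cases} p \cdot \delta(1) + (1 - p) \cdot \delta(0) & p \in [0, 1] \\ \delta(\bot) & \text{otherwise} \end{cases} \\
\llbracket \mathrm{uniform}\rrbracket_{\mathbb{R} \times \mathbb{R} \mapsto \mathbb{R}} &= \lambda(l, r).
  \begin{cases} \lambda S.\, \frac{1}{r - l}\, \lambda([l, r] \cap S) & l < r \\ \delta(\bot) & \text{otherwise} \end{cases} \\
\llbracket \mathrm{sampleFrom}_f\rrbracket_{\tau \mapsto \mathbb{R}} &= \lambda p.
  \begin{cases} \lambda S.\, \int_{x \in S \cap \mathbb{R}} f(p)(x)\, \lambda(dx) & p \in A \\ \delta(\bot) & \text{otherwise} \end{cases} \\
\llbracket \lambda x.\{P;\ \mathrm{return}\ e;\}\rrbracket_{\tau_1 \mapsto \tau_2} &= \lambda v.\, \delta(\{x \mapsto v\}) >=> \llbracket P\rrbracket >=> \llbracket e\rrbracket_{\tau_2}
\end{align*}

Fig. 7. The semantics of functions.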


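\begin{align*}
\llbracket \mathrm{skip}\rrbracket &= \delta
  & \llbracket x := e\rrbracket &= \delta \overline{\times} \llbracket e\rrbracket_\tau >=> \lambda(\sigma, v).\, \delta(\sigma[x \mapsto v]) \\
\llbracket P_1; P_2\rrbracket &= \llbracket P_1\rrbracket >=> \llbracket P_2\rrbracket
  & \llbracket \{P\}\rrbracket &= \llbracket P\rrbracket >=> \lambda\sigma'.\, \delta(\sigma'(\Gamma)) \\
\llbracket \mathrm{if}\ e\ \{P_1\}\ \mathrm{else}\ \{P_2\}\rrbracket &= \delta \overline{\times} \llbracket e\rrbracket_{\mathbb{R}} >=> \lambda(\sigma, b).
  \begin{cases} \llbracket P_1\rrbracket(\sigma) & b \neq 0 \\ \llbracket P_2\rrbracket(\sigma) & b = 0 \end{cases} \\
\llbracket \mathrm{assert}(e)\rrbracket &= \delta \overline{\times} \llbracket e\rrbracket_{\mathbb{R}} >=> \lambda(\sigma, b).
  \begin{cases} \delta(\sigma) & b \neq 0 \\ \delta(\bot) & b = 0 \end{cases} \\
\llbracket \mathrm{observe}(e)\rrbracket &= \delta \overline{\times} \llbracket e\rrbracket_{\mathbb{R}} >=> \lambda(\sigma, b).
  \begin{cases} \delta(\sigma) & b \neq 0 \\ \delta(\lightning) & b = 0 \end{cases}
\end{align*}

Fig. 8. The semantics of programs in our probabilistic language. Here, $\sigma[x \mapsto v]$ results
in $\sigma$ with the value stored under $x$ updated to $v$. $\sigma'(\Gamma)$ selects only those variables from
$\sigma'$ that occur in $\Gamma$, meaning $\{x_i \mapsto v_i\}_{i \in I}(\{x_i\colon \tau_i\}_{i \in I'}) = \{x_i \mapsto v_i\}_{i \in I'}$.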


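Statements. Fig. 8 assigns to each statement $P$ with $\Gamma \overset{P}{\vdash} \Gamma'$ a probability kernel
$\llbracket P\rrbracket\colon \llbracket\Gamma\rrbracket \mapsto \llbracket\Gamma'\rrbracket$. Note the use of $\overline{\times}$ in $\delta \overline{\times} \llbracket e\rrbracket$, which allows
evaluating $e$ while keeping the state $\sigma$ in which $e$ is being evaluated. Intuitively, if
evaluating $e$ results in an exception from $\mathcal{X}$, the previous state $\sigma$ is irrelevant, and
the result of $\delta \overline{\times} \llbracket e\rrbracket$ will be that exception from $\mathcal{X}$.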
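
As a small illustration of how statement kernels keep the state and propagate
distinct exceptions, the sketch below evaluates the two-statement program
`observe(flip(1/2)); assert(flip(1/2))` over finite discrete distributions. The
helper names (`guard`, `compose`, `flip_half`) and the encoding of the empty state
as `()` are assumptions made for this example only.

```python
from fractions import Fraction

BOT, OBS, LOOP = "⊥", "↯", "⟲"    # error, observation failure, non-termination
EXCEPTIONS = {BOT, OBS, LOOP}

def compose(k1, k2):
    """Kleisli composition k1 >=> k2; exceptions skip k2 and propagate."""
    def kernel(a):
        out = {}
        for b, p in k1(a).items():
            step = {b: Fraction(1)} if b in EXCEPTIONS else k2(b)
            for c, q in step.items():
                out[c] = out.get(c, Fraction(0)) + p * q
        return out
    return kernel

def guard(expr, failure):
    """Kernel for assert/observe: keep the state if expr is non-zero, else fail."""
    def kernel(sigma):
        out = {}
        for b, p in expr(sigma).items():
            if b in EXCEPTIONS:        # an exception raised while evaluating expr
                key = b
            elif b != 0:               # check passed: return the unchanged state
                key = sigma
            else:                      # check failed: move to the failure state
                key = failure
            out[key] = out.get(key, Fraction(0)) + p
        return out
    return kernel

def flip_half(_sigma):
    """Stand-in for flip(1/2)."""
    return {1: Fraction(1, 2), 0: Fraction(1, 2)}

observe_flip = guard(flip_half, OBS)
assert_flip = guard(flip_half, BOT)
result = compose(observe_flip, assert_flip)(())
```

Here `result` assigns weight 1/2 to the observation failure, 1/4 to the error state
and 1/4 to the (empty) final state, so the two kinds of failure remain distinguishable.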
While Loop. To define the semantics of the while loop while e {P}, we introduce
a kernel transformer $\llbracket \mathrm{while}\ e\ \{P\}\rrbracket^{\mathit{trans}}\colon (\llbracket\Gamma\rrbracket \mapsto \llbracket\Gamma\rrbracket) \to (\llbracket\Gamma\rrbracket \mapsto \llbracket\Gamma\rrbracket)$ that
transforms the semantics for $n$ runs of the loop to the semantics for $n + 1$ runs
of the loop. Concretely,

\[
\llbracket \mathrm{while}\ e\ \{P\}\rrbracket^{\mathit{trans}}(\kappa) = \delta \overline{\times} \llbracket e\rrbracket >=> \lambda(\sigma, b).
  \begin{cases} \llbracket P\rrbracket(\sigma) >>= \kappa & b \neq 0 \\ \delta(\sigma) & b = 0 \end{cases}
\]

This semantics first evaluates $e$, while keeping the program state around using $\delta$.
If $e$ evaluates to 0, the while loop terminates and we return the current program
state $\sigma$. If $e$ does not evaluate to 0, we run the loop body $P$ and feed the result
to the next iteration of the loop, using $\kappa$.

We can then define the semantics of while e {P} using a special fixed point
operator $\mathrm{fix}\colon ((A \mapsto A) \to (A \mapsto A)) \to (A \mapsto A)$, defined by the pointwise
limit $\mathrm{fix}(\Delta) = \lim_{n \to \infty} \Delta^n(\circlearrowleft)$, where $\circlearrowleft := \lambda\sigma.\, \delta(\circlearrowleft)$ and $\Delta^n$ denotes the $n$-fold
composition of $\Delta$. $\Delta^n(\circlearrowleft)$ puts all runs of the while loop that do not terminate
within $n$ steps into the state $\circlearrowleft$. In the limit, $\circlearrowleft$ only has weight on those runs of
the loop that never terminate. $\mathrm{fix}(\Delta)$ is only defined if its pointwise limit exists.
Making use of fix, we can define the semantics of the while loop as follows:

\[
\llbracket \mathrm{while}\ e\ \{P\}\rrbracket = \mathrm{fix}\bigl(\llbracket \mathrm{while}\ e\ \{P\}\rrbracket^{\mathit{trans}}\bigr)
\]

Lemma 7. For $\Delta$ as in the semantics of the while loop, and for each $\sigma$ and
each $S$, the limit $\lim_{n \to \infty} \Delta^n(\circlearrowleft)(\sigma)(S)$ exists.

Lemma 7 holds because increasing $n$ may only shift probability mass from
$\circlearrowleft$ to other states (we provide a formal proof in Appendix B). Kozen shows a
different way of defining the semantics of the while loop [23], using least fixed
points. Lemma 8 describes the relation of the semantics of our while loop to the
semantics of the while loop of [23]. For more details on the formal interpretation
of Lemma 8 and for its proof, see Appendix B.
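
The approximation sequence behind fix can be sketched in Python for one concrete
loop, `while flip(1/2) { n := n + 1 }`, where the program state is just the value
of `n`. The loop, the function names `step` and `approx`, and the dictionary
encoding of discrete distributions are assumptions chosen for this illustration.

```python
from fractions import Fraction

LOOP = "⟲"  # non-termination state

def step(kappa):
    """Kernel transformer for: while flip(1/2) { n := n + 1 }."""
    def next_kappa(n):
        out = {n: Fraction(1, 2)}               # guard is 0: stop and return the state
        for s, p in kappa(n + 1).items():       # guard is 1: run the body, then kappa
            out[s] = out.get(s, Fraction(0)) + Fraction(1, 2) * p
        return out
    return next_kappa

def approx(depth):
    """Apply the transformer `depth` times to the kernel that always diverges."""
    kappa = lambda n: {LOOP: Fraction(1)}
    for _ in range(depth):
        kappa = step(kappa)
    return kappa

three_steps = approx(3)(0)
```

Starting from `n = 0`, the third approximation puts weights 1/2, 1/4 and 1/8 on the
final states 0, 1 and 2, and the remaining 1/8 on non-termination. Each further
application of `step` halves the weight on non-termination and moves it to a new
terminating state, which is the monotone behavior that Lemma 7 relies on.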


Lemma 8. In the absence of exception states, and using sub-probability kernels 
instead of distribution transformers, the definition of the semantics of the while 
loop from [23] is equivalent to ours. 


Theorem 2. The semantics of each expression $\llbracket e\rrbracket$ and statement $\llbracket P\rrbracket$ is indeed
a probability kernel.


Proof. The proof proceeds by induction. Some lemmas that are crucial for the
proof are listed in Appendix C. Conveniently, most functions that come up in
our definition are continuous (like $a + b$) or continuous except on some countable
subset (like $\frac{a}{b}$) and thus measurable.

4.4 Recursion 


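To extend our language with recursion, we apply the same ideas as for the while
loop. Given the source code of a function $F$ that uses recursion, we define its
semantics in terms of a kernel transformer $\llbracket F\rrbracket^{\mathit{trans}}$. This kernel transformer takes
semantics for $F$ up to a recursion depth of $n$ and returns semantics for $F$ up to
recursion depth $n + 1$. Formally, $\llbracket F\rrbracket^{\mathit{trans}}(\kappa)$ follows the usual semantics, but uses
$\kappa$ as the semantics for recursive calls to $F$ (we will provide an example shortly).

Finally, we define the semantics of $F$ by $\llbracket F\rrbracket := \mathrm{fix}(\llbracket F\rrbracket^{\mathit{trans}})$. Just as for the
while loop, $\mathrm{fix}(\llbracket F\rrbracket^{\mathit{trans}})$ is well-defined because stepping from recursion depth $n$
to $n + 1$ can only shift probability mass from $\circlearrowleft$ to other states. We note that we
could generalize our approach to mutual recursion.

\[
\llbracket \mathrm{geom}\rrbracket^{\mathit{trans}}(\kappa) = \delta \overline{\times} \llbracket \mathrm{flip}(\tfrac{1}{2})\rrbracket >=> \lambda(\sigma, b).
  \begin{cases} \bigl(\kappa \overline{\times} \llbracket 1\rrbracket >=> \lambda(x, y).\, \delta(x + y)\bigr)(\sigma) & b = 0 \\ \delta(0) & b \neq 0 \end{cases}
\]

Fig. 9. Kernel transformer $\llbracket \mathrm{geom}\rrbracket^{\mathit{trans}}(\kappa)$ for geom given in Listing 11.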

To demonstrate how we define the kernel transformer, consider the recursive
implementation of the geometric distribution in Listing 11 (to simplify
presentation, Listing 11 uses early return). Given semantics $\kappa$ for
geom: $1 \mapsto \mathbb{R}$ up to recursion depth $n$, we can define the semantics of geom up to
recursion depth $n + 1$, as illustrated in Fig. 9.

    geom(){
      if !flip(1/2){
        return geom()+1;
      }else{
        return 0;
      }
    }

Listing 11. Geometric distribution
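
The unfolding of geom by recursion depth can be sketched as follows. The sketch
assumes the flip parameter 1/2 and represents the semantics of geom as a
zero-argument function returning a discrete distribution; `geom_trans` and
`geom_up_to` are hypothetical names for this illustration.

```python
from fractions import Fraction

LOOP = "⟲"  # weight of runs that exceed the allowed recursion depth

def geom_trans(kappa):
    """One unfolding of geom; kappa supplies the semantics of recursive calls."""
    def kernel():
        out = {0: Fraction(1, 2)}                  # flip(1/2) = 1: return 0
        for value, p in kappa().items():           # flip(1/2) = 0: return geom() + 1
            key = value if value == LOOP else value + 1
            out[key] = out.get(key, Fraction(0)) + Fraction(1, 2) * p
        return out
    return kernel

def geom_up_to(depth):
    """Semantics of geom up to the given recursion depth."""
    kappa = lambda: {LOOP: Fraction(1)}
    for _ in range(depth):
        kappa = geom_trans(kappa)
    return kappa()

depth_three = geom_up_to(3)
```

At recursion depth 3, the values 0, 1 and 2 carry weights 1/2, 1/4 and 1/8, and the
remaining weight 1/8 still sits on $\circlearrowleft$. Increasing the depth moves this
remaining weight to larger return values, so in the limit no weight is left on
non-termination.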


4.5 Higher-Order Functions 


Our language cannot express higher-order functions. When trying to give
semantics to higher-order probabilistic programs, an important step is to define a
$\sigma$-algebra on the set of functions from real numbers to real numbers. Unfortunately,
no matter which $\sigma$-algebra is picked, function evaluation (i.e. the function that
takes $f$ and $x$ as arguments and returns $f(x)$) is not measurable [1]. This is
a known limitation that previous work has looked into (e.g. [35] address it by
restricting the set of functions to those expressible by their source code).

A promising recent approach is replacing measurable spaces by quasi-Borel 
spaces [16]. This allows expressing higher-order functions, at the price of replac- 
ing the well-known and well-understood measurable spaces by a new concept. 


4.6 Non-determinism 


To extend our language with non-determinism, we may define the semantics of
expressions, functions and statements in terms of sets of kernels. For an
expression $e$ typed by $\Gamma \vdash e\colon \tau$, this means that
$\llbracket e\rrbracket_\tau \in \mathcal{P}(\llbracket\Gamma\rrbracket \mapsto \llbracket\tau\rrbracket)$, where $\mathcal{P}(S)$
denotes the power set of $S$. Lifting our semantics to non-determinism is mostly
straightforward, except for loops. There, $\llbracket \mathrm{while}\ e\ \{P\}\rrbracket$ contains all kernels of
the form $\lim_{n \to \infty}(\Delta_1 \circ \cdots \circ \Delta_n)(\circlearrowleft)$, where $\Delta_i \in \llbracket \mathrm{while}\ e\ \{P\}\rrbracket^{\mathit{trans}}$. Previous
work has studied non-determinism in more detail, see e.g. [21,22].
5 Properties of Semantics


We now investigate two properties of our semantics: commutativity and associa- 
tivity. These are useful in practice, e.g. because they enable rewriting programs 
to a form that allows for more efficient inference [5]. 

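In this section, we write $e_1 \simeq e_2$ when expressions $e_1$ and $e_2$ are equivalent
(i.e. when $\llbracket e_1\rrbracket = \llbracket e_2\rrbracket$). Analogously, we write $P_1 \simeq P_2$ for $\llbracket P_1\rrbracket = \llbracket P_2\rrbracket$.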


5.1 Commutativity 


In the presence of exception states, our language cannot guarantee
commutativity of expressions such as $e_1 + e_2$. This is not surprising, as in our
semantics the first exception bypasses all later exceptions.


Lemma 9. For function F(){while 1 {skip}; return 0},

\[
\tfrac{1}{0} + F() \not\simeq F() + \tfrac{1}{0}
\]

Formally, this is because if we evaluate $\frac{1}{0}$ first, we only have weight on $\bot$.
If instead, we evaluate F() first, we only have weight on $\circlearrowleft$, by an analogous
calculation. A more detailed proof is included in Appendix D.
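
The asymmetry in Lemma 9 can be reproduced with a discrete stand-in for the
exception-aware product. In this sketch, `product`, `div_by_zero` and `call_F` are
hypothetical names; the final addition step is omitted because both operand orders
already end in an exception, which simply propagates.

```python
from fractions import Fraction

BOT, OBS, LOOP = "⊥", "↯", "⟲"    # error, observation failure, non-termination
EXCEPTIONS = {BOT, OBS, LOOP}

def product(k1, k2):
    """Exception-aware product: evaluate k1 first; the first exception wins."""
    def kernel(a):
        out = {}
        for b, p in k1(a).items():
            if b in EXCEPTIONS:
                out[b] = out.get(b, Fraction(0)) + p
                continue
            for c, q in k2(a).items():
                key = c if c in EXCEPTIONS else (b, c)
                out[key] = out.get(key, Fraction(0)) + p * q
        return out
    return kernel

def div_by_zero(_sigma):
    """Semantics of the expression 1/0: all weight on the error state."""
    return {BOT: Fraction(1)}

def call_F(_sigma):
    """Semantics of F(), whose loop never terminates."""
    return {LOOP: Fraction(1)}

left = product(div_by_zero, call_F)(())    # operands of 1/0 + F()
right = product(call_F, div_by_zero)(())   # operands of F() + 1/0
```

The two results differ: `left` has all weight on $\bot$, while `right` has all weight on
$\circlearrowleft$.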
However, the only reason for non-commutativity is the presence of exceptions.
Assuming that $e_1$ and $e_2$ cannot produce exceptions, we obtain commutativity:


Lemma 10. If $\llbracket e_1\rrbracket(\sigma)(\mathcal{X}) = \llbracket e_2\rrbracket(\sigma)(\mathcal{X}) = 0$ for all $\sigma$, then
$e_1 \oplus e_2 \simeq e_2 \oplus e_1$, for any commutative operator $\oplus$.


The proof of Lemma 10 (provided in Appendix D) relies on the absence of
exceptions and Fubini's Theorem. This commutativity result is in line with the
results from [34], which proves commutativity in the absence of exceptions.

In the analogous situation for statements, we cannot assume commutativity
$P_1; P_2 \simeq P_2; P_1$, even if there is no dataflow from $P_1$ to $P_2$. We already
illustrated this in Listing 10, where swapping two lines changes the program
semantics. However, in the absence of exceptions and dataflow from $P_1$ to $P_2$,
we can guarantee $P_1; P_2 \simeq P_2; P_1$.


5.2 Associativity 


A careful reader might suspect that since commutativity does not always hold
in the presence of exceptions, a similar situation might arise for associativity of
some expressions. As an example, can we guarantee
$e_1 + (e_2 + e_3) \simeq (e_1 + e_2) + e_3$, even in the presence of exceptions? The answer is
yes, intuitively because exceptions can only change the behavior of a program if the
order of their occurrence is changed. This is not the case for associativity. Formally,
we derive the following:


Lemma 11. $e_1 \oplus (e_2 \oplus e_3) \simeq (e_1 \oplus e_2) \oplus e_3$, for any associative operator $\oplus$.


We include notes on the proof of Lemma 11 in Appendix D, mainly relying on
the associativity of $\overline{\times}$ (Lemma 6). Likewise, sequential composition is associative:
$(P_1; P_2); P_3 \simeq P_1; (P_2; P_3)$. This is due to the associativity of >=> (Lemma 5).




5.3 Adding the score Primitive 


Some languages include the primitive score, which allows increasing or decreasing
the probability of a certain event (or trace) [34,35].


Listing 12 shows an example program using score. Without normalization, it
returns 0 with probability $\frac{1}{2}$ and 1 with "probability" $\frac{1}{2} \cdot 2 = 1$. After
normalization, it returns 0 with probability $\frac{1}{3}$ and 1 with probability $\frac{2}{3}$.
Because score allows decreasing the probability of a specific event, it renders
observe unnecessary. In general, we can replace observe(e) by score($e \neq 0$).
However, performing this replacement means losing the explicit knowledge of the
weight on $\lightning$.

    x:=flip(1/2);
    if x=1 {
      score(2);
    }
    return x;

Listing 12. Using score
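
The arithmetic for Listing 12 can be checked with a short Python sketch that
enumerates both outcomes of the flip and applies the score weight. The function
names are hypothetical and the enumeration only covers this particular program.

```python
from fractions import Fraction

def listing_12_weights():
    """Unnormalized weight of each return value of Listing 12."""
    weights = {}
    flip = {0: Fraction(1, 2), 1: Fraction(1, 2)}    # x := flip(1/2)
    for x, p in flip.items():
        w = p * 2 if x == 1 else p                   # score(2) only if x = 1
        weights[x] = weights.get(x, Fraction(0)) + w
    return weights

def normalize(weights):
    """Rescale the weights so that they sum to 1."""
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

unnormalized = listing_12_weights()
normalized = normalize(unnormalized)
```

The unnormalized weights are 1/2 for 0 and 1 for 1, which sum to 3/2. Dividing by
this total gives the normalized probabilities 1/3 and 2/3.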


score can be useful to modify the shape of a given distribution. For example,
Listing 13 turns the distribution of x, which is a Gaussian distribution, into the
Lebesgue measure $\lambda$, by multiplying the density of x by its inverse. Hence, the
density of x at any location is 1. Note that the distribution over x cannot be
described by a probability measure, because e.g. the "probability" that x lies in the
interval $[0, 2]$ is 2.

    x:=gauss(0,1);
    score($\sqrt{2\pi}\, e^{x^2/2}$);
    return x;

Listing 13. Reshaping a distribution.
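
The claim about Listing 13 can be verified with a short calculation. We write
$\varphi$ for the density of gauss(0,1) and $\mu$ for the resulting unnormalized measure;
both symbols are introduced here only for this worked example.

```latex
\begin{align*}
\varphi(x) &= \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}
  && \text{(density of gauss(0,1))} \\
\varphi(x) \cdot \sqrt{2\pi}\, e^{x^2/2} &= 1
  && \text{(density after score)} \\
\mu([0, 2]) &= \int_0^2 1 \, \lambda(dx) = 2
  && \text{(weight of the interval } [0, 2]\text{)}
\end{align*}
```

Since $\mu([0, 2]) = 2 > 1$, the resulting measure is not a probability measure.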


Unfortunately, termination in the presence of score is not well-defined, as
illustrated in Listing 14. In this program, the only non-terminating trace keeps
changing its weight, switching between 1 and 2. In the limit, it is impossible to
determine the weight of non-termination.

    i:=0;
    while 1 {
      if i=0 {
        score(2);
      }else{
        score(1/2);
      }
      i:=1-i;
    }

Listing 14. score vs non-termination

Hence, allowing the use of the score primitive only makes sense after abolishing
the tracking of non-termination ($\circlearrowleft$), which can be achieved by only measuring sets
that do not contain non-termination. Formally, this means restricting the semantics
of expressions $e$ typed by $\Gamma \vdash e\colon \tau$ to
$\llbracket e\rrbracket_\tau\colon \llbracket\Gamma\rrbracket \mapsto (\overline{\llbracket\tau\rrbracket} - \{\circlearrowleft\})$.
Intuitively, abolishing non-termination means that we ignore non-terminating
runs (these result in weight on non-termination). After doing this, we can give
well-defined semantics to the score primitive.

The typing rule and semantics of score are:

\[
\frac{\Gamma \vdash e\colon \mathbb{R}}{\Gamma \overset{\mathrm{score}(e)}{\vdash} \Gamma}
\qquad \text{and} \qquad
\llbracket \mathrm{score}(e)\rrbracket = \delta \overline{\times} \llbracket e\rrbracket_{\mathbb{R}} >=> \lambda(\sigma, c).\, c \cdot \delta(\sigma)
\]
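
To see why the weight of non-termination in Listing 14 has no limit, the following
sketch tracks the weight of the single non-terminating trace after each loop
iteration. It assumes that the two branches apply score(2) and score(1/2) and that
`i` toggles between 0 and 1 in every iteration; `trace_weight` is a hypothetical
helper.

```python
from fractions import Fraction

def trace_weight(iterations):
    """Weight of the non-terminating trace after the given number of iterations."""
    weight, i = Fraction(1), 0
    for _ in range(iterations):
        weight *= 2 if i == 0 else Fraction(1, 2)   # score(2) or score(1/2)
        i = 1 - i                                   # toggle the branch for next time
    return weight

first_weights = [trace_weight(n) for n in range(6)]
```

The sequence of weights alternates between 1 and 2 and therefore does not converge.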

After including score into our language, the semantics of the language can
no longer be expressed in terms of probability kernels as stated in Theorem 2,
because the probability of any event can be inflated beyond 1. Instead, the
semantics must be expressed in terms of s-finite kernels.


Theorem 3. After adding the score primitive and abolishing non-termination,
the semantics of each expression $\llbracket e\rrbracket$ and statement $\llbracket P\rrbracket$ is an s-finite kernel.
Table 3. Comparison of existing semantics to ours. When adding score to our language
(Sect. 5.3), our semantics use s-finite kernels (not probability kernels).

Work | Language    | Semantics                 | Typed   | Higher-order | Loops            | Constraints
We   | Imperative  | Probability kernels       | Typed   | First-order  | Loops (FP)       | Yes
[4]  | Functional  | Sub-probability kernels   | Untyped | Higher-order | Recursion (FP)   | Yes
[23] | Imperative  | Distribution transformers | Limited | First-order  | Loops (LFP)      | No
[24] | Imperative  | Sub-probability kernels   | Limited | First-order  | Loops (LFP)      | Yes
[28] | Imperative  | Weakest precondition      | Untyped | First-order  | Loops (LFP)      | Yes
[33] | Declarative | Probability kernels       | Limited | First-order  | Loops (LFP)      | No
[34] | Functional  | s-finite kernels          | Typed   | First-order  | Counting measure | score(x)
[35] | Functional  | Measurable functions      | Typed   | Higher-order | No               | score(x)


Proof. As for Theorem 2, the proof proceeds by induction. Most parts of the 
proof are analogous (e.g. >=> preserves s-finite kernels instead of probability 
kernels). For while loops, the limit still exists (Lemma 7 still holds), but it is not 
bounded from above anymore. The limit indeed corresponds to an s-finite kernel 
because the limit of strictly increasing s-finite kernels is an s-finite kernel. 


In the presence of score, we can still talk about the interaction of different
exceptions, assuming that we do track different types of exceptions (e.g. division by
zero and out of bounds access of arrays). Then, we keep the commutativity and
associativity properties studied in the previous sections, because these still hold
for s-finite kernels.

    score(2);
    assert(false);

Listing 15. Interaction of score and assert

Listing 15 shows an interaction of score with assert. As one would expect, our
semantics will assign weight 2 to $\bot$ in this case. If the two statements are
switched, our semantics will ignore score(2) and assign weight 1 to $\bot$. Hence
again, commutativity does not hold.

Listing 16 shows a program that keeps increasing the probability of an error. In
every loop iteration, there is a "probability" of 1 of running into an error.
Overall, Listing 16 results in weight $\infty$ on state $\bot$.

    while 1 {
      score(2);
      assert(flip(1/2));
    }

Listing 16. Interaction of score, assert and loops

6 Related Work 


Kozen provides classic semantics to probabilistic programs [23]. We follow his 
main ideas, but deviate in some aspects in order to introduce additional features 
or to make our presentation cleaner. The semantics by Hur et al. [19] is heavily 
based on [23], so we do not go into more detail here. Table 3 summarizes the
comparison of our approach to that of others. 


Kernels. Like our work, most modern approaches use kernels (i.e., functions from
values to distributions) to provide semantics to probabilistic programs [4,24,33,34].
Borgström et al. [4] use sub-probability kernels on (symbolic) expressions.
Staton [34] uses s-finite kernels to capture the semantics of the score primitive
(when we discuss score in Sect. 5.3, we do the same).

In the classic semantics of [23], Kozen uses distribution transformers (i.e.,
functions from distributions to distributions). For later work [24], Kozen also
switches to sub-probability kernels, which has the advantage of avoiding
redundancies. A different approach uses weakest preconditions to define the semantics,
as in [28]. Staton et al. [35] use a different concept of measurable functions
$A \to P(\mathbb{R}_{\geq 0} \times B)$ (where $P(S)$ denotes the set of all probability measures on $S$).


Typing. Some probabilistic languages are untyped [4, 28], while others are limited 
to just a single type: R” [23,24] or UF2, Nf U N% [33]. Some languages provide 
more interesting types including sum types, distribution types and tuples [34, 35]. 
We allow tuples and array types, and we could easily account for sum types. 


Loops. Because the semantics of while loops is not always straightforward, some 
languages avoid while loops and recursion altogether [35]. Borgström et al. handle 
recursion instead of while loops, defining the semantics in terms of a fixed point 
[4]. Many languages handle while loops by least fixed points [23, 24, 28,33]. Staton 
defines while loops in terms of the counting measure [34], which is similar to 
defining them by a fixed point. We define the semantics of while loops in terms 
of a fixed point, which avoids the need to prove the least fixed point exists (still, 
the classic while loop semantics of [23] and our formulation are equivalent). 

Most languages do not explicitly track non-termination, but lose probabil- 
ity weight by non-termination [4,23,24,34]. This missing weight can be used 
to identify the probability of non-termination, but only if other exceptions 
(such as fail in [24] or observation failure in [4]) do not also result in miss- 
ing weight. The semantics of [33] are tailored to applications in networks and 
lose non-terminating packet histories instead of weight (due to a particular least 
fixed point construction of Scott-continuous maps on algebraic and continuous 
directed complete partial orders). Some works define non-termination as missing 
weight in the weakest precondition [28]. Specifically, the semantics in [28] can 
also explicitly express probability of non-termination or ending up in some state 
(using the separate construct of a weakest liberal precondition). We model
non-termination by an explicit state $\circlearrowleft$, which has the advantage that in the context
of lost weight, we know what part of that lost weight is due to non-termination.

Kaminski et al. [21] investigate the run-time of probabilistic programs with
loops and fail (interpreted as early termination), but without observations. In
[21], non-termination corresponds to an infinite run-time. 


Error States. Many languages do not consider partial functions (like fractions
$\frac{a}{b}$) and thus never run into an exception state [23,24,33]. Olmedo et al. [28] do
not consider partial functions, but support the related concept of an explicit
abort. The semantics of abort relies on missing weight in the final distribution.
Some languages handle expressions whose evaluation may fail using sum types
[34,35], forcing the programmer to deal with errors explicitly (we discuss the
disadvantages of this approach at Listing 6). Formally, a sum type $A + B$ is a
disjoint union of the two sets $A$ and $B$. Defining the semantics of an expression in
terms of the sum type $A + \{\bot\}$ allows that expression to evaluate to either a value
$a \in A$ or to $\bot$. Borgström et al. [4] have a single state fail expressing exceptions
such as dynamically detected type errors (without forcing the programmer to
deal with exceptions explicitly). Our semantics also uses sum types to handle
exceptions, but the handling is implicit, by defining semantics in terms of (>=>)
(which defines how exceptions propagate in a program) instead of (;).


Constraints. To enforce hard constraints, we use the observe(e) statement, which 
puts the program into a special failure state $\lightning$ if it does not satisfy e. We can
encode soft constraints by observe(e), where e is probabilistic (this is a general 
technique). Borgström et al. [4] allow both soft constraints that reduce the prob- 
ability of some program traces and hard constraints whose failure leads to the 
error state fail. Some languages can handle generalized soft constraints: they 
can not only decrease the probability of certain traces using soft constraints, but 
also increase them, using score(x) [34,35]. We investigate the consequences of 
adding score to our language in Sect.5.3. Kozen [24] handles hard (and hence 
soft) constraints using fail (which results in a sub-probability distribution). 
Some languages can handle neither hard nor soft constraints [23,33]. Note though 
that the semantics of ProbNetKAT in [33] can drop certain packets, which is a
similar behavior. Olmedo et al. [28] handle hard (and hence soft) constraints by 
a conditional weakest precondition that tracks both the probability of not failing 
any observation and the probability of ending in specific states. Unfortunately, 
this work is restricted to discrete distributions and is specifically designed to 
handle observation failures and non-termination. Thus, it is not obvious how to 
adapt the semantics if a different kind of exception is to be added. 


Interaction of Different Exceptions. Most existing work handles at least some 
exceptions by sub-probability distributions [4,23,24,33,34]. Then, any missing 
weight in the final distribution must be due to exceptions. However, this leads 
to a conflation of all exceptions handled by sub-probability distributions (for the 
consequences of this, see, e.g., our discussion of Listing 8). Note that semantics 
based on sub-probability kernels can add more exceptions, but they will simply 
be conflated with all other exceptions. 

Some previous work does not (exclusively) rely on sub-probability distribu- 
tions. Borgström et al. [4] handle errors implicitly, but still use sub-probability 
kernels to handle non-termination and score. Olmedo et al. can distinguish non- 
termination (which is conflated with exception failure) from failing observations 
by introducing two separate semantic primitives (conditional weakest precondi- 
tion and conditional liberal weakest precondition) [28]. Because their solution 
specifically addresses non-termination, it is non-trivial to generalize this treat- 
ment to more than two exception states. By using sum types, some semantics 
avoid interactions of errors with non-termination or constraint failures, but still 
cannot distinguish the latter [34,35]. Note that semantics based on sum types can 
easily add more exceptions (although it is impossible to add non-termination). 
However, the interaction of different exceptions cannot be observed, because the
programmer has to handle exceptions explicitly. 

To the best of our knowledge, we are the first to give formal semantics to 
programs that may produce exceptions in this generality. One work investigates 
assertions in probabilistic programs, but explicitly disallows non-terminating 
loops [32]. Moreover, the semantics in [32] are operational, leaving the distri- 
bution (in terms of measure theory) of program outputs unclear. Cho et al. [8] 
investigate the interaction of partial programs and observe, but are restricted to 
discrete distributions and to only two exception states. In addition, this inves- 
tigation treats these two exception states differently, making it non-trivial to 
extend the results to three or more exception states. Katoen et al. [22] investi- 
gate the intuitive problems when combining non-termination and observations, 
but restrict their discussions to discrete distributions and do not provide for- 
mal semantics. Huang [17] treats partial functions, but not different kinds of 
exceptions. In general, we know of no probabilistic programming language that 
distinguishes more than two different kinds of exceptions. Distinguishing two 
kinds of exceptions is simpler than three, because it is possible to handle one 
exception as an explicit exception state and the other one by missing weight (as 
e.g. in [4]). 

Cousot and Monerau [9] provide a trace semantics that captures probabilistic 
behavior by an explicit randomness source given to the program as an argument. 
This allows handling non-termination by non-terminating traces. While the work 
does not discuss errors or observation failure, it is possible to add both. However, 
using an explicit randomness source has other disadvantages, already discussed 
by Kozen [23]. Most notably, this approach requires a distribution over the ran- 
domness source and a translation from the randomness source to random choices 
in the program, even though we only care about the distribution of the latter. 


7 Conclusion 


In this work we presented an expressive probabilistic programming language 
that supports important features such as mixing continuous and discrete dis- 
tributions, arrays, observations, partial functions and while-loops. Unlike prior 
work, our semantics distinguishes non-termination, observation failures and error 
states. This allows us to investigate the subtle interaction of different exceptions, 
which is not possible for semantics that conflate different kinds of exceptions. Our 
investigation confirms the intuitive understanding of the interaction of exceptions 
presented in Sect. 2. However, it also shows that some desirable properties, like 
commutativity, only hold in the absence of exceptions. This situation is unavoid- 
able, and largely analogous to the situation in deterministic languages. 

Even though our semantics only distinguishes three exception states, it can be
trivially extended to handle any countable set of exception states. This allows
for an even finer-grained distinction of e.g. division by zero, out of bounds array
accesses or casting failures (in a language that allows type casting). Our
semantics also allows enriching exceptions with the line number that the exception
originated from (of course, this is not possible for non-termination). For an
uncountable set of exception states, an extension is possible but not trivial.


A Proofs for Preliminaries 


In this section, we provide lemmas, proofs and some definitions that were left
out or cut short in Sect. 3. For a more detailed introduction into measure theory,
we recommend the book A crash course on the Lebesgue integral and measure
theory [7].

A.1 Measures 


Definition 3. Let $(A, \Sigma_A)$ be a measurable space and $\mu\colon \Sigma_A \to [0, \infty]$ a measure
on $A$.

- We call $\mu$ s-finite if $\mu$ can be written as a countable sum $\sum_{i \in \mathbb{N}} \mu_i$ of
  sub-probability measures $\mu_i$.
- We call $\mu$ $\sigma$-finite if $A = \bigcup_{i \in \mathbb{N}} A_i$ for $A_i \in \Sigma_A$, with $\mu(A_i) < \infty$.
- We call $\mu$ finite if $\mu(A) < \infty$.
- We call $\mu$ a sub-probability measure if $\mu(A) \leq 1$.
- We call $\mu$ a probability measure if $\mu(A) = 1$.

Note that for a $\sigma$-finite measure $\mu$, $\mu(A) = \infty$ is possible, even though $\mu(A_i) < \infty$
for all $i$. As an example, the Lebesgue measure is $\sigma$-finite because
$\mathbb{R} = \bigcup_{i \in \mathbb{N}} [-i, i]$ with $\lambda([-i, i]) = 2 \cdot i$, but $\lambda(\mathbb{R}) = \infty$.


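Lemma 12. The following definition of s-finite measures is equivalent to our
definition of s-finite measures (the difference is that the $\mu_i$ are only required to
be finite):

We call $\mu\colon \Sigma_A \to [0, \infty]$ an s-finite measure if it can be written as
$\mu = \sum_{i \in \mathbb{N}} \mu_i$ for finite measures $\mu_i\colon \Sigma_A \to [0, \infty]$.


Proof. Since any sub-probability measure is finite, one direction is trivial. For
the other direction, let $\mu = \sum_{i \in \mathbb{N}} \mu_i$ for finite measures $\mu_i$. Obviously, $\mu \geq 0$,
$\mu(\emptyset) = 0$ and $\mu(\bigcup_{i \in \mathbb{N}} A_i) = \sum_{i \in \mathbb{N}} \mu(A_i)$ for mutually disjoint $A_i \in \Sigma_A$, so $\mu$ is a
measure. To show that $\mu$ can be written as a sum of sub-probability measures,
let $n_i := \lceil \mu_i(A) \rceil$. Then,
$\mu = \sum_{i \in \mathbb{N}} \mu_i = \sum_{i \in \mathbb{N}} \frac{n_i}{n_i} \mu_i = \sum_{i \in \mathbb{N}} \sum_{j=1}^{n_i} \frac{1}{n_i} \mu_i$.
Each summand is a sub-probability measure, because $\frac{1}{n_i} \mu_i(A) \leq 1$.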
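
The splitting step in this proof can be illustrated for a single finite measure on a
finite set. In the sketch below, a measure is a dictionary from points to weights,
and `split_into_subprobability` is a hypothetical helper. The zero measure is
treated separately, since $n_i = 0$ would otherwise lead to a division by zero.

```python
from fractions import Fraction
from math import ceil

def split_into_subprobability(measure):
    """Split a finite discrete measure into ceil(total mass) equal pieces."""
    total = sum(measure.values(), Fraction(0))
    n = ceil(total)
    if n == 0:
        return []                      # the zero measure needs no pieces
    piece = {x: w / n for x, w in measure.items()}
    return [piece] * n

# Example: a measure with total mass 5/2 is split into 3 pieces of mass 5/6 each.
mu = {"a": Fraction(3, 2), "b": Fraction(1)}
pieces = split_into_subprobability(mu)
```

Each piece has total mass at most 1, and summing the pieces pointwise recovers the
original measure.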


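Lemma 13. Any $\sigma$-finite measure $\mu\colon \Sigma_A \to [0, \infty]$ is s-finite.


Proof. Since $\mu$ is $\sigma$-finite, $A = \bigcup_{i \in \mathbb{N}} A_i$ with $A_i \in \Sigma_A$ and $\mu(A_i) < \infty$. Without
loss of generality, assume that the $A_i$ form a partition of $A$. Then,
$\mu(S) = \sum_{i \in \mathbb{N}} \mu(S \cap A_i)$, with $\mu(S \cap A_i) \leq \mu(A_i) < \infty$. Thus, $\mu$ is a countable
sum of finite measures.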
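Definition 4. The counting measure $c\colon \mathcal{B} \to [0, \infty]$ is defined by

\[
c(S) = \begin{cases} |S| & S \text{ finite} \\ \infty & \text{otherwise} \end{cases}
\]

Definition 5. The infinity measure $\mu\colon \mathcal{B} \to [0, \infty]$ is defined by

\[
\mu(S) = \begin{cases} 0 & S = \emptyset \\ \infty & \text{otherwise} \end{cases}
\]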

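Lemma 14. Neither the counting measure nor the infinity measure are s-finite.


Proof. For the counting measure $c$, assume (toward a contradiction)
$c = \sum_{i \in \mathbb{N}} c_i$ for finite measures $c_i$. We have
$\mathbb{R} = \{r \in \mathbb{R} \mid c(\{r\}) > 0\} = \bigcup_{i \in \mathbb{N}} \{r \in \mathbb{R} \mid c_i(\{r\}) > 0\}
= \bigcup_{i \in \mathbb{N}} \bigcup_{n \in \mathbb{N}} \{r \in \mathbb{R} \mid c_i(\{r\}) > \frac{1}{n}\}$.
Because $\mathbb{R}$ is uncountable, there must be $i, n \in \mathbb{N}$ for which
$S := \{r \in \mathbb{R} \mid c_i(\{r\}) > \frac{1}{n}\}$ is uncountable. Thus for any
measurable, countably infinite $S' \subseteq S$, $c_i(S') = \infty$, which means that $c_i$ is not
finite. Proceed analogously for the infinity measure.


Lemma 15. The measure $\mu\colon \mathcal{B} \to [0, \infty]$ with

\[
\mu(S) = \begin{cases} 0 & \lambda(S) = 0 \\ \infty & \text{otherwise} \end{cases}
\]

is s-finite but not $\sigma$-finite.


Proof. $\mu = \sum_{i \in \mathbb{N}} \lambda$, and $\lambda$ is s-finite, so $\mu$ is s-finite. Assume (toward a
contradiction) that $\mu$ is $\sigma$-finite. Then $\mathbb{R} = \bigcup_{i \in \mathbb{N}} A_i$ with $A_i \in \mathcal{B}$ and
$\mu(A_i) < \infty$. Thus, $\mu(A_i) = 0$ and hence
$\mu(\mathbb{R}) = \mu(\bigcup_{i \in \mathbb{N}} A_i) \leq \sum_{i \in \mathbb{N}} \mu(A_i) = 0$, a contradiction.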


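Lemma 16

\begin{align*}
\forall S \in \Sigma_{A \times B}\colon\ (\mu \times \mu')(S)
  &= \int_{a \in A} \mu'(\{b \in B \mid (a, b) \in S\})\, \mu(da) \\
  &= \int_{b \in B} \mu(\{a \in A \mid (a, b) \in S\})\, \mu'(db) \\
\forall S \in \Sigma_{\overline{A \times B}}\colon\ (\mu \overline{\times} \mu')(S)
  &= \int_{a \in \overline{A}} \mu'(\{b \in \overline{B} \mid \overline{(a, b)} \in S\})\, \mu(da) \\
  &= \int_{b \in \overline{B}} \mu(\{a \in \overline{A} \mid \overline{(a, b)} \in S\})\, \mu'(db)
\end{align*}


Proof

\begin{align*}
(\mu \times \mu')(S)
  &= \int_{a \in A} \int_{b \in B} [(a, b) \in S]\, \mu'(db)\, \mu(da) \\
  &= \int_{a \in A} \int_{b \in B} [b \in \{b' \in B \mid (a, b') \in S\}]\, \mu'(db)\, \mu(da) \\
  &= \int_{a \in A} \mu'(\{b' \in B \mid (a, b') \in S\})\, \mu(da) \\[1ex]
(\mu \times \mu')(S)
  &= \int_{a \in A} \int_{b \in B} [(a, b) \in S]\, \mu'(db)\, \mu(da) \\
  &= \int_{b \in B} \int_{a \in A} [(a, b) \in S]\, \mu(da)\, \mu'(db) && \text{Fubini} \\
  &= \int_{b \in B} \mu(\{a' \in A \mid (a', b) \in S\})\, \mu'(db)
\end{align*}

In the second line, we have used that $(a, b) \in S \iff b \in \{b' \in B \mid (a, b') \in S\}$.
The proof works analogously for $\overline{\times}$.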


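Lemma 17. Let $\delta\colon A \mapsto A$, $\kappa\colon A \mapsto B$. Then,

\[
(\delta \overline{\times} \kappa)(a)(S) = \kappa(a)(\{b \in \overline{B} \mid \overline{(a, b)} \in S\})
\]


Proof

\begin{align*}
(\delta \overline{\times} \kappa)(a)(S)
  &= \int_{b \in \overline{B}} \delta(a)(\{a' \in \overline{A} \mid \overline{(a', b)} \in S\})\, \kappa(a)(db) && \text{Lemma 16} \\
  &= \int_{b \in \overline{B}} [\overline{(a, b)} \in S]\, \kappa(a)(db) \\
  &= \kappa(a)(\{b \in \overline{B} \mid \overline{(a, b)} \in S\})
\end{align*}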


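Lemma 1. For measures $\mu\colon \Sigma_A \to [0, \infty]$, $\mu'\colon \Sigma_B \to [0, \infty]$, let $S \in \Sigma_A$ and
$T \in \Sigma_B$. Then, $(\mu \times \mu')(S \times T) = \mu(S) \cdot \mu'(T)$.


Proof

\begin{align*}
(\mu \times \mu')(S \times T)
  &= \int_{a \in A} \mu'(\{b \in B \mid (a, b) \in S \times T\})\, \mu(da) && \text{Lemma 16} \\
  &= \int_{a \in A} \begin{cases} \mu'(T) & a \in S \\ 0 & \text{otherwise} \end{cases}\ \mu(da) \\
  &= \mu(S) \cdot \mu'(T)
\end{align*}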

Lemma 2. $\times$ and $\overline{\times}$ for s-finite measures are associative, left- and
right-distributive and preserve (sub-)probability and s-finite measures.


Proof. Remember that $(\mu \times \mu')(S) = \int_{a \in A} \int_{b \in B} [(a,b) \in S]\,\mu'(db)\,\mu(da)$ and that
$(\mu \mathbin{\overline{\times}} \mu')(S) = \int_{a \in \overline{A}} \int_{b \in \overline{B}} [\overline{(a,b)} \in S]\,\mu'(db)\,\mu(da)$. Preservation of
(sub-)probability measures is trivial. Distributivity and preservation of s-finite
measures are easily established by properties of the Lebesgue integral in Lemma 19.

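For associativity, let $\mu\colon \Sigma_A \to [0,\infty]$, $\mu'\colon \Sigma_B \to [0,\infty]$ and $\mu''\colon \Sigma_C \to [0,\infty]$.

$$\begin{aligned}
((\mu \times \mu') \times \mu'')(S)
&= \int_{c \in C} (\mu \times \mu')(\{t \in A \times B \mid (t,c) \in S\})\,\mu''(dc) && \text{(Lemma 16)} \\
&= \int_{c \in C} \int_{a \in A} \int_{b \in B} [(a,b) \in \{t \in A \times B \mid (t,c) \in S\}]\,\mu'(db)\,\mu(da)\,\mu''(dc) \\
&= \int_{c \in C} \int_{a \in A} \int_{b \in B} [(a,b,c) \in S]\,\mu'(db)\,\mu(da)\,\mu''(dc) \\
&= \int_{a \in A} \int_{b \in B} \int_{c \in C} [(a,b,c) \in S]\,\mu''(dc)\,\mu'(db)\,\mu(da) && \text{(Fubini)} \\
&= \int_{a \in A} \int_{b \in B} \int_{c \in C} [(b,c) \in \{t \in B \times C \mid (a,t) \in S\}]\,\mu''(dc)\,\mu'(db)\,\mu(da) \\
&= \int_{a \in A} (\mu' \times \mu'')(\{t \in B \times C \mid (a,t) \in S\})\,\mu(da) \\
&= (\mu \times (\mu' \times \mu''))(S) && \text{(Lemma 16)}
\end{aligned}$$

The proof proceeds analogously for $\overline{\times}$.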


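Lemma 18. Let $(A, \Sigma_A)$ and $(B, \Sigma_B)$ be measurable spaces. Consider measures
$\mu, \mu_1, \mu_2\colon \Sigma_A \to [0,\infty]$ and $\nu, \nu_1, \nu_2\colon \Sigma_B \to [0,\infty]$. We assume that $\nu_1 \le \nu_2$ and
$\mu_1 \le \mu_2$ hold pointwise. Then,

$$\mu \times \nu_1 \le \mu \times \nu_2$$

$$\mu_1 \times \nu \le \mu_2 \times \nu$$


Proof. Let $S \in \Sigma_{A \times B}$ and $\nu_1 \le \nu_2$. Then, we have

$$\begin{aligned}
& \nu_1 \le \nu_2 \\
\implies{} & \underbrace{\int_{b \in B} [(a,b) \in S]\,\nu_1(db)}_{=: f(a)} \le \underbrace{\int_{b \in B} [(a,b) \in S]\,\nu_2(db)}_{=: g(a)} && \text{(Lemma 19)} \\
\implies{} & \int_{a \in A} f(a)\,\mu(da) \le \int_{a \in A} g(a)\,\mu(da) && \text{(Lemma 19)} \\
\implies{} & (\mu \times \nu_1)(S) \le (\mu \times \nu_2)(S)
\end{aligned}$$

The proof for $\mu_1 \times \nu \le \mu_2 \times \nu$ is similar.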


A.2 Lebesgue Integral 


Lemma 19. Let $(A, \Sigma_A)$ and $(B, \Sigma_B)$ be measurable spaces, $E \in \Sigma_A$ and $E' \in \Sigma_B$
measurable sets, $f, f_i, g\colon A \to \mathbb{R}$ and $h\colon A \times B \to \mathbb{R}$ measurable functions,
$\mu, \mu_i, \nu\colon \Sigma_A \to [0,\infty]$ and $\mu'\colon \Sigma_B \to [0,\infty]$ measures.


J ro oo] 


aE 
0<f<g<o => ip Fla)y(da) < | glanda) 
ac E acE 
pov on (da) ue 
ach acE 
D ka ts 
n=l, ace r= 
i f 09 b) (db) (da) = k ~ (da)u(db) u, u'o-finite 
a€E beE’ bE EH’ ace 
f to (Èn) aes J fonda) 
acE n=l n=l E 
J KOSE) =e) peP 
ach 


Finally, if $f_1 \le f_2 \le \cdots \le \infty$, we have

$$\lim_{n \to \infty} \int_{a \in E} f_n(a)\,\mu(da) = \int_{a \in E} \lim_{n \to \infty} f_n(a)\,\mu(da)$$


Proof. The following properties can be proven for simple functions and limits of 
simple functions (this suffices): 


[of f(a (En) a -5 a) (da) 


ee 


psy = f _ Souda) < f Havla) 


ack 
$\int_{a \in E} f(a)\,\delta(x)(da) = f(x)$ is straightforward. For the other properties, see [31].


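Theorem 1 (Fubini's theorem). For s-finite measures $\mu\colon \Sigma_A \to [0,\infty]$ and
$\mu'\colon \Sigma_B \to [0,\infty]$ and any measurable function $f\colon A \times B \to [0,\infty]$,

$$\int_{a \in A} \int_{b \in B} f(a,b)\,\mu'(db)\,\mu(da) = \int_{b \in B} \int_{a \in A} f(a,b)\,\mu(da)\,\mu'(db)$$

For s-finite measures $\mu\colon \Sigma_{\overline{A}} \to [0,\infty]$ and $\mu'\colon \Sigma_{\overline{B}} \to [0,\infty]$ and any measurable
function $f\colon \overline{A} \times \overline{B} \to [0,\infty]$,

$$\int_{a \in \overline{A}} \int_{b \in \overline{B}} f(a,b)\,\mu'(db)\,\mu(da) = \int_{b \in \overline{B}} \int_{a \in \overline{A}} f(a,b)\,\mu(da)\,\mu'(db)$$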
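Proof. Let $\mu = \sum_{i \in \mathbb{N}} \mu_i$ and $\mu' = \sum_{i \in \mathbb{N}} \mu'_i$ for bounded measures $\mu_i$ and $\mu'_i$.

$$\begin{aligned}
\int_{a \in A} \int_{b \in B} f(a,b)\,\mu'(db)\,\mu(da)
&= \sum_{i,j \in \mathbb{N}} \int_{a \in A} \int_{b \in B} f(a,b)\,\mu'_j(db)\,\mu_i(da) && \text{(Lemma 19)} \\
&= \sum_{i,j \in \mathbb{N}} \int_{b \in B} \int_{a \in A} f(a,b)\,\mu_i(da)\,\mu'_j(db) && \text{(Fubini for $\sigma$-finite measures $\mu_i$, $\mu'_j$)} \\
&= \int_{b \in B} \int_{a \in A} f(a,b)\,\mu(da)\,\mu'(db)
\end{aligned}$$

The proof in the presence of exception states is analogous.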


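Lemma 20. Fubini does not hold for the counting measure $c\colon \mathcal{B} \to [0,\infty]$ and
the Lebesgue measure $\lambda\colon \mathcal{B} \to [0,\infty]$ (because $c$ is not s-finite).


Proof

$$\int_{x \in [0,1]} \int_{y \in [0,1]} [x = y]\,c(dy)\,\lambda(dx) = \int_{x \in [0,1]} 1\,\lambda(dx) = 1
\;\neq\; 0 = \int_{y \in [0,1]} 0\,c(dy) = \int_{y \in [0,1]} \int_{x \in [0,1]} [x = y]\,\lambda(dx)\,c(dy)$$


A.3 Kernels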


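Lemma 21. Let $\kappa_1, \kappa_1'\colon A \mapsto B$ and $\kappa_2, \kappa_2'\colon B \mapsto C$ be s-finite kernels.
If $\kappa_1 \le \kappa_1'$ holds pointwise, then

$$\kappa_1 >\!=\!> \kappa_2 \;\le\; \kappa_1' >\!=\!> \kappa_2$$

If $\kappa_2 \le \kappa_2'$ holds pointwise, then

$$\kappa_1 >\!=\!> \kappa_2 \;\le\; \kappa_1 >\!=\!> \kappa_2'$$


Proof. Assume $\kappa_2 \le \kappa_2'$. Thus, $\overline{\kappa_2} \le \overline{\kappa_2'}$. Now, let $a \in A$, $S \in \Sigma_{\overline{C}}$.

$$\begin{aligned}
(\kappa_1 >\!=\!> \kappa_2)(a)(S) &= \int_{b \in \overline{B}} \overline{\kappa_2}(b)(S)\,\kappa_1(a)(db) \\
&\le \int_{b \in \overline{B}} \overline{\kappa_2'}(b)(S)\,\kappa_1(a)(db) && \text{(Lemma 19)} \\
&= (\kappa_1 >\!=\!> \kappa_2')(a)(S)
\end{aligned}$$

The proof for $\kappa_1 >\!=\!> \kappa_2 \le \kappa_1' >\!=\!> \kappa_2$ works analogously.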


Lemma 3. $(;)$ is associative, left- and right-distributive, has neutral element$^4$ $\delta$
and preserves (sub-)probability and s-finite kernels.


4 $\delta$ is a neutral element of $(;)$ if $(\delta;\kappa) = (\kappa;\delta) = \kappa$ for all kernels $\kappa$.

Proof. Remember that $(f;g)(a)(S) = \int_{b \in B} g(b)(S)\,f(a)(db)$. Left- and
right-distributivity and the neutral element $\delta$ follow from properties of the Lebesgue
integral in Lemma 19.

Associativity and preservation of (sub-)probability kernels are well known (see
for example [12]). For s-finite kernels $f = \sum_{i \in \mathbb{N}} f_i$, $g = \sum_{i \in \mathbb{N}} g_i$ and
$h = \sum_{i \in \mathbb{N}} h_i$, we have (for sub-probability kernels $f_i$, $g_i$, $h_i$)

$$\begin{aligned}
(f;g);h &= \Big(\Big(\sum_{i \in \mathbb{N}} f_i\Big);\Big(\sum_{j \in \mathbb{N}} g_j\Big)\Big);\Big(\sum_{k \in \mathbb{N}} h_k\Big)
= \sum_{i,j,k \in \mathbb{N}} (f_i;g_j);h_k \\
&= \sum_{i,j,k \in \mathbb{N}} f_i;(g_j;h_k) = f;(g;h)
\end{aligned}$$

$(;)$ preserves s-finite kernels because for s-finite kernels $f$ and $g$, we have (for
sub-probability kernels $f_i$, $g_i$) $f;g = \sum_{i,j \in \mathbb{N}} f_i;g_j$, a sum of kernels.


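Lemma 4. For $f\colon A \mapsto B$ and $g\colon B \mapsto C$, $a \in A$ and $S \in \Sigma_{\overline{C}}$,

$$(f >\!=\!> g)(a)(S) = (f;g)(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S)\,f(a)(\{x\})$$


Proof

$$\begin{aligned}
(f >\!=\!> g)(a)(S) &= \int_{b \in \overline{B}} \overline{g}(b)(S)\,f(a)(db) \\
&= \int_{b \in B} g(b)(S)\,f(a)(db) + \int_{b \in \mathcal{X}} \overline{g}(b)(S)\,f(a)(db) \\
&= \int_{b \in B} g(b)(S)\,f(a)(db) + \sum_{x \in \mathcal{X}} \delta(x)(S)\,f(a)(\{x\}) \\
&= (f;g)(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S)\,f(a)(\{x\})
\end{aligned}$$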
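The decomposition in Lemma 4 can be illustrated on finite spaces. The sketch below assumes kernels are Python functions returning dictionaries from outcomes to masses, and it uses the hypothetical names `"bot"` and `"loop"` for the exception states; none of these names come from the paper.

```python
from collections import defaultdict

EXC = {"bot", "loop"}  # hypothetical stand-ins for the exception states

def kleisli(f, g):
    """Finite analogue of f >=> g: exception results of f are propagated."""
    def h(a):
        out = defaultdict(float)
        for b, m in f(a).items():
            if b in EXC:
                out[b] += m                 # lifted g acts as a Dirac on exceptions
            else:
                for c, m2 in g(b).items():
                    out[c] += m * m2
        return dict(out)
    return h

def seq(f, g):
    """Finite analogue of (f;g): integrate only over non-exception results of f."""
    def h(a):
        out = defaultdict(float)
        for b, m in f(a).items():
            if b not in EXC:
                for c, m2 in g(b).items():
                    out[c] += m * m2
        return dict(out)
    return h

def mass(dist, S):
    """Mass that a discrete measure assigns to the set S."""
    return sum(m for x, m in dist.items() if x in S)

# Illustrative kernels: f may fail with "bot", g may diverge with "loop".
f = lambda a: {0: 0.5, 1: 0.25, "bot": 0.25}
g = lambda b: {b + 1: 0.5, "loop": 0.5}

S = {1, "bot", "loop"}
lhs = mass(kleisli(f, g)(0), S)
rhs = mass(seq(f, g)(0), S) + sum(
    (1.0 if x in S else 0.0) * f(0).get(x, 0.0) for x in EXC
)
```

Here `lhs` and `rhs` agree: the mass of `S` splits into the part reached through non-exception intermediate results and the part contributed by exceptions already raised by `f`.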


Lemma 5. $>\!=\!>$ is associative, left-distributive (but not right-distributive), has
neutral element $\delta$ and preserves (sub-)probability and s-finite kernels.


Proof. Remember that $(f >\!=\!> g)(a)(S) = \int_{b \in \overline{B}} \overline{g}(b)(S)\,f(a)(db)$.
Left-distributivity follows from the properties of the Lebesgue integral in Lemma 19.
Right-distributivity does not necessarily hold because
$\overline{g_1 + g_2}(\bot) \neq \overline{g_1}(\bot) + \overline{g_2}(\bot)$.
Associativity for $f\colon A \mapsto B$, $g\colon B \mapsto C$ and $h\colon C \mapsto D$ can be derived by
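$$\begin{aligned}
& ((f >\!=\!> g) >\!=\!> h)(a)(S) \\
&= ((f >\!=\!> g);h)(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S)\,(f >\!=\!> g)(a)(\{x\}) \\
&= \Big(\Big(f;g + \lambda a'.\lambda S'.\sum_{x \in \mathcal{X}} \delta(x)(S')\,f(a')(\{x\})\Big);h\Big)(a)(S)
 + \sum_{x \in \mathcal{X}} \delta(x)(S)\,(f >\!=\!> g)(a)(\{x\}) \\
&= (f;g;h)(a)(S) + \underbrace{\Big(\Big(\lambda a'.\lambda S'.\sum_{x \in \mathcal{X}} \delta(x)(S')\,f(a')(\{x\})\Big);h\Big)(a)(S)}_{=0 \text{ ($(;)$ integrates over non-exception states)}}
 + \sum_{x \in \mathcal{X}} \delta(x)(S)\,(f >\!=\!> g)(a)(\{x\}) \\
&= (f;g;h)(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S)\Big((f;g)(a)(\{x\}) + \sum_{x' \in \mathcal{X}} \delta(x')(\{x\})\,f(a)(\{x'\})\Big) \\
&= (f;g;h)(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S)\big((f;g)(a)(\{x\}) + f(a)(\{x\})\big) \\
&= (f;g;h)(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S)\,(f;\lambda a'.\lambda S'.g(a')(S'))(a)(\{x\})
 + \sum_{x \in \mathcal{X}} \delta(x)(S)\,f(a)(\{x\}) \\
&= (f;g;h)(a)(S) + \Big(f;\lambda a'.\lambda S'.\sum_{x \in \mathcal{X}} \delta(x)(S')\,g(a')(\{x\})\Big)(a)(S)
 + \sum_{x \in \mathcal{X}} \delta(x)(S)\,f(a)(\{x\}) \\
&= \Big(f;\Big(g;h + \lambda a'.\lambda S'.\sum_{x \in \mathcal{X}} \delta(x)(S')\,g(a')(\{x\})\Big)\Big)(a)(S)
 + \sum_{x \in \mathcal{X}} \delta(x)(S)\,f(a)(\{x\}) \\
&= (f;(g >\!=\!> h))(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S)\,f(a)(\{x\}) \\
&= (f >\!=\!> (g >\!=\!> h))(a)(S)
\end{aligned}$$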


Here, we have used Lemma 4, left- and right-distributivity of $(;)$.
To show that $f >\!=\!> g$ preserves s-finite kernels, let $f\colon A \mapsto B$ and $g\colon B \mapsto C$
be s-finite kernels. Then, for sub-probability kernels $f_i$,

$$\begin{aligned}
(f >\!=\!> g)(a)(S) &= (f;g)(a)(S) + \sum_{x \in \mathcal{X}} \delta(x)(S)\,f(a)(\{x\}) \\
&= (f;g)(a)(S) + \sum_{x \in \mathcal{X}} \sum_{i \in \mathbb{N}} \delta(x)(S)\,f_i(a)(\{x\})
\end{aligned}$$

Note that for each $x \in \mathcal{X}$ and $i \in \mathbb{N}$, $\lambda a.\lambda S.\delta(x)(S)\,f_i(a)(\{x\})$ is a sub-probability
kernel. Thus, $f >\!=\!> g$ is a sum of s-finite kernels and hence s-finite.
Proving that for (sub-)probability kernels $f$ and $g$, $f >\!=\!> g$ is also a
(sub-)probability kernel is trivial, since we only need to show that $(f >\!=\!> g)(a)(C) = 1$
(or $\le 1$).


Lemma 22. Let $(A, \Sigma_A)$ and $(B, \Sigma_B)$ be measurable spaces. Let $f\colon A \times B \to [0,\infty]$
be measurable and $\kappa\colon A \mapsto B$ be a sub-probability kernel. Then, $f'\colon A \to [0,\infty]$
defined by

$$f'(a) = \int_{b \in B} f(a,b)\,\kappa(a)(db)$$

is measurable.

Proof. See Theorem 20 of [30].
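Lemma 23. $\times$ and $\overline{\times}$ preserve (sub-)probability kernels.


Proof. Let $\kappa\colon A \mapsto B$ and $\kappa'\colon A \mapsto C$ be (sub-)probability kernels. The fact
that $(\kappa \times \kappa')(a)(\cdot)$ for all $a \in A$ is a (sub-)probability measure is inherited from
Lemma 2. It remains to show that $(\kappa \times \kappa')(\cdot)(S)$ is measurable for all $S \in \Sigma_{B \times C}$,
with

$$(\kappa \times \kappa')(a)(S) = \int_{b \in B} \int_{c \in C} [(b,c) \in S]\,\kappa'(a)(dc)\,\kappa(a)(db)$$

By Lemma 22, $f'\colon A \times B \to [0,\infty]$ defined by $f'(a,b) = \int_{c \in C} [(b,c) \in S]\,\kappa'(a)(dc)$
is measurable, using the measurable function $f\colon (A \times B) \times C \to [0,\infty]$
defined by $f((a,b),c) = [(b,c) \in S]$. Again by Lemma 22,
$\int_{b \in B} \int_{c \in C} [(b,c) \in S]\,\kappa'(a)(dc)\,\kappa(a)(db)$ is measurable.

Proving that for (sub-)probability kernels $\kappa\colon A \mapsto B$ and $\kappa'\colon A \mapsto C$, $\kappa \mathbin{\overline{\times}} \kappa'$
is a (sub-)probability kernel proceeds analogously.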


Lemma 6. $\times$ and $\overline{\times}$ for kernels preserve (sub-)probability and s-finite kernels,
and are associative, left- and right-distributive.


Proof. Associativity, left- and right-distributivity are inherited from the respective
properties of the product of measures established by Lemma 2. Sub-probability
kernels are preserved by Lemma 23.

S-finite kernels are preserved because
$\kappa \times \kappa' = (\sum_{i \in \mathbb{N}} \kappa_i) \times (\sum_{i \in \mathbb{N}} \kappa'_i) = \sum_{i,j \in \mathbb{N}} \kappa_i \times \kappa'_j$
(analogously for $\overline{\times}$).


B Proofs for Semantics 


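Lemma 7. For $\Delta$ as in the semantics of the while loop, and for each $\sigma$ and
each $S$, the limit $\lim_{n \to \infty} \Delta^n(\circlearrowleft)(\sigma)(S)$ exists.


Proof. In general, $0 \le \Delta^n(\circlearrowleft)(\sigma)(S) \le 1$. First, we restrict the allowed
arguments for $\lim_{n \to \infty} \Delta^n(\circlearrowleft)(\sigma)(S)$ to only those $S$ with $\circlearrowleft \in S$. We prove by
induction that $\Delta^{n+1}(\circlearrowleft) \le \Delta^n(\circlearrowleft)$, meaning
$\forall \sigma\colon \forall S\colon \circlearrowleft \in S \implies \Delta^{n+1}(\circlearrowleft)(\sigma)(S) \le \Delta^n(\circlearrowleft)(\sigma)(S)$.
Hence, $\Delta^n(\circlearrowleft)$ is monotone decreasing in $n$ and lower bounded by
0, which means that the limit must exist.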




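As a base case, we have $\Delta^1(\circlearrowleft)(\sigma)(S) \le 1 = \delta(\circlearrowleft)(S) = \Delta^0(\circlearrowleft)(\sigma)(S)$, because
$\circlearrowleft \in S$. We proceed by induction with

$$\begin{aligned}
\Delta^{n+1}(\circlearrowleft)(\sigma)(S)
&= \left(\delta \mathbin{\overline{\times}} \llbracket e \rrbracket >\!=\!> \lambda(\sigma', b).\begin{cases} \llbracket P \rrbracket(\sigma') >\!=\!> \Delta^{n}(\circlearrowleft) & b \neq 0 \\ \delta(\sigma') & b = 0 \end{cases}\right)(\sigma)(S) \\
&\le \left(\delta \mathbin{\overline{\times}} \llbracket e \rrbracket >\!=\!> \lambda(\sigma', b).\begin{cases} \llbracket P \rrbracket(\sigma') >\!=\!> \Delta^{n-1}(\circlearrowleft) & b \neq 0 \\ \delta(\sigma') & b = 0 \end{cases}\right)(\sigma)(S) \\
&= \Delta^{n}(\circlearrowleft)(\sigma)(S)
\end{aligned}$$

In the second line, we have used the induction hypothesis. This application is
valid because $\kappa_2 \le \kappa_2'$ implies $\kappa_1 >\!=\!> \kappa_2 \le \kappa_1 >\!=\!> \kappa_2'$ (Lemma 21).

We proceed analogously when we restrict the allowed arguments for the kernel
$\lim_{n \to \infty} \Delta^n(\circlearrowleft)(\sigma)(S)$ to only those $S$ with $\circlearrowleft \notin S$, proving $\Delta^{n+1}(\circlearrowleft) \ge \Delta^n(\circlearrowleft)$
for that case.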


Lemma 8. In the absence of exception states, and using sub-probability kernels 
instead of distribution transformers, the definition of the semantics of the while 
loop from [23] is equivalent to ours. 


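Definition 6. In [23], Kozen shows a different way of defining the semantics
of the while loop. In our notation, and in terms of probability kernels instead of
distribution transformers, that definition becomes

$$\llbracket \texttt{while } e\ \{P\} \rrbracket = \sup_{n \in \mathbb{N}} \sum_{k=0}^{n} \big(\llbracket \texttt{filter}(e) \rrbracket >\!=\!> \llbracket P \rrbracket\big)^{k} >\!=\!> \llbracket \texttt{filter}(\neg e) \rrbracket$$

Here, exponentiation is in terms of Kleisli composition, i.e. $\kappa^0 = \delta$ and
$\kappa^{n+1} = \kappa >\!=\!> \kappa^n$. The sum and limit are meant pointwise. Furthermore, we define filter
by the following expressions (note that $\llbracket \texttt{filter}(e) \rrbracket$ and $\llbracket \texttt{filter}(\neg e) \rrbracket$ are only
sub-probability kernels, not probability kernels).

$$\llbracket \texttt{filter}(e) \rrbracket = \delta \mathbin{\overline{\times}} \llbracket e \rrbracket >\!=\!> \lambda(\sigma, b).\begin{cases} \delta(\sigma) & b \neq 0 \\ 0 & b = 0 \end{cases}$$

$$\llbracket \texttt{filter}(\neg e) \rrbracket = \delta \mathbin{\overline{\times}} \llbracket e \rrbracket >\!=\!> \lambda(\sigma, b).\begin{cases} \delta(\sigma) & b = 0 \\ 0 & b \neq 0 \end{cases}$$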


To justify Lemma 8, we prove the more formal Lemma 24. Note that in the 
presence of exceptions (e.g. P is just assert(0)), Definition 6 does not make 
sense, because if 


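Lemma 24. For all $S$ with $S \cap \mathcal{X} = \emptyset$,

$$\left(\sum_{k=0}^{n} \big(\llbracket \texttt{filter}(e) \rrbracket >\!=\!> \llbracket P \rrbracket\big)^{k} >\!=\!> \llbracket \texttt{filter}(\neg e) \rrbracket\right)(\sigma)(S) = \Delta^{n+1}(\circlearrowleft)(\sigma)(S)$$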


Proof. For n = 0, we have 
k=0 


i [filter(e)] >=> Pl)" >=> serio] (a(S) 


([Fitter(c (e)] >=> P)’ >> [rivter(-o)]) (a(S) 


5 >=> [filter(- 1) (o)(5) 
= [filter(—e)](c)(S) 


= (5x14 Ses Ne hs ea ) F ot) (0)(8) 
= (sxe SiS ie! a), oe z D ()(8) ogs 
(sxe sea her iy. { ca aye : E \) (OS) Snx=0 


=A'(0) 


For n > 0, we have 


BETE 
iMi 


eee) >=> [P1)" >=> ister(-o ()(S) 


filter(e)] >=> [P 


aa 
iM: 
AT 


° 


+" 4 ([fitter(e)] >=> P’) >=> steno] (o)(S) 


k+1 


) 
( filter(e)] >=> [P ) + s) >=> frster(-o (c)(S) 
) 


Il 
AET T, 
iM: 


=i 


filter(e)] >=> [P 


) >=> [filter(-e) ) (o)(S) since SN X = 


A 

Eo 

L 3 
i] = 


T 
V 
V 


[fitter(~e)]) (7)(S) 


M: 


filter(e)] >=> [P i >=> [filter(-e) ) (a) (S) + [filter(-e)](c)(S) 


x 
ll 
= 


(riser e)] >=> [P] >=> 55 ([fitter(e)] >=> [P ) >=> steno] (o)(S) 
k=0 
[filter(-e)](oc)(S 


n 


( 
l 
| 
a 
| 


[filter(e)] >=> [P] >=> (£ [fitter(e)] >=> [P])* >=> [rausert>o)) ) (o)(S) 


+ [filter(e)](o)(S 
= ([Fitter(c)] >=> [P] >=> att) (o)(S) + [filter(-e)](c)(S) 


= (xie >=> dio" 8) { re! Poe AO) bA 08. (2)(S) 


= A"*?(O)(0)(5) 


In particular, we have used that left-distributivity does hold in this case since
$S \cap \mathcal{X} = \emptyset$.
C Probability Kernel


In the following, we list lemmas that are crucial to prove Theorem 2 (restated 
for convenience). 


Theorem 2. The semantics of each expression $\llbracket e \rrbracket$ and statement $\llbracket P \rrbracket$ is indeed
a probability kernel.


Lemma 25. Any measurable function $f\colon A \to [0,\infty]$ can be viewed as an s-finite
kernel $f\colon A \mapsto 1$, defined by $f(x)(\emptyset) = 0$ and $f(x)(1) = f(x)$.


Proof. We prove that $f$ is an s-finite kernel. Let $A_\infty := \{x \in A \mid f(x) = \infty\}$.
Since $f$ is measurable, the set $A_\infty$ must be measurable.
$f(x)(S) = \sum_{i \in \mathbb{N}} [x \in A_\infty][() \in S] + \sum_{i \in \mathbb{N}} f(x)\,[i \le f(x) < i+1]\,[() \in S]$,
which is a sum of finite kernels because the sets $A_\infty$ and
$\{x \mid i \le f(x) < i+1\} = f^{-1}([i, i+1))$ are
measurable. Note that any sum of finite kernels can be rewritten as a sum of
sub-probability kernels.


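Lemma 26. Let $\kappa'\colon X \mapsto Y$ and $\kappa''\colon X \mapsto Y$ be kernels, and $f\colon X \to \mathbb{R}$
measurable. Then,

$$\kappa(x)(S) = \begin{cases} \kappa'(x)(S) & \text{if } f(x) = 0 \\ \kappa''(x)(S) & \text{otherwise} \end{cases}$$

is a kernel.


Proof. Let $f_{=0}(x) := [f(x) = 0]$, $f_{\neq 0}(x) := [f(x) \neq 0]$. Then,
$\kappa = f_{=0} \times \kappa' + f_{\neq 0} \times \kappa''$. Viewing $f_{=0}$ and $f_{\neq 0}$ as kernels $X \mapsto 1$ immediately gives the desired
result.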


Lemma 27. Let $(A, \Sigma_A)$ and $(B, \Sigma_B)$ be measurable spaces. Let $\{A_i\}_{i \in I}$ be a
partition of $A$ into measurable sets, for a countable set of indices $I$. Consider
a function $f\colon A \to B$. If $f|_{A_i}\colon A_i \to B$ is measurable for each $i \in I$, then $f$ is
measurable.


Lemma 28. Let $f\colon A \to B$ be measurable. Then $\kappa\colon A \mapsto B$ with $\kappa(a) = \delta(f(a))$
is a kernel.


The following lemma is important to show that the semantics of the while 
loop is a probability kernel. 


Lemma 29. Suppose $\{\kappa_n\}_{n \in \mathbb{N}}$ is a sequence of (sub-)probability kernels $A \mapsto B$.
Then, if the limit $\kappa = \lim_{n \to \infty} \kappa_n$ exists, it is also a (sub-)probability
kernel. Here, the limit is pointwise in the sense
$\forall a \in A\colon \forall S \in \Sigma_B\colon \kappa(a, S) = \lim_{n \to \infty} \kappa_n(a)(S)$.


Proof. For every $a \in A$, $\kappa(a, \cdot)$ is a measure, because the pointwise limit of finite
measures is a measure. For every $S \in \Sigma_B$, $\kappa(\cdot, S)$ is measurable, because the
pointwise limit of measurable functions $f_n\colon A \to \mathbb{R}$ (with $\mathcal{B}$ as the $\sigma$-algebra on
$\mathbb{R}$) is measurable.
D Proofs for Consequences


In this section, we provide some proofs of consequences of our semantics, 
explained in Sect. 5. 


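Lemma 9. For function F(){while 1 {skip}; return 0},

$$\tfrac{1}{0} + F() \;\not\simeq\; F() + \tfrac{1}{0}$$


Proof. If we evaluate $\tfrac{1}{0}$ first, we will only have weight on $\bot$.

$$\begin{aligned}
\llbracket \tfrac{1}{0} + F() \rrbracket
&= \llbracket \tfrac{1}{0} \rrbracket \mathbin{\overline{\times}} \llbracket F() \rrbracket >\!=\!> \lambda(x,y).\delta(x + y) \\
&= \delta(\bot) \mathbin{\overline{\times}} \llbracket F() \rrbracket >\!=\!> \lambda(x,y).\delta(x + y) \\
&= \delta(\bot) >\!=\!> \lambda(x,y).\delta(x + y) \\
&= \delta(\bot)
\end{aligned}$$

If instead, we first evaluate F(), we only have weight on $\circlearrowleft$, by an analogous
calculation.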


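Lemma 10. If $\llbracket e_1 \rrbracket(\sigma)(\mathcal{X}) = \llbracket e_2 \rrbracket(\sigma)(\mathcal{X}) = 0$ for all $\sigma$, then $e_1 \oplus e_2 \simeq e_2 \oplus e_1$,
for any commutative operator $\oplus$.


Proof

$$\begin{aligned}
\llbracket e_1 \oplus e_2 \rrbracket(\sigma)(S)
&= \big(\llbracket e_1 \rrbracket \mathbin{\overline{\times}} \llbracket e_2 \rrbracket >\!=\!> \lambda(x,y).\delta(x \oplus y)\big)(\sigma)(S) \\
&= \int_{z \in \overline{\mathbb{R} \times \mathbb{R}}} \overline{\lambda(x,y).\delta(x \oplus y)}(z)(S)\,\big(\llbracket e_1 \rrbracket \mathbin{\overline{\times}} \llbracket e_2 \rrbracket\big)(\sigma)(dz) \\
&= \int_{(x,y) \in \mathbb{R} \times \mathbb{R}} \delta(x \oplus y)(S)\,\big(\llbracket e_1 \rrbracket \times \llbracket e_2 \rrbracket\big)(\sigma)(d(x,y)) \\
&= \int_{(y,x) \in \mathbb{R} \times \mathbb{R}} \delta(y \oplus x)(S)\,\big(\llbracket e_2 \rrbracket \times \llbracket e_1 \rrbracket\big)(\sigma)(d(y,x)) \\
&= \llbracket e_2 \oplus e_1 \rrbracket(\sigma)(S)
\end{aligned}$$

Here, we crucially rely on the absence of exceptions (for the third equality) and
Fubini's Theorem (for the fourth equality).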


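Lemma 11. $e_1 \oplus (e_2 \oplus e_3) \simeq (e_1 \oplus e_2) \oplus e_3$, for any associative operator $\oplus$.

Proof. The important steps of the proof are the following.

$$\begin{aligned}
\llbracket e_1 \oplus (e_2 \oplus e_3) \rrbracket
&= \llbracket e_1 \rrbracket \mathbin{\overline{\times}} \llbracket e_2 \oplus e_3 \rrbracket >\!=\!> \lambda(x,s).\delta(x \oplus s) \\
&= \llbracket e_1 \rrbracket \mathbin{\overline{\times}} \big(\llbracket e_2 \rrbracket \mathbin{\overline{\times}} \llbracket e_3 \rrbracket >\!=\!> \lambda(y,z).\delta(y \oplus z)\big) >\!=\!> \lambda(x,s).\delta(x \oplus s) \\
&= \llbracket e_1 \rrbracket \mathbin{\overline{\times}} \big(\llbracket e_2 \rrbracket \mathbin{\overline{\times}} \llbracket e_3 \rrbracket\big) >\!=\!> \lambda(x,(y,z)).\delta(x \oplus y \oplus z) \\
&= \big(\llbracket e_1 \rrbracket \mathbin{\overline{\times}} \llbracket e_2 \rrbracket\big) \mathbin{\overline{\times}} \llbracket e_3 \rrbracket >\!=\!> \lambda((x,y),z).\delta(x \oplus y \oplus z) \\
&= \llbracket (e_1 \oplus e_2) \oplus e_3 \rrbracket
\end{aligned}$$

Here, we make crucial use of associativity for the lifted product of measures in
Lemma 6.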
Abstract. Bayesian networks (BNs) are probabilistic graphical models
for describing complex joint probability distributions. The main prob- 
lem for BNs is inference: Determine the probability of an event given 
observed evidence. Since exact inference is often infeasible for large BNs, 
popular approximate inference methods rely on sampling. 

We study the problem of determining the expected time to obtain a 
single valid sample from a BN. To this end, we translate the BN together 
with observations into a probabilistic program. We provide proof rules 
that yield the exact expected runtime of this program in a fully auto- 
mated fashion. We implemented our approach and successfully analyzed 
various real-world BNs taken from the Bayesian network repository. 


Keywords: Probabilistic programs · Expected runtimes ·
Weakest preconditions · Program verification


1 Introduction 


Bayesian networks (BNs) are probabilistic graphical models representing joint 
probability distributions of sets of random variables with conditional
dependencies. Graphical models are a popular and appealing modeling formalism, as
they allow complex distributions to be represented succinctly in a human-readable
way. BNs have been intensively studied at least since 1985 [43] and have a wide
range of applications including machine learning [24], speech recognition [50], 
sports betting [11], gene regulatory networks [18], diagnosis of diseases [27], and 
finance [39]. 


Probabilistic programs are programs with the key ability to draw values at ran- 
dom. Seminal papers by Kozen from the 1980s consider formal semantics [32] 
as well as initial work on verification [33,47]. McIver and Morgan [35] build
on this work to further weakest-precondition style verification for imperative
probabilistic programs.
The interest in probabilistic programs has been rapidly growing in recent
years [20,23]. Part of the reason for this déjà vu is their use for representing 
probabilistic graphical models [31] such as BNs. The full potential of modern 
probabilistic programming languages like Anglican [48], Church [21], Figaro [44], 
R2 [40], or Tabular [22] is that they enable rapid prototyping and obviate the 
need to manually provide inference methods tailored to an individual model. 


Probabilistic inference is the problem of determining the probability of an event 
given observed evidence. It is a major problem for both BNs and probabilistic 
programs, and has been subject to intense investigations by both theoreticians 
and practitioners for more than three decades; see [31] for a survey. In particular, 
it has been shown that for probabilistic programs exact inference is highly unde- 
cidable [28], while for BNs both exact inference as well as approximate inference 
to an arbitrary precision are NP-hard [12,13]. In light of these complexity- 
theoretical hurdles, a popular way to analyze probabilistic graphical models as 
well as probabilistic programs is to gather a large number of independent and 
identically distributed (i.i.d. for short) samples and then do statistical reasoning 
on these samples. In fact, all of the aforementioned probabilistic programming 
languages support sampling based inference methods. 


Rejection sampling is a fundamental approach to obtain valid samples from BNs 
with observed evidence. In a nutshell, this method first samples from the joint 
(unconditional) distribution of the BN. If the sample complies with all evidence, 
it is valid and accepted; otherwise it is rejected and one has to resample. 
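The rejection sampling scheme described above can be sketched in a few lines. This is a minimal illustration, assuming a toy two-node network with made-up probabilities; `sample_joint` and `rejection_sample` are hypothetical helper names, not part of any tool discussed in this paper.

```python
import random

def sample_joint(rng):
    """Draw one sample from a toy two-node network (hypothetical numbers)."""
    rain = rng.random() < 0.2
    wet = rng.random() < (0.9 if rain else 0.1)
    return {"rain": rain, "wet": wet}

def rejection_sample(evidence, rng, max_tries=100_000):
    """Resample from the joint distribution until the sample matches the evidence."""
    for tries in range(1, max_tries + 1):
        s = sample_joint(rng)
        if all(s[var] == val for var, val in evidence.items()):
            return s, tries          # accepted sample and number of attempts
    raise RuntimeError("no accepting sample within max_tries")

rng = random.Random(0)               # fixed seed for reproducibility
sample, tries = rejection_sample({"wet": True}, rng)
```

The returned `tries` counts how many joint samples were drawn before one was accepted; its expectation is the quantity that becomes problematic for poorly conditioned evidence.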

Apart from rejection sampling, there are more sophisticated sampling tech- 
niques, which mainly fall in two categories: Markov Chain Monte Carlo (MCMC) 
and importance sampling. But while MCMC requires heavy hand-tuning and suf- 
fers from slow convergence rates on real-world instances [31, Chapter 12.3], virtu- 
ally all variants of importance sampling rely again on rejection sampling [31,49]. 

A major problem with rejection sampling is that for poorly conditioned data, 
this approach might have to reject and resample very often in order to obtain 
just a single accepting sample. Even worse, being poorly conditioned need not be 
immediately evident for a given BN, let alone a probabilistic program. In fact, 
Gordon et al. [23, p. 177] point out that 


“the main challenge in this setting [i.e. sampling based approaches] is that 
many samples that are generated during execution are ultimately rejected 
for not satisfying the observations.” 


If too many samples are rejected, the expected sampling time grows so large that 
sampling becomes infeasible. The expected sampling time of a BN is therefore a 
key figure for deciding whether sampling based inference is the method of choice. 


How Long, O Bayesian Network, will I Sample Thee? More precisely, we use 
techniques from program verification to give an answer to the following question: 


Given a Bayesian network with observed evidence, how long does it take 
in expectation to obtain a single sample that satisfies the observations? 
[Figure: Bayesian network over the nodes S, R, and G, each annotated with its conditional probability table; some entries are parameterized by a.]

Fig. 1. A simple Bayesian network.


As an example, consider the BN in Fig.1 which consists of just three nodes 
(random variables) that can each assume values 0 or 1. Each node X comes with 
a conditional probability table determining the probability of X assuming some 
value given the values of all nodes Y that X depends on (i.e. X has an incoming 
edge from Y), see [3, Appendix A.1] for detailed calculations. For instance, the
probability that G assumes value 0, given that S and R both assume 1, is
0.2. Note that this BN is parameterized by a ∈ [0, 1].

Now, assume that our observed evidence is the event G = 0 and we apply
rejection sampling to obtain one accepting sample from this BN. Then our
approach will yield that a rejection sampling algorithm will, on average, require

$$\frac{200a^2 - 40a - 460}{89a^2 - 69a - 21}$$


guard evaluations, random assignments, etc. until it obtains a single sample that 
complies with the observation G = 0 (the underlying runtime model is discussed 
in detail in Sect.3.3). By examination of this function, we see that for large 
ranges of values of a the BN is rather well-behaved: For a € [0.08, 0.78] the 
expected sampling time stays below 18. Above a = 0.95 the expected sampling 
time starts to grow rapidly up to 300. 

While 300 is still moderate, we will see later that expected sampling times of
real-world BNs can be much larger. For some BNs, the expected sampling time
even exceeded $10^{18}$, rendering sampling based methods infeasible. In this case,
exact inference (despite NP-hardness) was a viable alternative (see Sect. 6).
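The behavior of the expected sampling time of the example network can be checked numerically by evaluating the rational function given above. The following sketch only evaluates that formula; the function name is ours.

```python
def expected_sampling_time(a):
    """Expected sampling time of the example BN for parameter a in [0, 1]."""
    return (200 * a**2 - 40 * a - 460) / (89 * a**2 - 69 * a - 21)

# Evaluate on a grid covering the well-behaved range [0.08, 0.78].
grid = [0.08 + 0.01 * i for i in range(71)]
worst_in_range = max(expected_sampling_time(a) for a in grid)

at_one = expected_sampling_time(1.0)      # right end of the parameter range
at_095 = expected_sampling_time(0.95)     # where rapid growth sets in
```

On the grid the value stays below 18, while at a = 1 the function reaches 300, in line with the discussion above.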


Our Approach. We apply weakest precondition style reasoning à la McIver and
Morgan [35] and Kaminski et al. [30] to analyze both expected outcomes and
expected runtimes (ERT) of a syntactic fragment of pGCL, which we call the
Bayesian Network Language (BNL). Note that since BNL is a syntactic fragment
of pGCL, every BNL program is a pGCL program but not vice versa. The main
restriction of BNL is that (in contrast to pGCL) loops are of a special form
that prohibits undesired data flow across multiple loop iterations. While this
restriction renders BNL incapable of, for instance, counting the number of loop
iterations¹, BNL is expressive enough to encode Bayesian networks with observed
evidence.

For BNL, we develop dedicated proof rules to determine exact expected values 
and the exact ERT of any BNL program, including loops, without any user- 
supplied data, such as invariants [30,35], ranking or metering functions [19], 
(super)martingales [8-10], etc. 

As a central notion behind these rules, we introduce f-i.i.d.-ness of
probabilistic loops, a concept closely related to stochastic independence, that allows us
to rule out undesired parts of the data flow across loop iterations. Furthermore,
we show how every BN with observations is translated into a BNL program, such
that


(a) executing the BNL program corresponds to sampling from the conditional 
joint distribution given by the BN and observed data, and 

(b) the ERT of the BNL program corresponds to the expected time until a 
sample that satisfies the observations is obtained from the BN. 


As a consequence, exact expected sampling times of BNs can be inferred by 
means of weakest precondition reasoning in a fully automated fashion. This can 
be seen as a first step towards formally evaluating the quality of a plethora of 
different sampling methods (cf. [31,49]) on source code level. 


Contributions. To summarize, our main contributions are as follows: 


— We develop easy-to-apply proof rules to reason about expected outcomes and
expected runtimes of probabilistic programs with f-i.i.d. loops.

— We study a syntactic fragment of probabilistic programs, the Bayesian net- 
work language (BNL), and show that our proof rules are applicable to every 
BNL program; expected runtimes of BNL programs can thus be inferred. 

— We give a formal translation from Bayesian networks with observations to 
BNL programs; expected sampling times of BNs can thus be inferred. 

— We implemented a prototype tool that automatically analyzes the expected
sampling time of BNs with observations. An experimental evaluation on
real-world BNs demonstrates that very large expected sampling times (in the
magnitude of millions of years) can be inferred within less than a second. This
provides practitioners with the means to decide whether sampling based methods
are appropriate for their models.


Outline. We discuss related work in Sect. 2. Syntax and semantics of the
probabilistic programming language pGCL are presented in Sect. 3. Our proof rules
are introduced in Sect. 4 and applied to BNs in Sect. 5. Section 6 reports on
experimental results and Sect. 7 concludes.


1 An example of a program that is not expressible in BNL is given in Example 1. 
2 Related Work


While various techniques for formal reasoning about runtimes and expected out- 
comes of probabilistic programs have been developed, e.g. [6,7,17,25,38], none 
of them explicitly apply formal methods to reason about Bayesian networks on 
source code level. In the following, we focus on approaches close to our work. 


Weakest Preexpectation Calculus. Our approach builds upon the expected run- 
time calculus [30], which is itself based on work by Kozen [32,33] and McIver and 
Morgan [35]. In contrast to [30], we develop specialized proof rules for a clearly 
specified program fragment without requiring user-supplied invariants. Since 
finding invariants often requires heavy calculations, our proof rules contribute 
towards simplifying and automating verification of probabilistic programs. 


Ranking Supermartingales. Reasoning about almost-sure termination is often
based on ranking (super)martingales (cf. [8,10]). In particular, Chatterjee et al. [9]
consider the class of affine probabilistic programs for which linear ranking
supermartingales exist (LRAPP); thus proving (positive²) almost-sure termination for
all programs within this class. They also present a doubly-exponential algorithm
to approximate ERTs of LRAPP programs. While all BNL programs lie within
LRAPP, our proof rules yield exact ERTs as expectations (thus allowing for
compositional proofs), in contrast to a single number for a fixed initial state.


Bayesian Networks and Probabilistic Programs. Bayesian networks are a popular,
if not the most popular, probabilistic graphical model (cf. [4,31] for details) for
reasoning about conditional probabilities. They are closely tied to (a fragment of)
probabilistic programs. For example, INFER.NET [36] performs inference by compiling
a probabilistic program into a Bayesian network. While correspondences between
probabilistic graphical models, such as BNs, and probabilistic programs have been
considered in the literature [21,23,37], we are not aware of a formal soundness
proof for a translation from classical BNs into probabilistic programs including
conditioning.

Conversely, some probabilistic programming languages such as CHURCH [21], 
STAN [26], and R2 [40] directly perform inference on the program level using 
sampling techniques similar to those developed for Bayesian networks. Our app- 
roach is a step towards understanding sampling based approaches formally: We 
obtain the exact expected runtime required to generate a sample that satisfies all 
observations. This may ultimately be used to evaluate the quality of a plethora 
of proposed sampling methods for Bayesian inference (cf. [31,49]). 


3 Probabilistic Programs 


We briefly present the probabilistic programming language that is used through- 
out this paper. Since our approach is embedded into weakest-precondition style 
approaches, we also recap calculi for reasoning about both expected outcomes 
and expected runtimes of probabilistic programs. 


2 Positive almost-sure termination means termination in finite expected time [5].
3.1 The Probabilistic Guarded Command Language


We enhance Dijkstra’s Guarded Command Language [14,15] by a probabilis- 
tic construct, namely a random assignment. We thereby obtain a probabilistic 
Guarded Command Language (for a closely related language, see [35]). 

Let Vars be a finite set of program variables. Moreover, let $\mathbb{Q}$ be the set of
rational numbers, and let $\mathcal{D}(\mathbb{Q})$ be the set of discrete probability distributions
over $\mathbb{Q}$. The set of program states is given by $\Sigma = \{\sigma \mid \sigma\colon \mathsf{Vars} \to \mathbb{Q}\}$.

A distribution expression $\mu$ is a function of type $\mu\colon \Sigma \to \mathcal{D}(\mathbb{Q})$ that takes a
program state and maps it to a probability distribution on values from $\mathbb{Q}$. We
denote by $\mu_\sigma$ the distribution obtained from applying $\mu$ to $\sigma$.

The probabilistic guarded command language (pGCL) is given by the grammar 


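$$\begin{array}{rcll}
C & \longrightarrow & \texttt{skip} & \text{(effectless program)} \\
 & \mid & \texttt{diverge} & \text{(endless loop)} \\
 & \mid & x :\approx \mu & \text{(random assignment)} \\
 & \mid & C;\, C & \text{(sequential composition)} \\
 & \mid & \texttt{if}\,(\varphi)\,\{C\}\,\texttt{else}\,\{C\} & \text{(conditional choice)} \\
 & \mid & \texttt{while}\,(\varphi)\,\{C\} & \text{(while loop)} \\
 & \mid & \texttt{repeat}\,\{C\}\,\texttt{until}\,(\varphi)\,, & \text{(repeat-until loop)}
\end{array}$$


where $x \in \mathsf{Vars}$ is a program variable, $\mu$ is a distribution expression, and $\varphi$ is a
Boolean expression guarding a choice or a loop. A pGCL program that contains
neither diverge, nor while, nor repeat-until loops is called loop-free.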

For $\sigma \in \Sigma$ and an arithmetical expression $E$ over Vars, we denote by $\sigma(E)$
the evaluation of $E$ in $\sigma$, i.e. the value that is obtained by evaluating $E$ after
replacing any occurrence of any program variable $x$ in $E$ by the value $\sigma(x)$.
Analogously, we denote by $\sigma(\varphi)$ the evaluation of a guard $\varphi$ in state $\sigma$ to either
true or false. Furthermore, for a value $v \in \mathbb{Q}$ we write $\sigma[x \mapsto v]$ to indicate that
we set program variable $x$ to value $v$ in program state $\sigma$, i.e.³

$$\sigma[x \mapsto v] = \lambda y \bullet \begin{cases} v, & \text{if } y = x \\ \sigma(y), & \text{if } y \neq x. \end{cases}$$

We use the Iverson bracket notation to associate with each guard its according
indicator function. Formally, the Iverson bracket $[\varphi]$ of $\varphi$ is thus defined as the
function $[\varphi] = \lambda \sigma \bullet \sigma(\varphi)$.
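State updates and Iverson brackets translate directly into code. The sketch below assumes program states are dictionaries from variable names to rationals and guards are Python predicates on states; `update` and `iverson` are illustrative names of ours.

```python
from fractions import Fraction

def update(sigma, x, v):
    """State that agrees with sigma everywhere except that x is mapped to v."""
    new_state = dict(sigma)   # copy, so the original state is left untouched
    new_state[x] = v
    return new_state

def iverson(guard):
    """Indicator function of a guard: 1 where the guard holds, 0 elsewhere."""
    return lambda sigma: 1 if guard(sigma) else 0

sigma = {"x": Fraction(0), "y": Fraction(3)}
tau = update(sigma, "x", Fraction(5))
x_positive = iverson(lambda s: s["x"] > 0)
```

For instance, `x_positive(sigma)` is 0 and `x_positive(tau)` is 1, while `sigma` itself is unchanged by the update.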

Let us briefly go over the pGCL constructs and their effects: skip does not 
alter the current program state. The program diverge is an infinite busy loop, 
thus takes infinite time to execute. It returns no final state whatsoever. 

3 We use $\lambda$-expressions to construct functions: Function $\lambda X \bullet \epsilon$ applied to an argument
$\alpha$ evaluates to $\epsilon$ in which every occurrence of $X$ is replaced by $\alpha$.

The random assignment $x :\approx \mu$ is (a) the only construct that can actually
alter the program state and (b) the only construct that may introduce random
behavior into the computation. It takes the current program state $\sigma$, then
samples a value $v$ from probability distribution $\mu_\sigma$, and then assigns $v$ to program
variable $x$. An example of a random assignment is


$$x :\approx 1/2 \cdot \langle 5 \rangle + 1/6 \cdot \langle y + 1 \rangle + 1/3 \cdot \langle y - 1 \rangle.$$


If the current program state is $\sigma$, then the program state is altered to either
$\sigma[x \mapsto 5]$ with probability 1/2, or to $\sigma[x \mapsto \sigma(y) + 1]$ with probability 1/6, or to
$\sigma[x \mapsto \sigma(y) - 1]$ with probability 1/3. The remainder of the pGCL constructs are
standard programming language constructs.

In general, a pGCL program $C$ is executed on an input state and yields a
probability distribution over final states due to possibly occurring random
assignments inside of $C$. We denote that resulting distribution by $\llbracket C \rrbracket_\sigma$. Strictly
speaking, programs can yield subdistributions, i.e. probability distributions whose total
mass may be below 1. The "missing" probability mass represents the probability
of nontermination. Let us conclude our presentation of pGCL with an example:


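Example 1 (Geometric Loop). Consider the program $C_{geo}$ given by

$$\begin{array}{l}
x :\approx 0;\ \ c :\approx 1/2 \cdot \langle 0 \rangle + 1/2 \cdot \langle 1 \rangle; \\
\mathtt{while}\ (c = 1)\ \{\, x :\approx x + 1;\ \ c :\approx 1/2 \cdot \langle 0 \rangle + 1/2 \cdot \langle 1 \rangle \,\}
\end{array}$$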


This program basically keeps flipping coins until it flips, say, heads ($c = 0$).
In $x$ it counts the number of unsuccessful trials.$^4$ In effect, it almost surely sets
$c$ to 0 and moreover it establishes a geometric distribution on $x$. The resulting
distribution is given by

$$[\![C_{geo}]\!]_\sigma(\tau) \;=\; \sum_{n=0}^{\omega} \big[\tau = \sigma[c, x \mapsto 0, n]\big] \cdot \frac{1}{2^{n+1}} . \qquad \triangle$$
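The distribution of Example 1 can be replayed mechanically. The following Python sketch is only an illustration and not part of the calculus: it unrolls the loop of $C_{geo}$ for a bounded number of iterations using exact rational arithmetic. The function name `geometric_loop` and the iteration bound are assumptions made for this sketch. The mass that is still inside the loop when the bound is reached plays the role of the "missing" mass of a subdistribution.

```python
from fractions import Fraction


def geometric_loop(max_iterations):
    """Unroll C_geo for at most max_iterations loop iterations.

    Returns (terminated, pending): terminated maps each final value of x to
    its probability, pending is the mass of runs that are still in the loop.
    """
    half = Fraction(1, 2)
    # After the two initial assignments: x = 0, and c is 0 or 1 with probability 1/2 each.
    terminated = {0: half}   # c = 0: the loop is never entered
    pending = half           # c = 1: the loop body will be executed
    x = 0
    for _ in range(max_iterations):
        x += 1                            # loop body increments x ...
        terminated[x] = pending * half    # ... and the coin shows c = 0 (exit)
        pending = pending * half          # or c = 1 (another iteration)
    return terminated, pending


dist, pending = geometric_loop(20)
print(dist[3], pending)
```

Every terminated entry matches the closed form $1/2^{n+1}$, and the terminated mass plus the pending mass always adds up to 1.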


3.2 The Weakest Preexpectation Transformer 


We now present the weakest preexpectation transformer wp for reasoning about 
expected outcomes of executing probabilistic programs in the style of McIver 
and Morgan [35]. Given a random variable f mapping program states to reals, it 
allows us to reason about the expected value of f after executing a probabilistic 
program on a given state. 


Expectations. The random variables the wp transformer acts upon are taken
from a set of so-called expectations, a term coined by McIver and Morgan [35]:


4 This counting is also the reason that $C_{geo}$ is an example of a program that is not
expressible in our BNL language that we present later.
Definition 1 (Expectations). The set of expectations $\mathbb{E}$ is defined as

$$\mathbb{E} \;=\; \{\, f \mid f\colon \Sigma \to \mathbb{R}_{\geq 0}^{\infty} \,\} .$$


We will use the notation $f[x/E]$ to indicate the replacement of every occurrence
of $x$ in $f$ by $E$. Since $x$, however, does not actually occur in $f$, we more
formally define $f[x/E] = \lambda\sigma \bullet f(\sigma[x \mapsto \sigma(E)])$.

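A complete partial order $\preceq$ on $\mathbb{E}$ is obtained by pointwise lifting the
canonical total order on $\mathbb{R}_{\geq 0}^{\infty}$, i.e.

$$f_1 \preceq f_2 \quad\text{iff}\quad \forall \sigma \in \Sigma\colon\ f_1(\sigma) \leq f_2(\sigma) .$$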


Its least element is given by $\lambda\sigma \bullet 0$ which we (by slight abuse of notation)
also denote by 0. Suprema are constructed pointwise, i.e. for $S \subseteq \mathbb{E}$ the
supremum $\sup S$ is given by $\sup S = \lambda\sigma \bullet \sup_{f \in S} f(\sigma)$.


We allow expectations to map only to non-negative reals (extended by $\infty$), so that we
have a complete partial order readily available, which would not be the case for
expectations of type $\Sigma \to \mathbb{R} \cup \{-\infty, +\infty\}$. A wp calculus that can
handle expectations of such type needs more technical machinery and cannot make use of
this underlying natural partial order [29]. Since we want to reason about ERTs, which are
by nature non-negative, we will not need such complicated calculi.

Notice that we use a slightly different definition of expectations than McIver 
and Morgan [35], as we allow for unbounded expectations, whereas [35] requires 
that expectations are bounded. This however would prevent us from capturing 
ERTs, which are potentially unbounded. 


Expectation Transformers. For reasoning about the expected value of $f \in \mathbb{E}$
after execution of $C$, we employ a backward-moving weakest preexpectation
transformer $\mathsf{wp}[C]\colon \mathbb{E} \to \mathbb{E}$, that maps a postexpectation
$f \in \mathbb{E}$ to a preexpectation $\mathsf{wp}[C](f) \in \mathbb{E}$, such that
$\mathsf{wp}[C](f)(\sigma)$ is the expected value of $f$ after executing $C$ on initial state
$\sigma$. Formally, if $C$ executed on input $\sigma$ yields final distribution
$[\![C]\!]_\sigma$, then the weakest preexpectation $\mathsf{wp}[C](f)$ of $C$ with respect
to postexpectation $f$ is given by

$$\mathsf{wp}[C](f)(\sigma) \;=\; \int_{\Sigma} f \, d[\![C]\!]_\sigma , \qquad (1)$$

where we denote by $\int_A h \, d\nu$ the expected value of a random variable
$h\colon A \to \mathbb{R}_{\geq 0}^{\infty}$ with respect to a probability distribution
$\nu\colon A \to [0, 1]$. Weakest preexpectations can be defined in a very systematic way:


Definition 2 (The wp Transformer [35]). The weakest preexpectation transformer
$\mathsf{wp}\colon \text{pGCL} \to \mathbb{E} \to \mathbb{E}$ is defined by induction on all
pGCL programs according to the rules in Table 1. We call
$F_f(X) = [\neg\varphi] \cdot f + [\varphi] \cdot \mathsf{wp}[C](X)$ the wp-characteristic
functional of the loop $\mathtt{while}\ (\varphi)\ \{C\}$ with respect to postexpectation
$f$. For a given wp-characteristic function $F_f$, we call the sequence
$\{F_f^n(0)\}_{n \in \mathbb{N}}$ the orbit of $F_f$.
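Table 1. Rules for the wp-transformer.

$$\begin{array}{l|l}
C & \mathsf{wp}[C](f) \\ \hline
\mathtt{skip} & f \\
\mathtt{diverge} & 0 \\
x :\approx \mu & \lambda\sigma \bullet \int_{\mathbb{Q}} \big(\lambda v \bullet f[x/v]\big) \, d\mu_\sigma \\
\mathtt{if}\ (\varphi)\ \{C_1\}\ \mathtt{else}\ \{C_2\} & [\varphi] \cdot \mathsf{wp}[C_1](f) + [\neg\varphi] \cdot \mathsf{wp}[C_2](f) \\
C_1;\ C_2 & \mathsf{wp}[C_1]\big(\mathsf{wp}[C_2](f)\big) \\
\mathtt{while}\ (\varphi)\ \{C'\} & \mathrm{lfp}\ X \bullet [\neg\varphi] \cdot f + [\varphi] \cdot \mathsf{wp}[C'](X) \\
\mathtt{repeat}\ \{C'\}\ \mathtt{until}\ (\varphi) & \mathsf{wp}[C';\ \mathtt{while}\ (\neg\varphi)\ \{C'\}](f)
\end{array}$$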


Let us briefly go over the definitions in Table 1: For skip the program state is
not altered and thus the expected value of $f$ is just $f$. The program diverge
will never yield any final state. The distribution over the final states yielded by
diverge is thus the null distribution $\nu_0(\tau) = 0$, that assigns probability 0 to
every state. Consequently, the expected value of $f$ after execution of diverge is
given by $\int_\Sigma f \, d\nu_0 = \sum_{\tau \in \Sigma} 0 \cdot f(\tau) = 0$.

The rule for the random assignment $x :\approx \mu$ is a bit more technical: Let the
current program state be $\sigma$. Then for every value $v \in \mathbb{Q}$, the random
assignment assigns $v$ to $x$ with probability $\mu_\sigma(v)$. The value of $f$ after
assigning $v$ to $x$ is $f(\sigma[x \mapsto v]) = f[x/v](\sigma)$ and therefore the
expected value of $f$ after executing the random assignment is given by

$$\sum_{v \in \mathbb{Q}} \mu_\sigma(v) \cdot f[x/v](\sigma) \;=\; \int_{\mathbb{Q}} \big(\lambda v \bullet f[x/v](\sigma)\big) \, d\mu_\sigma .$$

Expressed as a function of $\sigma$, the latter yields precisely the definition in Table 1.

The definition for the conditional choice $\mathtt{if}\ (\varphi)\ \{C_1\}\ \mathtt{else}\ \{C_2\}$
is not surprising: if the current state satisfies $\varphi$, we have to opt for the weakest
preexpectation of $C_1$, whereas if it does not satisfy $\varphi$, we have to choose the
weakest preexpectation of $C_2$. This yields precisely the definition in Table 1.

The definition for the sequential composition $C_1; C_2$ is also straightforward:
We first determine $\mathsf{wp}[C_2](f)$ to obtain the expected value of $f$ after executing
$C_2$. Then we mentally prepend the program $C_2$ by $C_1$ and therefore determine
the expected value of $\mathsf{wp}[C_2](f)$ after executing $C_1$. This gives the weakest
preexpectation of $C_1; C_2$ with respect to postexpectation $f$.

The definition for the while loop makes use of a least fixed point, which is
a standard construction in program semantics. Intuitively, the fixed point iteration
of the wp-characteristic functional, given by $0, F_f(0), F_f^2(0), F_f^3(0), \ldots$,
corresponds to the portion of the expected value of $f$ after termination of the
loop that can be collected within at most 0, 1, 2, 3, ... loop guard evaluations.
The Kleene Fixed Point Theorem [34] ensures that this iteration converges to
the least fixed point, i.e.

$$\sup_{n \in \mathbb{N}} F_f^n(0) \;=\; \mathrm{lfp}\ F_f \;=\; \mathsf{wp}[\mathtt{while}\ (\varphi)\ \{C\}](f) .$$

By inspection of the above equality, we see that the least fixed point is exactly the
construct that we want for while loops, since $\sup_{n \in \mathbb{N}} F_f^n(0)$ in principle
allows the loop to run for any number of iterations, which captures precisely the semantics
of a while loop, where the number of loop iterations is (in contrast to e.g. for
loops) not determined upfront.
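To make the fixed point iteration concrete, the following Python sketch evaluates the orbit $F_f^n(0)$ for the loop of Example 1 and the postexpectation $f = x$. It is an illustration under assumptions: states are reduced to the pair $(x, c)$, and the helper name `orbit` is hypothetical. Exact fractions are used so that the approximations can be compared without rounding.

```python
from fractions import Fraction
from functools import lru_cache

HALF = Fraction(1, 2)


@lru_cache(maxsize=None)
def orbit(n, x, c):
    """F_f^n(0) in state (x, c), for f = x and the loop
    while (c = 1) { x :~ x + 1; c :~ 1/2*<0> + 1/2*<1> }."""
    if n == 0:
        return Fraction(0)     # F^0(0) is the constant-zero expectation
    if c != 1:
        return Fraction(x)     # guard false: contribute [not phi] * f
    # guard true: [phi] * wp[body](F^{n-1}(0)); x is incremented, c is resampled
    return HALF * orbit(n - 1, x + 1, 0) + HALF * orbit(n - 1, x + 1, 1)


# Approximations of wp[while ...](x) in the state x = 0, c = 1.
approximations = [orbit(n, 0, 1) for n in range(12)]
print([str(a) for a in approximations])
```

The approximations form an ascending chain bounded by 2, which is the expected final value of $x$ when the loop is started with $x = 0$ and $c = 1$.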

Finally, since $\mathtt{repeat}\ \{C\}\ \mathtt{until}\ (\varphi)$ is syntactic sugar for
$C;\ \mathtt{while}\ (\neg\varphi)\ \{C\}$, we simply define the weakest preexpectation of the
former as the weakest preexpectation of the latter. Let us conclude our study of the
effects of the wp transformer by means of an example:


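Example 2. Consider the following program $C$:

$$\begin{array}{l}
c :\approx 1/3 \cdot \langle 0 \rangle + 2/3 \cdot \langle 1 \rangle; \\
\mathtt{if}\ (c = 0)\ \{\, x :\approx 1/2 \cdot \langle 5 \rangle + 1/6 \cdot \langle y + 1 \rangle + 1/3 \cdot \langle y - 1 \rangle \,\}\ \mathtt{else}\ \{\mathtt{skip}\}
\end{array}$$

Say we wish to reason about the expected value of $x + c$ after execution of
the above program. We can do so by calculating $\mathsf{wp}[C](x + c)$ using the rules
in Table 1. The branch $c = 0$ is taken with probability 1/3 and contributes
$1/3 \cdot (7/3 + y/2)$; the branch $c = 1$ is taken with probability 2/3, leaves $x$
unchanged and contributes $2/3 \cdot (x + 1)$. This calculation in the end yields
$\mathsf{wp}[C](x + c) = (12x + 3y + 26)/18$. The expected valuation of the expression
$x + c$ after executing $C$ is thus $(12x + 3y + 26)/18$. Note that $x + c$ can be thought of
as an expression that is evaluated in the final states after execution, whereas
$(12x + 3y + 26)/18$ must be evaluated in the initial state before execution of $C$. △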


Healthiness Conditions of wp. The wp transformer enjoys some useful prop- 
erties, sometimes called healthiness conditions [35]. Two of these healthiness 
conditions that we will heavily make use of are given below: 


Theorem 1 (Healthiness Conditions for the wp Transformer [35]). For
all $C \in \text{pGCL}$, $f_1, f_2 \in \mathbb{E}$, and $a \in \mathbb{R}_{\geq 0}$, the following holds:

1. $\mathsf{wp}[C](a \cdot f_1 + f_2) = a \cdot \mathsf{wp}[C](f_1) + \mathsf{wp}[C](f_2)$ (linearity)
2. $\mathsf{wp}[C](0) = 0$ (strictness).


3.3 The Expected Runtime Transformer 


While for deterministic programs we can speak of the runtime of a program on 
a given input, the situation is different for probabilistic programs: For those we 
instead have to speak of the expected runtime (ERT). Notice that the ERT can 
be finite (even constant) while the program may still admit infinite executions. 
An example of this is the geometric loop in Example 1. 

A wp-like transformer designed specifically for reasoning about ERTs is the
ert transformer [30]. Like wp, it is of type $\mathsf{ert}[C]\colon \mathbb{E} \to \mathbb{E}$ and it can be shown that
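Table 2. Rules for the ert-transformer.

$$\begin{array}{l|l}
C & \mathsf{ert}[C](f) \\ \hline
\mathtt{skip} & 1 + f \\
\mathtt{diverge} & \infty \\
x :\approx \mu & 1 + \lambda\sigma \bullet \int_{\mathbb{Q}} \big(\lambda v \bullet f[x/v]\big) \, d\mu_\sigma \\
\mathtt{if}\ (\varphi)\ \{C_1\}\ \mathtt{else}\ \{C_2\} & 1 + [\varphi] \cdot \mathsf{ert}[C_1](f) + [\neg\varphi] \cdot \mathsf{ert}[C_2](f) \\
C_1;\ C_2 & \mathsf{ert}[C_1]\big(\mathsf{ert}[C_2](f)\big) \\
\mathtt{while}\ (\varphi)\ \{C'\} & \mathrm{lfp}\ X \bullet 1 + [\neg\varphi] \cdot f + [\varphi] \cdot \mathsf{ert}[C'](X) \\
\mathtt{repeat}\ \{C'\}\ \mathtt{until}\ (\varphi) & \mathsf{ert}[C';\ \mathtt{while}\ (\neg\varphi)\ \{C'\}](f)
\end{array}$$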


$\mathsf{ert}[C](0)(\sigma)$ is precisely the expected runtime of executing $C$ on input
$\sigma$. More generally, if $f\colon \Sigma \to \mathbb{R}_{\geq 0}^{\infty}$ measures the time
that is needed after executing $C$ (thus $f$ is evaluated in the final states after
termination of $C$), then $\mathsf{ert}[C](f)(\sigma)$ is the expected time that is needed to
run $C$ on input $\sigma$ and then let time $f$ pass. For a more in-depth treatment of the
ert transformer, see [30, Sect. 3]. The transformer is defined as follows:


Definition 3 (The ert Transformer [30]). The expected runtime transformer
$\mathsf{ert}\colon \text{pGCL} \to \mathbb{E} \to \mathbb{E}$ is defined by induction on all
pGCL programs according to the rules given in Table 2. We call
$F_f(X) = 1 + [\neg\varphi] \cdot f + [\varphi] \cdot \mathsf{ert}[C](X)$ the ert-characteristic
functional of the loop $\mathtt{while}\ (\varphi)\ \{C\}$ with respect to postexpectation
$f$. As with wp, for a given ert-characteristic function $F_f$, we call the sequence
$\{F_f^n(0)\}_{n \in \mathbb{N}}$ the orbit of $F_f$. Notice that

$$\mathsf{ert}[\mathtt{while}\ (\varphi)\ \{C\}](f) \;=\; \mathrm{lfp}\ F_f \;=\; \sup_{n \in \mathbb{N}} F_f^n(0) .$$


The rules for ert are very similar to the rules for wp. The runtime model we
assume is that skip statements, random assignments, and guard evaluations
for both conditional choice and while loops cost one unit of time. This runtime
model can easily be adapted to count only the number of loop iterations or only
the number of random assignments, etc. We conclude with a strong connection
between the wp and the ert transformer that is crucial in our proofs:


Theorem 2 (Decomposition of ert [41]). For any $C \in \text{pGCL}$ and $f \in \mathbb{E}$,

$$\mathsf{ert}[C](f) \;=\; \mathsf{ert}[C](0) + \mathsf{wp}[C](f) .$$
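For loop-free programs, the rules of Tables 1 and 2 can be executed directly, which gives a quick way to check the decomposition on a concrete state. The sketch below is a minimal illustration and not an implementation from the literature: the tuple encoding of programs, the function names `transform`, `wp` and `ert`, and the chosen initial state are all assumptions. It runs the program of Example 2 with postexpectation $x + c$.

```python
from fractions import Fraction as Fr

# Hypothetical encoding of loop-free pGCL programs as nested tuples:
#   ("skip",)
#   ("assign", x, [(p1, e1), ...])   each e_i maps a state to a value
#   ("if", guard, C1, C2)
#   ("seq", C1, C2)


def transform(prog, f, cost):
    """wp (cost = 0) or ert (cost = 1) of prog w.r.t. postexpectation f."""
    kind = prog[0]
    if kind == "skip":
        return lambda s: cost + f(s)
    if kind == "assign":
        _, x, dist = prog
        return lambda s: cost + sum(p * f({**s, x: e(s)}) for p, e in dist)
    if kind == "if":
        _, guard, c1, c2 = prog
        t1, t2 = transform(c1, f, cost), transform(c2, f, cost)
        return lambda s: cost + (t1(s) if guard(s) else t2(s))
    if kind == "seq":
        _, c1, c2 = prog
        return transform(c1, transform(c2, f, cost), cost)
    raise ValueError(f"unknown construct: {kind}")


def wp(prog, f):
    return transform(prog, f, 0)


def ert(prog, f):
    return transform(prog, f, 1)


# The program of Example 2.
program = ("seq",
           ("assign", "c", [(Fr(1, 3), lambda s: 0), (Fr(2, 3), lambda s: 1)]),
           ("if", lambda s: s["c"] == 0,
            ("assign", "x", [(Fr(1, 2), lambda s: 5),
                             (Fr(1, 6), lambda s: s["y"] + 1),
                             (Fr(1, 3), lambda s: s["y"] - 1)]),
            ("skip",)))

f = lambda s: s["x"] + s["c"]        # postexpectation x + c
zero = lambda s: 0
state = {"x": 0, "y": 4, "c": 0}     # assumed initial state

lhs = ert(program, f)(state)
rhs = ert(program, zero)(state) + wp(program, f)(state)
print(lhs, rhs)
```

Both sides agree on this state. The term `ert(program, zero)(state)` counts one time unit each for the first assignment, the guard evaluation, and whichever branch is executed.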


4 Expected Runtimes of i.i.d. Loops


We derive a proof rule that allows us to determine exact ERTs of independent
and identically distributed loops (or i.i.d. loops for short). Intuitively, a loop
$$\begin{array}{l}
\mathtt{while}\ ((x - 5)^2 + (y - 5)^2 > 25)\ \{ \\
\quad x :\approx \mathrm{Unif}[0 \ldots 10]; \\
\quad y :\approx \mathrm{Unif}[0 \ldots 10] \\
\}
\end{array}$$


Fig. 2. An i.i.d. loop sampling a point within a circle uniformly at random using
rejection sampling. The picture on the right-hand side visualizes the procedure: In
each iteration a point (×) is sampled. If we obtain a point within the white area inside
the square, we terminate. Otherwise, i.e. if we obtain a point within the gray area
outside the circle, we resample.


is i.i.d. if the distributions of states that are reached at the end of different
loop iterations are equal. This is the case whenever there is no data flow across
different iterations. In the non-probabilistic case, such loops either terminate
after exactly one iteration or never. This is different for probabilistic programs.

As a running example, consider the program $C_{circle}$ in Fig. 2. $C_{circle}$ samples
a point within a circle with center (5, 5) and radius $r = 5$ uniformly at random
using rejection sampling. In each iteration, it samples a point $(x, y) \in [0, \ldots, 10]^2$
within the square (with some fixed precision). The loop ensures that we resample
if a sample is not located within the circle. Our proof rule will allow us to
systematically determine the ERT of this loop, i.e. the average amount of time
required until a single point within the circle is sampled.

Towards obtaining such a proof rule, we first present a syntactical notion 
of the i.i.d. property. It relies on expectations that are not affected by a pGCL 
program: 


Definition 4. Let $C \in \text{pGCL}$ and $f \in \mathbb{E}$. Moreover, let $Mod(C)$ denote the
set of all variables that occur on the left-hand side of an assignment in $C$, and let
$Vars(f)$ be the set of all variables that "occur in $f$", i.e. formally

$$x \in Vars(f) \quad\text{iff}\quad \exists \sigma\ \exists v, v'\colon\ f(\sigma[x \mapsto v]) \neq f(\sigma[x \mapsto v']) .$$

Then $f$ is unaffected by $C$, denoted $f \not\pitchfork C$, iff $Vars(f) \cap Mod(C) = \emptyset$.


We are interested in expectations that are unaffected by pGCL programs because
of a simple, yet useful observation: If $g \not\pitchfork C$, then $g$ can be treated like a
constant w.r.t. the transformer wp (i.e. like the $a$ in Theorem 1 (1)). For our running
example $C_{circle}$ (see Fig. 2), the expectation $f = \mathsf{wp}[C_{body}]([x + y \leq 10])$ is
unaffected by the loop body $C_{body}$ of $C_{circle}$. Consequently, we have
$\mathsf{wp}[C_{body}](f) = f \cdot \mathsf{wp}[C_{body}](1) = f$. In general, we obtain the following property:


Lemma 1 (Scaling by Unaffected Expectations). Let $C \in \text{pGCL}$ and
$f, g \in \mathbb{E}$. Then $g \not\pitchfork C$ implies $\mathsf{wp}[C](g \cdot f) = g \cdot \mathsf{wp}[C](f)$.


Proof. By induction on the structure of $C$. See [3, Appendix A.2].
We develop a proof rule that only requires that both the probability of the guard
evaluating to true after one iteration of the loop body (i.e. $\mathsf{wp}[C]([\varphi])$) as well
as the expected value of $[\neg\varphi] \cdot f$ after one iteration (i.e.
$\mathsf{wp}[C]([\neg\varphi] \cdot f)$) are unaffected by the loop body. We thus define the following:


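Definition 5 (f-Independent and Identically Distributed Loops). Let
$C \in \text{pGCL}$, $\varphi$ be a guard, and $f \in \mathbb{E}$. Then we call the loop
$\mathtt{while}\ (\varphi)\ \{C\}$ f-independent and identically distributed (or f-i.i.d. for
short), if both

$$\mathsf{wp}[C]([\varphi]) \not\pitchfork C \qquad\text{and}\qquad \mathsf{wp}[C]([\neg\varphi] \cdot f) \not\pitchfork C .$$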


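Example 3. Our example program $C_{circle}$ (see Fig. 2) is f-i.i.d. for all $f \in \mathbb{E}$.
This is due to the fact that, for some fixed precision,
$\mathsf{wp}[C_{body}]([(x - 5)^2 + (y - 5)^2 > 25])$ evaluates to a constant fraction, so that

$$\mathsf{wp}[C_{body}]\big([(x - 5)^2 + (y - 5)^2 > 25]\big) \;\not\pitchfork\; C_{body} \qquad \text{(by Table 1)}$$

and (again for some fixed precision $p \in \mathbb{N} \setminus \{0\}$)

$$\mathsf{wp}[C_{body}]\big([(x - 5)^2 + (y - 5)^2 > 25] \cdot f\big) \;=\; \frac{1}{(10p + 1)^2} \sum_{i=0}^{10p} \sum_{j=0}^{10p} \big[(i/p - 5)^2 + (j/p - 5)^2 > 25\big] \cdot f[x/(i/p),\, y/(j/p)] \;\not\pitchfork\; C_{body} . \qquad \triangle$$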


Our main technical lemma is that we can express the orbit of the wp-characteristic
function as a partial geometric series:


Lemma 2 (Orbits of f-i.i.d. Loops). Let $C \in \text{pGCL}$, $\varphi$ be a guard,
$f \in \mathbb{E}$ such that the loop $\mathtt{while}\ (\varphi)\ \{C\}$ is f-i.i.d., and let $F_f$ be
the corresponding wp-characteristic function. Then for all $n \in \mathbb{N} \setminus \{0\}$, it
holds that

$$F_f^n(0) \;=\; [\varphi] \cdot \mathsf{wp}[C]([\neg\varphi] \cdot f) \cdot \sum_{i=0}^{n-2} \big(\mathsf{wp}[C]([\varphi])\big)^i \;+\; [\neg\varphi] \cdot f .$$

Proof. By use of Lemma 1, see [3, Appendix A.3].


Using this precise description of the wp orbits, we now establish proof rules for
f-i.i.d. loops, first for wp and later for ert.


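Theorem 3 (Weakest Preexpectations of f-i.i.d. Loops). Let $C \in \text{pGCL}$,
$\varphi$ be a guard, and $f \in \mathbb{E}$. If the loop $\mathtt{while}\ (\varphi)\ \{C\}$ is
f-i.i.d., then

$$\mathsf{wp}[\mathtt{while}\ (\varphi)\ \{C\}](f) \;=\; [\varphi] \cdot \frac{\mathsf{wp}[C]([\neg\varphi] \cdot f)}{1 - \mathsf{wp}[C]([\varphi])} \;+\; [\neg\varphi] \cdot f ,$$

where we define $\frac{0}{0} := 0$.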
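Proof. We have

$$\begin{aligned}
& \mathsf{wp}[\mathtt{while}\ (\varphi)\ \{C\}](f) \\
&= \sup_{n \in \mathbb{N}} F_f^n(0) && \text{(by Definition 2)} \\
&= \sup_{n \in \mathbb{N}}\ [\varphi] \cdot \mathsf{wp}[C]([\neg\varphi] \cdot f) \cdot \sum_{i=0}^{n-2} \big(\mathsf{wp}[C]([\varphi])\big)^i + [\neg\varphi] \cdot f && \text{(by Lemma 2)} \\
&= [\varphi] \cdot \mathsf{wp}[C]([\neg\varphi] \cdot f) \cdot \sum_{i=0}^{\omega} \big(\mathsf{wp}[C]([\varphi])\big)^i + [\neg\varphi] \cdot f . && (\dagger)
\end{aligned}$$

The preexpectation $(\dagger)$ is to be evaluated in some state $\sigma$ for which we have
two cases: The first case is when $\mathsf{wp}[C]([\varphi])(\sigma) < 1$. Using the closed form
of the geometric series, i.e. $\sum_{i=0}^{\omega} q^i = \frac{1}{1 - q}$ if $|q| < 1$, we get

$$\begin{aligned}
& [\varphi](\sigma) \cdot \mathsf{wp}[C]([\neg\varphi] \cdot f)(\sigma) \cdot \sum_{i=0}^{\omega} \big(\mathsf{wp}[C]([\varphi])(\sigma)\big)^i + [\neg\varphi](\sigma) \cdot f(\sigma) && \text{($\dagger$ instantiated in $\sigma$)} \\
&= [\varphi](\sigma) \cdot \frac{\mathsf{wp}[C]([\neg\varphi] \cdot f)(\sigma)}{1 - \mathsf{wp}[C]([\varphi])(\sigma)} + [\neg\varphi](\sigma) \cdot f(\sigma) . && \text{(closed form of geometric series)}
\end{aligned}$$

The second case is when $\mathsf{wp}[C]([\varphi])(\sigma) = 1$. This case is technically slightly
more involved. The full proof can be found in [3, Appendix A.4].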


We now derive a similar proof rule for the ERT of an f-i.i.d. loop $\mathtt{while}\ (\varphi)\ \{C\}$.


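Theorem 4 (Proof Rule for ERTs of f-i.i.d. Loops). Let $C \in \text{pGCL}$, $\varphi$
be a guard, and $f \in \mathbb{E}$ such that all of the following conditions hold:

1. $\mathtt{while}\ (\varphi)\ \{C\}$ is f-i.i.d.
2. $\mathsf{wp}[C](1) = 1$ (loop body terminates almost-surely).
3. $\mathsf{ert}[C](0) \not\pitchfork C$ (every iteration runs in the same expected time).

Then for the ERT of the loop $\mathtt{while}\ (\varphi)\ \{C\}$ w.r.t. postruntime $f$ it holds that

$$\mathsf{ert}[\mathtt{while}\ (\varphi)\ \{C\}](f) \;=\; 1 + [\varphi] \cdot \frac{1 + \mathsf{ert}[C]([\neg\varphi] \cdot f)}{1 - \mathsf{wp}[C]([\varphi])} + [\neg\varphi] \cdot f ,$$

where we define $\frac{0}{0} := 0$ and $\frac{a}{0} := \infty$, for $a \neq 0$.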


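Proof. We first prove

$$\mathsf{ert}[\mathtt{while}\ (\varphi)\ \{C\}](0) \;=\; 1 + [\varphi] \cdot \frac{1 + \mathsf{ert}[C](0)}{1 - \mathsf{wp}[C]([\varphi])} . \qquad (\ddagger)$$

To this end, we propose the following expression as the orbit of the ert-characteristic
function of the loop w.r.t. 0:

$$F_0^{n+1}(0) \;=\; 1 + [\varphi] \cdot \Big( \mathsf{ert}[C](0) \cdot \sum_{i=0}^{n} \mathsf{wp}[C]([\varphi])^i + \sum_{i=0}^{n-1} \mathsf{wp}[C]([\varphi])^i \Big)$$

For a verification that the above expression is indeed the correct orbit, we refer to
the rigorous proof of this theorem in [3, Appendix A.5]. Now, analogously to the
reasoning in the proof of Theorem 3 (i.e. using the closed form of the geometric
series and case distinction on whether $\mathsf{wp}[C]([\varphi]) < 1$ or $\mathsf{wp}[C]([\varphi]) = 1$),
we get that the supremum of this orbit is indeed the right-hand side of $(\ddagger)$. To
complete the proof, consider the following:

$$\begin{aligned}
& \mathsf{ert}[\mathtt{while}\ (\varphi)\ \{C\}](f) \\
&= \mathsf{ert}[\mathtt{while}\ (\varphi)\ \{C\}](0) + \mathsf{wp}[\mathtt{while}\ (\varphi)\ \{C\}](f) && \text{(by Theorem 2)} \\
&= 1 + [\varphi] \cdot \frac{1 + \mathsf{ert}[C](0)}{1 - \mathsf{wp}[C]([\varphi])} + [\varphi] \cdot \frac{\mathsf{wp}[C]([\neg\varphi] \cdot f)}{1 - \mathsf{wp}[C]([\varphi])} + [\neg\varphi] \cdot f && \text{(by $(\ddagger)$ and Theorem 3)} \\
&= 1 + [\varphi] \cdot \frac{1 + \mathsf{ert}[C]([\neg\varphi] \cdot f)}{1 - \mathsf{wp}[C]([\varphi])} + [\neg\varphi] \cdot f && \text{(by Theorem 2)}
\end{aligned}$$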
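As an illustration of how Theorem 4 is used, the following Python sketch evaluates its right-hand side for the running example $C_{circle}$ with postruntime $f = 0$, in a state where the loop guard holds. The discretisation is an assumption of this sketch: $\mathrm{Unif}[0 \ldots 10]$ is taken to be uniform over the grid points $i/p$ for $i = 0, \ldots, 10p$, and the function name `circle_loop_ert` is hypothetical.

```python
from fractions import Fraction


def circle_loop_ert(precision=1):
    """ERT of C_circle by Theorem 4, for f = 0 and a state satisfying the guard."""
    # Assumed discretisation: grid points i / precision for i = 0, ..., 10 * precision.
    grid = [Fraction(i, precision) for i in range(10 * precision + 1)]
    outside = sum(1 for x in grid for y in grid
                  if (x - 5) ** 2 + (y - 5) ** 2 > 25)
    # wp[C_body]([phi]): probability that the resampled point is rejected again.
    p_guard = Fraction(outside, len(grid) ** 2)
    # ert[C_body](0): two random assignments, one time unit each.
    ert_body = 2
    # Theorem 4 with f = 0 and [phi] = 1.
    return 1 + Fraction(1 + ert_body) / (1 - p_guard)


print(circle_loop_ert(1), float(circle_loop_ert(4)))
```

All three premises of the theorem hold here: the loop is f-i.i.d. (Example 3), the body is loop-free, and its runtime is the constant 2. For finer grids the acceptance probability approaches $\pi/4$.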


5 A Programming Language for Bayesian Networks 


So far we have derived proof rules for formal reasoning about expected outcomes
and expected runtimes of i.i.d. loops (Theorems 3 and 4). In this
section, we apply these results to develop a syntactic pGCL fragment that
allows exact computations of closed forms of ERTs. In particular, no invariants,
(super)martingales or fixed point computations are required.

After that, we show how BNs with observations can be translated into pGCL 
programs within this fragment. Consequently, we call our pGCL fragment the 
Bayesian Network Language. As a result of the above translation, we obtain a 
systematic and automatable approach to compute the expected sampling time 
of a BN in the presence of observations. That is, the expected time it takes to 
obtain a single sample that satisfies all observations. 


5.1 The Bayesian Network Language 


Programs in the Bayesian Network Language are organized as sequences of 
blocks. Every block is associated with a single variable, say x, and satisfies 
two constraints: First, no variable other than x is modified inside the block, i.e. 
occurs on the left-hand side of a random assignment. Second, every variable 
accessed inside of a guard has been initialized before. These restrictions ensure 
that there is no data flow across multiple executions of the same block. Thus, 
intuitively, all loops whose body is composed from blocks (as described above) 
are f-i.i.d. loops. 
Definition 6 (The Bayesian Network Language). Let $Vars = \{x_1, x_2, \ldots\}$
be a finite set of program variables as in Sect. 3. The set of programs in Bayesian
Network Language, denoted BNL, is given by the grammar

$$\begin{array}{lcl}
C & \longrightarrow & \mathit{Seq} \ \mid\ \mathtt{repeat}\ \{\mathit{Seq}\}\ \mathtt{until}\ (\psi) \ \mid\ C;\ C \\
\mathit{Seq} & \longrightarrow & \mathit{Seq};\ \mathit{Seq} \ \mid\ B_{x_1} \ \mid\ B_{x_2} \ \mid\ \ldots \\
B_{x_i} & \longrightarrow & x_i :\approx \mu \ \mid\ \mathtt{if}\ (\varphi)\ \{x_i :\approx \mu\}\ \mathtt{else}\ \{B_{x_i}\} \qquad \text{(rule exists for all $x_i \in Vars$)}
\end{array}$$

where $x_i \in Vars$ is a program variable, all variables in $\varphi$ have been initialized
before, and $B_{x_i}$ is a non-terminal parameterized with program variable $x_i \in Vars$.
That is, for all $x_i \in Vars$ there is a non-terminal $B_{x_i}$. Moreover, $\psi$ is an
arbitrary guard and $\mu$ is a distribution expression of the form
$\mu = \sum_{j=1}^{n} p_j \cdot \langle a_j \rangle$ with $a_j \in \mathbb{Q}$ for $1 \leq j \leq n$.


Example 4. Consider the BNL program $C_{dice}$:

$$x_1 :\approx \mathrm{Unif}[1 \ldots 6];\ \ \mathtt{repeat}\ \{x_2 :\approx \mathrm{Unif}[1 \ldots 6]\}\ \mathtt{until}\ (x_2 \geq x_1)$$

This program first throws a fair die. After that it keeps throwing a second die
until its result is at least as large as the first die. △


For any $C \in \text{BNL}$, our goal is to compute the exact ERT of $C$, i.e. $\mathsf{ert}[C](0)$.
In case of loop-free programs, this amounts to a straightforward application of
the ert calculus presented in Sect. 3. To deal with loops, however, we in general have to
perform fixed point computations or require user-supplied artifacts, e.g. invariants,
supermartingales, etc. For BNL programs, on the other hand, it suffices
to apply the proof rules developed in Sect. 4. As a result, we directly obtain an
exact closed form solution for the ERT of a loop. This is a consequence of the
fact that all loops in BNL are f-i.i.d., which we establish in the following.

By definition, every loop in BNL is of the form $\mathtt{repeat}\ \{B_{x_i}\}\ \mathtt{until}\ (\psi)$,
which is equivalent to $B_{x_i};\ \mathtt{while}\ (\neg\psi)\ \{B_{x_i}\}$. Hence, we want to apply
Theorem 4 to that while loop. Our first step is to discharge the theorem's premises:


Lemma 3. Let $\mathit{Seq}$ be a sequence of BNL-blocks, $g \in \mathbb{E}$, and $\psi$ be a guard.
Then:

1. The expected value of $g$ after executing $\mathit{Seq}$ is unaffected by $\mathit{Seq}$. That is,
   $\mathsf{wp}[\mathit{Seq}](g) \not\pitchfork \mathit{Seq}$.
2. The ERT of $\mathit{Seq}$ is unaffected by $\mathit{Seq}$, i.e. $\mathsf{ert}[\mathit{Seq}](0) \not\pitchfork \mathit{Seq}$.
3. For every $f \in \mathbb{E}$, the loop $\mathtt{while}\ (\neg\psi)\ \{\mathit{Seq}\}$ is f-i.i.d.


Proof. 1. is proven by induction on the length of the sequence of blocks $\mathit{Seq}$ and
2. is a consequence of 1., see [3, Appendix A.6]. 3. follows immediately from 1. by
instantiating $g$ with $[\neg\psi]$ and $[\psi] \cdot f$, respectively.


We are now in a position to derive a closed form for the ERT of loops in BNL. 
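Theorem 5. For every loop $\mathtt{repeat}\ \{\mathit{Seq}\}\ \mathtt{until}\ (\psi) \in \text{BNL}$ and every $f \in \mathbb{E}$,

$$\mathsf{ert}[\mathtt{repeat}\ \{\mathit{Seq}\}\ \mathtt{until}\ (\psi)](f) \;=\; \frac{1 + \mathsf{ert}[\mathit{Seq}]([\psi] \cdot f)}{\mathsf{wp}[\mathit{Seq}]([\psi])} .$$

Proof. Let $f \in \mathbb{E}$. Moreover, recall that $\mathtt{repeat}\ \{\mathit{Seq}\}\ \mathtt{until}\ (\psi)$ is
equivalent to the program $\mathit{Seq};\ \mathtt{while}\ (\neg\psi)\ \{\mathit{Seq}\} \in \text{BNL}$. Applying the
semantics of ert (Table 2), we proceed as follows:

$$\mathsf{ert}[\mathtt{repeat}\ \{\mathit{Seq}\}\ \mathtt{until}\ (\psi)](f) \;=\; \mathsf{ert}[\mathit{Seq}]\big(\mathsf{ert}[\mathtt{while}\ (\neg\psi)\ \{\mathit{Seq}\}](f)\big)$$

Since the loop body $\mathit{Seq}$ is loop-free, it terminates certainly, i.e.
$\mathsf{wp}[\mathit{Seq}](1) = 1$ (Premise 2. of Theorem 4). Together with Lemma 3.1. and 3., all
premises of Theorem 4 are satisfied. Hence, we obtain a closed form for
$\mathsf{ert}[\mathtt{while}\ (\neg\psi)\ \{\mathit{Seq}\}](f)$:

$$= \mathsf{ert}[\mathit{Seq}]\Big( \underbrace{1 + \frac{[\neg\psi] \cdot \big(1 + \mathsf{ert}[\mathit{Seq}]([\psi] \cdot f)\big)}{1 - \mathsf{wp}[\mathit{Seq}]([\neg\psi])} + [\psi] \cdot f}_{=:\, g} \Big)$$

By Theorem 2, we know $\mathsf{ert}[\mathit{Seq}](g) = \mathsf{ert}[\mathit{Seq}](0) + \mathsf{wp}[\mathit{Seq}](g)$ for any $g$. Thus:

$$= \mathsf{ert}[\mathit{Seq}](0) + \mathsf{wp}[\mathit{Seq}]\Big( 1 + \frac{[\neg\psi] \cdot \big(1 + \mathsf{ert}[\mathit{Seq}]([\psi] \cdot f)\big)}{1 - \mathsf{wp}[\mathit{Seq}]([\neg\psi])} + [\psi] \cdot f \Big)$$

Since wp is linear (Theorem 1 (1)), we obtain:

$$= \mathsf{ert}[\mathit{Seq}](0) + \mathsf{wp}[\mathit{Seq}](1) + \mathsf{wp}[\mathit{Seq}]([\psi] \cdot f) + \mathsf{wp}[\mathit{Seq}]\Big( \frac{[\neg\psi] \cdot \big(1 + \mathsf{ert}[\mathit{Seq}]([\psi] \cdot f)\big)}{1 - \mathsf{wp}[\mathit{Seq}]([\neg\psi])} \Big)$$

By a few simple algebraic transformations, this coincides with:

$$= 1 + \mathsf{ert}[\mathit{Seq}](0) + \mathsf{wp}[\mathit{Seq}]([\psi] \cdot f) + \mathsf{wp}[\mathit{Seq}]\Big( [\neg\psi] \cdot \frac{1 + \mathsf{ert}[\mathit{Seq}]([\psi] \cdot f)}{1 - \mathsf{wp}[\mathit{Seq}]([\neg\psi])} \Big)$$

Let $R$ denote the fraction above. Then Lemma 3.1. and 2. implies $R \not\pitchfork \mathit{Seq}$.
We may thus apply Lemma 1 to derive
$\mathsf{wp}[\mathit{Seq}]([\neg\psi] \cdot R) = \mathsf{wp}[\mathit{Seq}]([\neg\psi]) \cdot R$. Hence:

$$= 1 + \mathsf{ert}[\mathit{Seq}](0) + \mathsf{wp}[\mathit{Seq}]([\psi] \cdot f) + \mathsf{wp}[\mathit{Seq}]([\neg\psi]) \cdot \frac{1 + \mathsf{ert}[\mathit{Seq}]([\psi] \cdot f)}{1 - \mathsf{wp}[\mathit{Seq}]([\neg\psi])}$$

Again, by Theorem 2, we know that $\mathsf{ert}[\mathit{Seq}](g) = \mathsf{ert}[\mathit{Seq}](0) + \mathsf{wp}[\mathit{Seq}](g)$ for
any $g$. Thus, for $g = [\psi] \cdot f$, this yields:

$$= 1 + \mathsf{ert}[\mathit{Seq}]([\psi] \cdot f) + \mathsf{wp}[\mathit{Seq}]([\neg\psi]) \cdot \frac{1 + \mathsf{ert}[\mathit{Seq}]([\psi] \cdot f)}{1 - \mathsf{wp}[\mathit{Seq}]([\neg\psi])}$$

Then a few algebraic transformations, using $1 - \mathsf{wp}[\mathit{Seq}]([\neg\psi]) = \mathsf{wp}[\mathit{Seq}]([\psi])$,
lead us to the claimed ERT:

$$= \frac{1 + \mathsf{ert}[\mathit{Seq}]([\psi] \cdot f)}{\mathsf{wp}[\mathit{Seq}]([\psi])} .$$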


Note that Theorem 5 holds for arbitrary postexpectations $f \in \mathbb{E}$. This enables
compositional reasoning about ERTs of BNL programs. Since all other rules of the
ert-calculus for loop-free programs amount to simple syntactical transformations
(see Table 2), we conclude that


Corollary 1. For any $C \in \text{BNL}$, a closed form for $\mathsf{ert}[C](0)$ can be computed
compositionally.


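Example 5. Theorem 5 allows us to comfortably compute the ERT of the BNL
program $C_{dice}$ introduced in Example 4:

$$x_1 :\approx \mathrm{Unif}[1 \ldots 6];\ \ \mathtt{repeat}\ \{x_2 :\approx \mathrm{Unif}[1 \ldots 6]\}\ \mathtt{until}\ (x_2 \geq x_1)$$

For the ERT, we have

$$\begin{aligned}
& \mathsf{ert}[C_{dice}](0) \\
&= \mathsf{ert}[x_1 :\approx \mathrm{Unif}[1 \ldots 6]]\big(\mathsf{ert}[\mathtt{repeat}\ \{\ldots\}\ \mathtt{until}\ (x_2 \geq x_1)](0)\big) && \text{(Table 2)} \\
&= \mathsf{ert}[x_1 :\approx \mathrm{Unif}[1 \ldots 6]]\Big( \frac{1 + \mathsf{ert}[x_2 :\approx \mathrm{Unif}[1 \ldots 6]]([x_2 \geq x_1] \cdot 0)}{\mathsf{wp}[x_2 :\approx \mathrm{Unif}[1 \ldots 6]]([x_2 \geq x_1])} \Big) && \text{(Thm. 5)} \\
&= 1 + \sum_{1 \leq i \leq 6} \frac{1}{6} \cdot \frac{2}{\sum_{1 \leq j \leq 6} \frac{1}{6} \cdot [j \geq i]} && \text{(Table 2)} \\
&= 1 + 2 \cdot \big(1 + \tfrac{1}{2} + \tfrac{1}{3} + \tfrac{1}{4} + \tfrac{1}{5} + \tfrac{1}{6}\big) \;=\; 5.9 . && \triangle
\end{aligned}$$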
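The compositional computation for $C_{dice}$ can be replayed with exact rational arithmetic. The sketch below is an illustration, not tooling from the paper: it evaluates the closed form of Theorem 5 for the loop and then applies the Table 2 rule for the initial random assignment. The function names `ert_repeat_until` and `ert_dice` are hypothetical.

```python
from fractions import Fraction

SIDES = range(1, 7)
UNIF = Fraction(1, 6)


def ert_repeat_until(x1):
    """Theorem 5 for repeat {x2 :~ Unif[1..6]} until (x2 >= x1), with f = 0."""
    ert_body = 1                                        # ert[x2 :~ Unif](0): one assignment
    p_exit = sum(UNIF for x2 in SIDES if x2 >= x1)      # wp[x2 :~ Unif]([x2 >= x1])
    return Fraction(1 + ert_body) / p_exit


def ert_dice():
    """Table 2 rule for x1 :~ Unif[1..6], applied to the ERT of the loop."""
    return 1 + sum(UNIF * ert_repeat_until(x1) for x1 in SIDES)


print(ert_dice())
```

The loop's ERT depends on $x_1$ only, which is why it can simply be averaged over the six outcomes of the first die.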


5.2 Bayesian Networks 


To reason about expected sampling times of BNs, it remains to develop a sound 
translation from BNs with observations into equivalent BNL programs. A BN is 
a probabilistic graphical model that is given by a directed acyclic graph. Every 
node is a random variable and a directed edge between two nodes expresses a 
probabilistic dependency between these nodes. 

As a running example, consider the BN depicted in Fig. 3 (inspired by [31]) 
that models the mood of students after taking an exam. The network contains 
four random variables. They represent the difficulty of the exam (D), the level 
of preparation of a student (P), the achieved grade (G), and the resulting mood 
(M). For simplicity, let us assume that each random variable assumes either 0 
or 1. The edges express that the student’s mood depends on the achieved grade 
which, in turn, depends on the difficulty of the exam and the preparation of 
the student. Every node is accompanied by a table that provides the conditional 
probabilities of a node given the values of all the nodes it depends upon. We 
can then use the BN to answer queries such as “What is the probability that a 
Difficulty (D)                 Preparation (P)
  D = 0   D = 1                  P = 0   P = 1
  0.6     0.4                    0.7     0.3

Grade (G)
                   G = 0   G = 1
  D = 0, P = 0     0.95    0.05
  D = 1, P = 1     0.05    0.95
  D = 0, P = 1     0.5     0.5
  D = 1, P = 0     0.6     0.4

Mood (M)
           M = 0   M = 1
  G = 0    0.9     0.1
  G = 1    0.3     0.7

Edges: D → G, P → G, G → M.


Fig. 3. A Bayesian network


student is well-prepared for an exam (P = 1), but ends up with a bad mood 
(M = 0)?” 

In order to translate BNs into equivalent BNL programs, we need a formal 
representation first. Technically, we consider extended BNs in which nodes may 
additionally depend on inputs that are not represented by nodes in the net- 
work. This allows us to define a compositional translation without modifying 
conditional probability tables. 

Towards a formal definition of extended BNs, we use the following notation.
A tuple $(s_1, \ldots, s_k) \in S^k$ of length $k$ over some set $S$ is denoted by $\mathbf{s}$. The
empty tuple is $\varepsilon$. Moreover, for $1 \leq i \leq k$, the i-th element of tuple
$\mathbf{s}$ is given by $\mathbf{s}(i)$. To simplify the presentation, we assume that all nodes
and all inputs are represented by natural numbers.


Definition 7. An extended Bayesian network, EBN for short, is a tuple
$\mathcal{B} = (V, I, E, Vals, dep, cpt)$, where

- $V \subseteq \mathbb{N}$ and $I \subseteq \mathbb{N}$ are finite disjoint sets of nodes and inputs.
- $E \subseteq V \times V$ is a set of edges such that $(V, E)$ is a directed acyclic graph.
- $Vals$ is a finite set of possible values that can be assigned to each node.
- $dep\colon V \to (V \cup I)^*$ is a function assigning each node $v$ to an ordered sequence
  of dependencies. That is, $dep(v) = (u_1, \ldots, u_m)$ such that $u_i < u_{i+1}$
  ($1 \leq i < m$). Moreover, every dependency $u_j$ ($1 \leq j \leq m$) is either an input,
  i.e. $u_j \in I$, or a node with an edge to $v$, i.e. $u_j \in V$ and $(u_j, v) \in E$.
- $cpt$ is a function mapping each node $v$ to its conditional probability table
  $cpt[v]$. That is, for $k = |dep(v)|$, $cpt[v]$ is given by a function of the form

  $$cpt[v]\colon Vals^k \to Vals \to [0, 1] \quad\text{such that}\quad \sum_{a \in Vals} cpt[v](\mathbf{z})(a) = 1 \ \text{ for each } \mathbf{z} \in Vals^k .$$

  Here, the i-th entry in a tuple $\mathbf{z} \in Vals^k$ corresponds to the value assigned to
  the i-th entry in the sequence of dependencies $dep(v)$.
A Bayesian network (BN) is an extended BN without inputs, i.e. $I = \emptyset$. In
particular, the dependency function is of the form $dep\colon V \to V^*$.


Example 6. The formalization of our example BN (Fig. 3) is straightforward.
For instance, the dependencies of variable $G$ are given by $dep(G) = (D, P)$
(assuming $D$ is encoded by an integer less than $P$). Furthermore, every entry
in the conditional probability table of node $G$ corresponds to an evaluation
of the function $cpt[G]$. For example, if $D = 1$, $P = 0$, and $G = 1$, we have
$cpt[G](1, 0)(1) = 0.4$. △


In general, the conditional probability table $cpt$ determines the conditional
probability distribution of each node $v \in V$ given the nodes and inputs it depends
on. Formally, we interpret an entry in a conditional probability table as follows:

$$Pr(v = a \mid dep(v) = \mathbf{z}) \;=\; cpt[v](\mathbf{z})(a) ,$$

where $v \in V$ is a node, $a \in Vals$ is a value, and $\mathbf{z}$ is a tuple of values of length
$|dep(v)|$. Then, by the chain rule, the joint probability of a BN is given by the
product of its conditional probability tables (cf. [4]).


Definition 8. Let $\mathcal{B} = (V, I, E, Vals, dep, cpt)$ be an extended Bayesian network.
Moreover, let $W \subseteq V$ be a downward closed$^5$ set of nodes. With each
$w \in W \cup I$, we associate a fixed value $\underline{w} \in Vals$. This notation is lifted
pointwise to tuples of nodes and inputs. Then the joint probability in which nodes in
$W$ assume values $\underline{W}$ is given by

$$Pr(W = \underline{W}) \;=\; \prod_{v \in W} Pr\big(v = \underline{v} \mid dep(v) = \underline{dep(v)}\big) \;=\; \prod_{v \in W} cpt[v]\big(\underline{dep(v)}\big)(\underline{v}) .$$

The conditional joint probability distribution of a set of nodes $W$, given observations
on a set of nodes $O$, is then given by the quotient $Pr(W = \underline{W}) / Pr(O = \underline{O})$.


For example, the probability of a student having a bad mood, i.e. $M = 0$, after
getting a bad grade ($G = 0$) for an easy exam ($D = 0$) given that she was
well-prepared, i.e. $P = 1$, is

$$Pr(D = 0, G = 0, M = 0 \mid P = 1) \;=\; \frac{Pr(D = 0, G = 0, M = 0, P = 1)}{Pr(P = 1)} \;=\; \frac{0.9 \cdot 0.5 \cdot 0.6 \cdot 0.3}{0.3} \;=\; 0.27 .$$
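The chain rule of Definition 8 is easy to execute for the example network. The following Python sketch is an illustration only: the dictionary encoding (`DEP`, `CPT`) and the helper names `joint`, `marginal` and `conditional` are assumptions, and the table entries are read off Fig. 3 (the assignment of the garbled Grade row to $D = 1, P = 1$ is inferred from the other rows).

```python
from fractions import Fraction as Fr
from itertools import product

NODES = ("D", "P", "G", "M")
VALS = (0, 1)
DEP = {"D": (), "P": (), "G": ("D", "P"), "M": ("G",)}

# CPT[v][z][a] = Pr(v = a | dep(v) = z), entries taken from Fig. 3.
CPT = {
    "D": {(): {0: Fr(6, 10), 1: Fr(4, 10)}},
    "P": {(): {0: Fr(7, 10), 1: Fr(3, 10)}},
    "G": {(0, 0): {0: Fr(95, 100), 1: Fr(5, 100)},
          (1, 1): {0: Fr(5, 100), 1: Fr(95, 100)},
          (0, 1): {0: Fr(1, 2), 1: Fr(1, 2)},
          (1, 0): {0: Fr(6, 10), 1: Fr(4, 10)}},
    "M": {(0,): {0: Fr(9, 10), 1: Fr(1, 10)},
          (1,): {0: Fr(3, 10), 1: Fr(7, 10)}},
}


def joint(values):
    """Chain rule: product of CPT entries for an assignment of all nodes."""
    prob = Fr(1)
    for v, a in values.items():
        z = tuple(values[u] for u in DEP[v])
        prob *= CPT[v][z][a]
    return prob


def marginal(partial):
    """Probability of a partial assignment: sum the joint over all other nodes."""
    free = [v for v in NODES if v not in partial]
    return sum(joint({**partial, **dict(zip(free, combo))})
               for combo in product(VALS, repeat=len(free)))


def conditional(event, observation):
    return marginal({**event, **observation}) / marginal(observation)


answer = conditional({"D": 0, "G": 0, "M": 0}, {"P": 1})
print(answer)
```

Summing over all unobserved nodes is exponential in the size of the network; it is used here only because the example has four binary nodes.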


5.3 From Bayesian Networks to BNL 


We now develop a compositional translation from EBNs into BNL programs.
Throughout this section, let $\mathcal{B} = (V, I, E, Vals, dep, cpt)$ be a fixed EBN. Moreover,
with every node or input $v \in V \cup I$ we associate a program variable $x_v$.

We proceed in three steps: First, every node together with its dependencies 
is translated into a block of a BNL program. These blocks are then composed 
into a single BNL program that captures the whole BN. Finally, we implement 
conditioning by means of rejection sampling. 


5 $W$ is downward closed if $v \in W$ and $(u, v) \in E$ implies $u \in W$.
Step 1: We first present the atomic building blocks of our translation. Let $v \in V$
be a node. Moreover, let $\mathbf{z} \in Vals^{|dep(v)|}$ be an evaluation of the dependencies of
$v$. That is, $\mathbf{z}$ is a tuple that associates a value with every node and input that
$v$ depends on (in the same order as $dep(v)$). For every node $v$ and evaluation of
its dependencies $\mathbf{z}$, we define a corresponding guard and a random assignment:

$$guard_{\mathcal{B}}(v, \mathbf{z}) \;=\; \bigwedge_{1 \leq i \leq |dep(v)|} x_{dep(v)(i)} = \mathbf{z}(i)$$

$$assign_{\mathcal{B}}(v, \mathbf{z}) \;=\; x_v :\approx \sum_{a \in Vals} cpt[v](\mathbf{z})(a) \cdot \langle a \rangle$$

Note that $dep(v)(i)$ is the i-th element from the sequence of nodes $dep(v)$.

Example 7. Continuing our previous example (see Fig. 3), assume we fixed the
node $v = G$. Moreover, let $\mathbf{z} = (1, 0)$ be an evaluation of $dep(v) = (D, P)$. Then
the guard and assignment corresponding to $v$ and $\mathbf{z}$ are given by:

$$guard_{\mathcal{B}}(G, (1, 0)) \;=\; x_D = 1 \wedge x_P = 0, \quad\text{and}$$

$$assign_{\mathcal{B}}(G, (1, 0)) \;=\; x_G :\approx 0.6 \cdot \langle 0 \rangle + 0.4 \cdot \langle 1 \rangle . \qquad \triangle$$

We then translate every node $v \in V$ into a program block that uses guards
to determine the rows in the conditional probability table under consideration.
After that, the program samples from the resulting probability distribution using
the previously constructed assignments. In case a node depends neither on
other nodes nor on input variables, we omit the guards. Formally,


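$$block_{\mathcal{B}}(v) \;=\; \begin{cases}
assign_{\mathcal{B}}(v, \varepsilon) & \text{if } |dep(v)| = 0 \\[6pt]
\begin{array}{l}
\mathtt{if}\ (guard_{\mathcal{B}}(v, \mathbf{z}_1))\ \{ \\
\quad assign_{\mathcal{B}}(v, \mathbf{z}_1)\} \\
\mathtt{else}\ \{\mathtt{if}\ (guard_{\mathcal{B}}(v, \mathbf{z}_2))\ \{ \\
\quad assign_{\mathcal{B}}(v, \mathbf{z}_2)\} \\
\quad \ldots\ \}\ \mathtt{else}\ \{ \\
\quad assign_{\mathcal{B}}(v, \mathbf{z}_m)\} \ldots \}
\end{array} & \begin{array}{l} \text{if } |dep(v)| = k > 0 \\ \text{and } Vals^k = \{\mathbf{z}_1, \ldots, \mathbf{z}_m\}. \end{array}
\end{cases}$$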
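The block construction of Step 1 can be sketched as a small text generator. This is an illustration under assumptions, not the authors' tool: the output syntax (`:~`, `<a>`, `and`), the helper names `guard`, `assign` and `block`, and the row order produced by `itertools.product` are choices made for this sketch. Probabilities are kept as strings so that they are printed exactly as in the table.

```python
from itertools import product


def guard(deps, z):
    """guard_B(v, z): conjunction of equalities x_u = z(i)."""
    return " and ".join(f"x_{u} = {val}" for u, val in zip(deps, z))


def assign(v, row):
    """assign_B(v, z): random assignment built from one row of cpt[v]."""
    terms = " + ".join(f"{p} * <{a}>" for a, p in row.items())
    return f"x_{v} :~ {terms}"


def block(v, deps, vals, cpt):
    """block_B(v): nested if-then-else over all rows of cpt[v]."""
    rows = list(product(vals, repeat=len(deps)))   # Vals^k; for k = 0 this is [()]

    def nest(i):
        if i == len(rows) - 1:                     # last row needs no guard
            return assign(v, cpt[rows[i]])
        return (f"if ({guard(deps, rows[i])}) {{{assign(v, cpt[rows[i]])}}} "
                f"else {{{nest(i + 1)}}}")

    return nest(0)


cpt_D = {(): {0: "0.6", 1: "0.4"}}
cpt_G = {(0, 0): {0: "0.95", 1: "0.05"}, (0, 1): {0: "0.5", 1: "0.5"},
         (1, 0): {0: "0.6", 1: "0.4"}, (1, 1): {0: "0.05", 1: "0.95"}}

block_D = block("D", (), (0, 1), cpt_D)
block_G = block("G", ("D", "P"), (0, 1), cpt_G)
print(block_D)
print(block_G)
```

For a node without dependencies the generator emits a single random assignment; for node $G$ it emits three guarded assignments followed by an unguarded one for the last row.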


Remark 1. The guards under consideration are conjunctions of equalities 
between variables and literals. We could thus use a more efficient translation 
of conditional probability tables by adding a switch-case statement to our 
probabilistic programming language. Such a statement is of the form 


\[ \texttt{switch}(x)\ \{\ \texttt{case}\ a_1 : C_1 \quad \texttt{case}\ a_2 : C_2 \quad \ldots \quad \texttt{default} : C_m\ \}, \]


where $x$ is a tuple of variables, and $a_1, \ldots, a_{m-1}$ are tuples of rational numbers of
the same length as x. With respect to the wp semantics, a switch-case state- 
ment is syntactic sugar for nested if-then-else blocks as used in the above 
translation. However, the runtime model of a switch-case statement requires 
just a single guard evaluation (vy) instead of potentially multiple guard evalu- 
ations when evaluating nested if-then-else blocks. Since the above adaption 
is straightforward, we opted to use nested if-then-else blocks to keep our 
programming language simple and allow, in principle, more general guards. $\triangle$

Step 2: The next step is to translate a complete EBN into a BNL program. To
this end, we compose the blocks obtained from each node, starting at the roots
of the network, i.e. all nodes that have no incoming edges. Formally,


\[ \mathit{roots}(B) \;=\; \{\, v \in V_B \mid \neg\exists u \in V_B \colon (u,v) \in E_B \,\}. \]


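After translating every root of the network, we remove the roots from the graph,
i.e. every root becomes an input, and proceed with the translation until all nodes
have been removed. More precisely, given a set of nodes $S \subseteq V$, the extended
BN $B \setminus S$ obtained by removing $S$ from $B$ is defined as

\[ B \setminus S \;=\; (V \setminus S,\; I \cup S,\; E \setminus (V \times S \cup S \times V),\; \mathit{dep},\; \mathit{cpt}). \]

With these auxiliary definitions readily available, an extended BN $B$ is translated
into a BNL program as follows:

\[
\mathit{BNL}(B) \;=\;
\begin{cases}
\mathit{block}_B(r_1); \ldots; \mathit{block}_B(r_m) & \text{if } \mathit{roots}(B) = \{r_1,\ldots,r_m\} = V \\[4pt]
\mathit{block}_B(r_1); \ldots; \mathit{block}_B(r_m);\ \mathit{BNL}(B \setminus \mathit{roots}(B)) & \text{if } \mathit{roots}(B) = \{r_1,\ldots,r_m\} \subsetneq V
\end{cases}
\]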
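
The first two steps of the translation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Java prototype: the function names `guard`, `assign`, `block` and `bnl`, the dictionary encoding of `dep` and `cpt`, and the binary value set are all hypothetical choices, and the network data is read off the program in Fig. 4.

```python
from itertools import product


def guard(deps, z):
    """Conjunction of equalities x_d = z(i), one per dependency d."""
    return " ∧ ".join(f"x_{d} = {val}" for d, val in zip(deps, z))


def assign(v, row):
    """Random assignment for one CPT row; row maps value a -> probability."""
    terms = " + ".join(f"{p}·⟨{a}⟩" for a, p in sorted(row.items()))
    return f"x_{v} :≈ {terms}"


def block(v, deps, cpt_v, vals=(0, 1)):
    """Nested if-then-else over all evaluations of deps (cf. block_B(v))."""
    if not deps:
        return assign(v, cpt_v[()])
    rows = list(product(vals, repeat=len(deps)))
    code = assign(v, cpt_v[rows[-1]])  # the last row needs no guard
    for z in reversed(rows[:-1]):
        code = f"if ({guard(deps, z)}) {{ {assign(v, cpt_v[z])} }} else {{ {code} }}"
    return code


def bnl(deps, cpt, inputs=()):
    """Emit blocks one layer of roots at a time (cf. BNL(B))."""
    done, remaining, blocks = set(inputs), set(deps), []
    while remaining:
        # Roots of the remaining graph: every dependency is already handled.
        roots = sorted(v for v in remaining if all(u in done for u in deps[v]))
        if not roots:
            raise ValueError("dependency graph is cyclic")
        blocks.extend(block(v, deps[v], cpt[v]) for v in roots)
        done.update(roots)
        remaining.difference_update(roots)
    return ";\n".join(blocks)


# Hypothetical encoding of the network behind Fig. 4 (values taken from the figure).
DEPS = {"D": (), "P": (), "G": ("D", "P"), "M": ("G",)}
CPT = {
    "D": {(): {0: 0.6, 1: 0.4}},
    "P": {(): {0: 0.7, 1: 0.3}},
    "G": {(0, 0): {0: 0.95, 1: 0.05}, (0, 1): {0: 0.5, 1: 0.5},
          (1, 0): {0: 0.6, 1: 0.4}, (1, 1): {0: 0.05, 1: 0.95}},
    "M": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.3, 1: 0.7}},
}
PROGRAM = bnl(DEPS, CPT)
```

The sketch enumerates the rows of each table in lexicographic order, so the guards of `x_G` may appear in a different order than in Fig. 4; both orders cover the same rows.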


Step 3: To complete the translation, it remains to account for observations. Let
$\mathit{cond} \colon V \to \mathit{Vals} \cup \{\bot\}$ be a function mapping every node either to an observed
value in $\mathit{Vals}$ or to $\bot$. The former case is interpreted as an observation that node
$v$ has value $\mathit{cond}(v)$. Otherwise, i.e. if $\mathit{cond}(v) = \bot$, the value of node $v$ is not
observed. We collect all observed nodes in the set $O = \{ v \in V \mid \mathit{cond}(v) \neq \bot \}$.
It is then natural to incorporate conditioning into our translation by applying
rejection sampling: We repeatedly execute a BNL program until every observed
node has the desired value $\mathit{cond}(v)$. In the presence of observations, we translate
the extended BN $B$ into a BNL program as follows:

\[ \mathit{BNL}(B,\mathit{cond}) \;=\; \texttt{repeat}\ \{\, \mathit{BNL}(B) \,\}\ \texttt{until}\ \Bigl( \bigwedge_{v \in O} x_v = \mathit{cond}(v) \Bigr) \]

Example 8. Consider, again, the BN $B$ depicted in Fig. 3. Moreover, assume we
observe $P = 1$. Hence, the conditioning function $\mathit{cond}$ is given by $\mathit{cond}(P) = 1$
and $\mathit{cond}(v) = \bot$ for $v \in \{D, G, M\}$. Then the translation of $B$ and $\mathit{cond}$, i.e.
$\mathit{BNL}(B, \mathit{cond})$, is the BNL program $C_{\mathit{mood}}$ depicted in Fig. 4. $\triangle$


Since our translation yields a BNL program for any given BN, we can composi- 
tionally compute a closed form for the expected simulation time of a BN. This 
is an immediate consequence of Corollary 1. 

We still have to prove, however, that our translation is sound, i.e. the con- 
ditional joint probabilities inferred from a BN coincide with the (conditional) 
joint probabilities from the corresponding BNL program. Formally, we obtain 
the following soundness result.

    repeat {
      x_D :≈ 0.6 · ⟨0⟩ + 0.4 · ⟨1⟩;
      x_P :≈ 0.7 · ⟨0⟩ + 0.3 · ⟨1⟩;
      if (x_D = 0 ∧ x_P = 0) {
        x_G :≈ 0.95 · ⟨0⟩ + 0.05 · ⟨1⟩
      } else if (x_D = 1 ∧ x_P = 1) {
        x_G :≈ 0.05 · ⟨0⟩ + 0.95 · ⟨1⟩
      } else if (x_D = 0 ∧ x_P = 1) {
        x_G :≈ 0.5 · ⟨0⟩ + 0.5 · ⟨1⟩
      } else {
        x_G :≈ 0.6 · ⟨0⟩ + 0.4 · ⟨1⟩
      };
      if (x_G = 0) {
        x_M :≈ 0.9 · ⟨0⟩ + 0.1 · ⟨1⟩
      } else {
        x_M :≈ 0.3 · ⟨0⟩ + 0.7 · ⟨1⟩
      }
    } until (x_P = 1)


Fig. 4. The BNL program $C_{\mathit{mood}}$ obtained from the BN in Fig. 3.
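
To make the repeat-until construction concrete, the program of Fig. 4 can be simulated directly. The following Python sketch is an assumption-laden illustration (the helper names `flip`, `mood_body` and `rejection_sample` are hypothetical, and the probabilities are those read off Fig. 4); it is not the analysis performed by the tool, which computes expectations exactly rather than by simulation.

```python
import random


def flip(p_one, rng):
    """Return 1 with probability p_one and 0 otherwise."""
    return 1 if rng.random() < p_one else 0


def mood_body(rng):
    """One execution of the loop body of the program in Fig. 4."""
    d = flip(0.4, rng)
    p = flip(0.3, rng)
    if d == 0 and p == 0:
        g = flip(0.05, rng)
    elif d == 1 and p == 1:
        g = flip(0.95, rng)
    elif d == 0 and p == 1:
        g = flip(0.5, rng)
    else:
        g = flip(0.4, rng)
    m = flip(0.1, rng) if g == 0 else flip(0.7, rng)
    return {"D": d, "P": p, "G": g, "M": m}


def rejection_sample(body, cond, rng):
    """repeat { body } until every observed node has its observed value."""
    iterations = 0
    while True:
        iterations += 1
        sample = body(rng)
        if all(sample[v] == value for v, value in cond.items()):
            return sample, iterations
```

Calling `rejection_sample(mood_body, {"P": 1}, random.Random(0))` returns one accepted sample together with the number of loop iterations it took; averaging that count over many runs approximates the reciprocal of the probability of the observation.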


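Theorem 6 (Soundness of Translation). Let $B = (V, I, E, \mathit{Vals}, \mathit{dep}, \mathit{cpt})$ be
a BN and $\mathit{cond} \colon V \to \mathit{Vals} \cup \{\bot\}$ be a function determining the observed nodes.
For each node and input $v$, let $\underline{v} \in \mathit{Vals}$ be a fixed value associated with $v$. In
particular, we set $\underline{v} = \mathit{cond}(v)$ for each observed node $v \in O$. Then

\[
\mathrm{wp}[\mathit{BNL}(B,\mathit{cond})]\Bigl(\Bigl[\bigwedge_{v \in V \setminus O} x_v = \underline{v}\Bigr]\Bigr)
\;=\; \frac{\Pr\bigl(\bigwedge_{v \in V} v = \underline{v}\bigr)}{\Pr\bigl(\bigwedge_{o \in O} o = \underline{o}\bigr)}.
\]

Proof. Without conditioning, i.e. $O = \emptyset$, the proof proceeds by induction on the
number of nodes of $B$. With conditioning, we additionally apply Theorems 3 and 5
to deal with loops introduced by observed nodes. See [3, Appendix A.7].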


Example 9 (Expected Sampling Time of a BN). Consider, again, the BN $B$ in
Fig. 3. Moreover, recall the corresponding program $C_{\mathit{mood}}$ derived from $B$ in
Fig. 4, where we observed $P = 1$. By Theorem 6 we can also determine the
probability that a student who got a bad grade in an easy exam was well-prepared
by means of weakest precondition reasoning. This yields

\[
\mathrm{wp}[C_{\mathit{mood}}]\bigl([x_D = 0 \wedge x_G = 0 \wedge x_M = 0]\bigr)
\;=\; \frac{\Pr(D = 0, G = 0, M = 0, P = 1)}{\Pr(P = 1)} \;=\; 0.27.
\]

Furthermore, by Corollary 1, it is straightforward to determine the expected time
to obtain a single sample of $B$ that satisfies the observation $P = 1$:

\[
\mathrm{ert}[C_{\mathit{mood}}](0)
\;=\; \frac{1 + \mathrm{ert}[C_{\text{loop-body}}](0)}{\mathrm{wp}[C_{\text{loop-body}}]([P = 1])}
\;=\; 23.4 + \frac{1}{15} \;=\; 23.4\overline{6}. \qquad \triangle
\]
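
The conditional probability in Example 9 can be cross-checked by brute-force enumeration of the joint distribution. The Python sketch below is only a sanity check under assumptions: the names are hypothetical and the table entries are taken from Fig. 4. It also computes the expected number of loop iterations, which is the reciprocal of the probability of the observation; this is not the ert value itself, since ert counts individual program steps.

```python
from fractions import Fraction as F
from itertools import product

# Probability that each node takes value 1, given its parents (from Fig. 4).
PR_D1 = F(4, 10)
PR_P1 = F(3, 10)
PR_G1 = {(0, 0): F(5, 100), (1, 1): F(95, 100), (0, 1): F(1, 2), (1, 0): F(4, 10)}
PR_M1 = {0: F(1, 10), 1: F(7, 10)}


def bern(p_one, value):
    """Probability that a 0/1 variable with Pr(1) = p_one takes `value`."""
    return p_one if value == 1 else 1 - p_one


def joint(d, p, g, m):
    """Joint probability of one full assignment of the network."""
    return (bern(PR_D1, d) * bern(PR_P1, p)
            * bern(PR_G1[(d, p)], g) * bern(PR_M1[g], m))


# Pr(P = 1), obtained by summing out D, G and M.
pr_obs = sum(joint(d, 1, g, m) for d, g, m in product((0, 1), repeat=3))
# Pr(D = 0, G = 0, M = 0 | P = 1).
posterior = joint(0, 1, 0, 0) / pr_obs
# Expected number of repeat-until iterations before acceptance.
expected_iterations = 1 / pr_obs
```

With exact rationals, `posterior` evaluates to 27/100, in agreement with the value 0.27 stated in the example.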


6 Implementation 


We implemented a prototype in Java to analyze expected sampling times of
Bayesian networks. More concretely, our tool takes as input a BN together with
observations in the popular Bayesian Network Interchange Format. The BN is
then translated into a BNL program as shown in Sect. 5. Our tool applies the
ert-calculus together with our proof rules developed in Sect. 4 to compute the
exact expected runtime of the BNL program.

The size of the resulting BNL program is linear in the total number of rows 
of all conditional probability tables in the BN. The program size is thus not the 
bottleneck of our analysis. As we are dealing with an NP-hard problem [12,13], it 
is not surprising that our algorithm has a worst-case exponential time complexity. 
However, the space complexity of our algorithm is also exponential in the worst
case: As an expectation is propagated backwards through an if-clause of the BNL
program, the size of the expectation is potentially multiplied. This is also the reason
why our analysis runs out of memory on some benchmarks.

We evaluated our implementation on the largest BNs in the Bayesian Network
Repository [46], which consists, to a large extent, of real-world BNs including
expert systems for, e.g., electromyography (munin) [2], hematopathology
diagnosis (hepar2) [42], weather forecasting (hailfinder) [1], and printer
troubleshooting in Windows 95 (win95pts) [45, Sect. 5.6.2]. For an evaluation of all
BNs in the repository, we refer to the extended version of this paper [3, Sect. 6].

All experiments were performed on an HP BL685C G7. Although up to 48 
cores with 2.0GHz were available, only one core was used apart from Java’s 
garbage collection. The Java virtual machine was limited to 8GB of RAM. 

Our experimental results are shown in Table 3. The number of nodes of the 
considered BNs ranges from 56 to 1041. For each Bayesian network, we com- 
puted the expected sampling time (EST) for different collections of observed 
nodes (#obs). Furthermore, Table 3 provides the average Markov Blanket size,
i.e. the average number of parents, children and children’s parents of nodes in 
the BN [43], as an indicator measuring how independent nodes in the BN are. 

Observations were picked at random. Note that the time required by our 
prototype varies depending on both the number of observed nodes and the actual 
observations. Thus, there are cases in which we run out of memory although the 
total number of observations is small. 

To relate the EST to actual execution times on a real machine, we also
performed simulations for the win95pts network. More precisely, we generated
Java programs from this network analogously to the translation in Sect. 5. This
allowed us to estimate that our Java setup can execute $9.714 \cdot 10^{6}$ steps (in
terms of EST) per second.

For the win95pts network with 17 observations, an EST of $1.11 \cdot 10^{15}$ then corresponds
to an expected time of approximately 3.6 years in order to obtain a single valid
sample. We were additionally able to find a case with 13 observed nodes where
our tool discovered within 0.32 s an EST that corresponds to approximately 4.3
million years. In contrast, exact inference using variable elimination was almost
instantaneous. This demonstrates that knowing expected sampling times upfront
can indeed be beneficial when selecting an inference method.
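
The conversion from an EST to wall-clock time is plain arithmetic. The Python sketch below assumes the readings $9.714 \cdot 10^{6}$ steps per second for the throughput and $1.11 \cdot 10^{15}$ for the EST of win95pts with 17 observations, which are the values consistent with the reported 3.6 years; the constant and function names are hypothetical.

```python
STEPS_PER_SECOND = 9.714e6           # assumed throughput of the Java setup
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def est_to_years(est):
    """Convert an expected sampling time (in steps) into years."""
    return est / STEPS_PER_SECOND / SECONDS_PER_YEAR


# win95pts with 17 observations (assumed EST of 1.11e15 steps).
years = est_to_years(1.11e15)
```

Under these assumptions `years` comes out at roughly 3.6, matching the figure quoted in the text.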


6 http://www.cs.cmu.edu/~fgcozman/Research/InterchangeFormat/.

Table 3. Experimental results. Time is in seconds. MO denotes out of memory.


BN #obs Time EST #obs Time EST #obs Time EST
hailfinder #nodes: 56, #edges: 66, avg. Markov Blanket: 3.54 
0 0.23 9.500- 10'|5 0.63 5.016-10 |9 0.46 9.048 - 10° 
hepar2 #nodes: 70, #edges: 123, avg. Markov Blanket: 4.51 
0 0.22 1.310-107|1 1.84 1.579- 10°? |2 MO - 
win95pts #nodes: 76, #edges: 112, avg. Markov Blanket: 5.92 
0 0.20 1.180-107|1 0.36 2.284- 10° /3 0.36 4.296 - 10° 
7 0.91 1.876-10°|12 0.42 3.973-10" |17 61.73 1.110-10” 
pathfinder #nodes: 195, #edges: 200, avg. Markov Blanket: 3.04 
0 0.37 217 1 0.53 1.050- 104 /3 31.31 2.872-10+ 
5 MO - 7 5.44 ∞ 7 480.83 ∞
andes #nodes: 223, #edges: 338, avg. Markov Blanket: 5.61 
0 0.46 3.570-107|1 MO - 3 1.66 5.251- 10° 
5 1.41 9.862-107/7 0.99 8.904- 104 |9 0.90 6.637- 10° 
pigs #nodes: 441, #edges: 592, avg. Markov Blanket: 3.66 
0 0.57 7.370-107|1 0.74 2.952- 10° /3 0.88 2.362- 10° 
5 0.85 1.260- 10°|7 1.02 1.511- 10° |8 MO - 
munin #nodes: 1041, #edges: 1997, avg. Markov Blanket: 3.54 
0 1.29 1.823- 10°|1 1.47 3.648- 10° |3 1.37 1.824- 107 
5 1.43 ∞ 9 1.79 1.824. 10!6|10 65.64 1.153- 107


7 Conclusion 


We presented a syntactic notion of independent and identically distributed prob- 
abilistic loops and derived dedicated proof rules to determine exact expected out- 
comes and runtimes of such loops. These rules do not require any user-supplied 
information, such as invariants, (super)martingales, etc. 

Moreover, we isolated a syntactic fragment of probabilistic programs for which
expected runtimes can be computed in a highly automatable fashion. This fragment
is non-trivial: We showed that all Bayesian networks can be translated into
programs within this fragment. Hence, we obtain an automated formal method
for computing expected simulation times of Bayesian networks. We implemented
this method and successfully applied it to various real-world BNs that stem 
from, amongst others, medical applications. Remarkably, our tool was capable 
of proving extremely large expected sampling times within seconds. 

There are several directions for future work: For example, there exist sub- 
classes of BNs for which exact inference is in P, e.g. polytrees. Are there analogies 
for probabilistic programs? Moreover, it would be interesting to consider more 
complex graphical models, such as recursive BNs [16].

Abstract. We extend the simply-typed guarded $\lambda$-calculus with discrete
probabilities and endow it with a program logic for reasoning about relational
properties of guarded probabilistic computations. This provides
a framework for programming and reasoning about infinite stochastic
processes like Markov chains. We show that the logic is sound by interpreting
its judgements in the topos of trees and by using probabilistic
couplings for the semantics of relational assertions over distributions on
discrete types.

The program logic is designed to support syntax-directed proofs in 
the style of relational refinement types, but retains the expressiveness of 
higher-order logic extended with discrete distributions, and the ability 
to reason relationally about expressions that have different types or syn- 
tactic structure. In addition, our proof system leverages a well-known 
theorem from the coupling literature to justify better proof rules for 
relational reasoning about probabilistic expressions. We illustrate these 
benefits with a broad range of examples that were beyond the scope of 
previous systems, including shift couplings and lump couplings between 
random walks. 


1 Introduction 


Stochastic processes are often used in mathematics, physics, biology or finance 
to model evolution of systems with uncertainty. In particular, Markov chains 
are “memoryless” stochastic processes, in the sense that the evolution of the 
system depends only on the current state and not on its history. Perhaps the 
most emblematic example of a (discrete time) Markov chain is the simple random 
walk over the integers, that starts at 0, and that on each step moves one position 
either left or right with uniform probability. Let $p_i$ be the position at time $i$.
Then, this Markov chain can be described as:

\[
p_0 = 0, \qquad
p_{i+1} =
\begin{cases}
p_i + 1 & \text{with probability } 1/2 \\
p_i - 1 & \text{with probability } 1/2
\end{cases}
\]

The goal of this paper is to develop a programming and reasoning framework
for probabilistic computations over infinite objects, such as Markov chains.
Although programming and reasoning frameworks for infinite objects and
probabilistic computations are well-understood in isolation, their combination is
challenging. In particular, one must develop a proof system that is powerful enough for
proving interesting properties of probabilistic computations over infinite objects,
and practical enough to support effective verification of these properties.


Modelling Probabilistic Infinite Objects. A first challenge is to model probabilistic 
infinite objects. We focus on the case of Markov chains, due to its importance. A 
(discrete-time) Markov chain is a sequence of random variables $\{X_i\}$ over some
fixed type $T$ satisfying some independence property. Thus, the straightforward
way of modelling a Markov chain is as a stream of distributions over $T$. Going
back to the simple example outlined above, it is natural to think about this
kind of discrete-time Markov chain as characterized by the sequence of positions
$\{p_i\}_{i \in \mathbb{N}}$, which in turn can be described as an infinite set indexed by the natural
numbers. This suggests that a natural way to model such a Markov chain is to
use streams in which each element is produced probabilistically from the previous
one. However, there are some downsides to this representation. First of all, it
requires explicit reasoning about probabilistic dependency, since $X_{i+1}$ depends
on $X_i$. Also, we might be interested in global properties of the executions of
the Markov chain, such as “The probability of passing through the initial state 
infinitely many times is 1”. These properties are naturally expressed as properties 
of the whole stream. For these reasons, we want to represent Markov chains as 
distributions over streams. Seemingly, one downside of this representation is that 
the set of streams is not countable, which suggests the need for introducing heavy 
measure-theoretic machinery in the semantics of the programming language, 
even when the underlying type is discrete or finite. 

Fortunately, measure-theoretic machinery can be avoided (for discrete distributions)
by developing a probabilistic extension of the simply-typed guarded
$\lambda$-calculus and giving a semantic interpretation in the topos of trees [1].
Informally, the simply-typed guarded $\lambda$-calculus [1] extends the simply-typed lambda
calculus with a later modality, denoted by $\triangleright$. The type $\triangleright A$ ascribes expressions
that are available one unit of logical time in the future. The $\triangleright$ modality allows
one to model infinite types by using “finite” approximations. For example, a
stream of natural numbers is represented by the sequence of its (increasing) prefixes
in the topos of trees. The prefix containing the first $i$ elements has the type
$S_i = \mathbb{N} \times \triangleright\mathbb{N} \times \ldots \times \triangleright^{(i-1)}\mathbb{N}$, representing that the first element is available now,
the second element a unit time in the future, and so on. This is the key to representing
probability distributions over infinite objects without measure-theoretic
semantics: We model probability distributions over non-discrete sets as discrete
distributions over their (the sets’) approximations. For example, a distribution
over streams of natural numbers (which a priori would be non-discrete since the
set of streams is uncountable) would be modelled by a sequence of distributions
over the finite approximations $S_1, S_2, \ldots$ of streams. Importantly, since each $S_i$
is countable, each of these distributions can be discrete.
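
The idea of representing a distribution over streams by a family of discrete distributions over finite prefixes can be illustrated with the simple random walk from the beginning of the introduction. The Python sketch below is an informal illustration only: it ignores the typing discipline of the guarded calculus, and the names `prefix_distribution` and `truncate` are hypothetical.

```python
from fractions import Fraction


def prefix_distribution(i):
    """Discrete distribution over the first i positions of the simple random walk."""
    dist = {(0,): Fraction(1)}  # the walk starts at position 0
    for _ in range(i - 1):
        extended = {}
        for prefix, pr in dist.items():
            for step in (1, -1):
                # Each extension of a prefix is chosen with probability 1/2.
                extended[prefix + (prefix[-1] + step,)] = pr / 2
        dist = extended
    return dist


def truncate(dist):
    """Push a distribution over (i+1)-prefixes forward to i-prefixes."""
    out = {}
    for prefix, pr in dist.items():
        out[prefix[:-1]] = out.get(prefix[:-1], 0) + pr
    return out
```

Each `prefix_distribution(i)` has finite support, and truncating the distribution over prefixes of length `i + 1` yields the one over prefixes of length `i`; this compatibility is what lets the family stand in for a single distribution over infinite streams.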
Reasoning About Probabilistic Computations. Probabilistic computations exhibit
a rich set of properties. One natural class of properties is related to probabilities
of events, saying, for instance, that the probability of some event $E$ (or of an
indexed family of events) increases at every iteration. However, several interesting
properties of probabilistic computation, such as stochastic dominance or
convergence (defined below), are relational, in the sense that they refer to two
runs of two processes. In principle, both classes of properties can be proved
using a higher-order logic for probabilistic expressions, e.g. the internal logic of
the topos of trees, suitably extended with an axiomatization of finite distributions.
However, we contend that an alternative approach inspired by refinement
types is desirable and provides better support for effective verification.
More specifically, reasoning in a higher-order logic, e.g. in the internal logic of
the topos of trees, does not exploit the structure of programs for non-relational
reasoning, nor the structural similarities between programs for relational reasoning.
As a consequence, reasoning is more involved. To address this issue, we
define a relational proof system that exploits the structure of the expressions
and supports syntax-directed proofs, with necessary provisions for escaping the
syntax-directed discipline when the expressions do not have the same structure.
The proof system manipulates judgements of the form:

\[ \Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t_1 : A_1 \sim t_2 : A_2 \mid \phi \]

where $\Delta$ and $\Gamma$ are two typing contexts, $\Sigma$ and $\Psi$ respectively denote sets of
assertions over variables in these two contexts, $t_1$ and $t_2$ are well-typed expressions
of type $A_1$ and $A_2$, and $\phi$ is an assertion that may contain the special
variables $r_1$ and $r_2$ that respectively correspond to the values of $t_1$ and $t_2$. The
contexts $\Delta$ and $\Gamma$, the terms $t_1$ and $t_2$ and the types $A_1$ and $A_2$ provide a specification,
while $\Sigma$, $\Psi$, and $\phi$ are useful for reasoning about relational properties over
$t_1, t_2$, their inputs and their outputs. This form of judgement is similar to that
of Relational Higher-Order Logic [2], from which our system draws inspiration.

In more detail, our relational logic comes with typing rules that allow one to 
reason about relational properties by exploiting as much as possible the syntactic 
similarities between $t_1$ and $t_2$, and to fall back on pure logical reasoning when
these are not available. In order to apply relational reasoning to guarded computations,
the logic provides relational rules for the later modality $\triangleright$ and for a related
modality $\square$, called “constant”. These rules allow the relational verification of
general relational properties that go beyond the traditional notion of program 
equivalence and, moreover, they allow the verification of properties of guarded 
computations over different types. The ability to reason about computations 
of different types provides significant benefits over alternative formalisms for 
relational reasoning. For example, it enables reasoning about relations between 
programs working on different data structures, e.g. a relation between a program 
working on a stream of natural numbers, and a program working on a stream of 
pairs of natural numbers, or having different structures, e.g. a relation between 
an application and a case expression. 

Importantly, our approach for reasoning formally about probabilistic computations
is based on probabilistic couplings, a standard tool from the analysis
of Markov chains [3,4]. From a verification perspective, probabilistic couplings
go beyond equivalence properties of probabilistic programs, which have been
studied extensively in the verification literature, and yet support compositional
reasoning [5,6]. The main attractive feature of coupling-based reasoning is that it
limits the need for explicit reasoning about probabilities, which avoids complex
verification conditions. We provide sound proof rules for reasoning about
probabilistic couplings. Our rules make several improvements over prior relational
verification logics based on couplings. First, we support reasoning over
probabilistic processes of different types. Second, we use Strassen’s theorem [7],
a remarkable result about probabilistic couplings, to achieve greater expressivity.
Previous systems required proving a bijection between the sampling spaces to
show the existence of a coupling [5,6]; Strassen’s theorem gives a way to show
their existence which is applicable in settings where the bijection-based approach
cannot be applied. Third, we support reasoning with what are called shift
couplings, i.e. couplings that relate the states of two Markov chains at
possibly different times (more explanations below).


Case Studies. We show the flexibility of our formalism by verifying several exam- 
ples of relational properties of probabilistic computations, and Markov chains in 
particular. These examples cannot be verified with existing approaches. 

First, we verify a classic example of probabilistic non-interference which
requires reasoning about computations of different types. Second, in the context
of Markov chains, we verify an example about stochastic dominance which
exercises our more general rule for proving the existence of couplings modelled by 
expressions of different types. Finally, we verify an example involving shift rela- 
tions in an infinite computation. This style of reasoning is motivated by “shift” 
couplings in Markov chains. In contrast to a standard coupling, which relates the 
states of two Markov chains at the same time t, a shift coupling relates the states 
of two Markov chains at possibly different times. Our specific example relates a 
standard random walk (described earlier) to a variant called a lazy random walk; 
the verification requires relating the state of standard random walk at time t to 
the state of the lazy random walk at time 2t. We note that this kind of reasoning 
is impossible with conventional relational proof rules even in a non-probabilistic 
setting. Therefore, we provide a novel family of proof rules for reasoning about 
shift relations. At a high level, the rules combine a careful treatment of the later 
and constant modalities with a refined treatment of fixpoint operators, allowing 
us to relate different iterates of function bodies. 


Summary of Contributions 


With the aim of providing a general framework for programming and reasoning 
about Markov chains, the three main contributions of this work are: 


1. A probabilistic extension of the guarded $\lambda$-calculus that enables the definition
of Markov chains as discrete probability distributions over streams.

2. A relational logic based on couplings to reason in a syntax-directed manner
about (relational) properties of Markov chains. This logic supports reasoning
about programs that have different types and structures. Additionally, this
logic uses results from the coupling literature to achieve greater expressivity
than previous systems.

3. An extension of the relational logic that allows relating the states of two
streams at possibly different times. This extension supports reasoning principles,
such as shift couplings, that escape conventional relational logics.


Omitted technical details can be found in the full version of the paper with 
appendix at https://arxiv.org/abs/1802.09787. 


2 Mathematical Preliminaries 


This section reviews the definition of discrete probability sub-distributions and 
introduces mathematical couplings. 


Definition 1 (Discrete probability distribution). Let $C$ be a discrete (i.e.,
finite or countable) set. A (total) distribution over $C$ is a function $\mu : C \to [0,1]$
such that $\sum_{x \in C} \mu(x) = 1$. The support of a distribution $\mu$ is the set of points
with non-zero probability, $\mathrm{supp}\,\mu = \{ x \in C \mid \mu(x) > 0 \}$. We denote the set of
distributions over $C$ as $D(C)$. Given a subset $E \subseteq C$, the probability of sampling
from $\mu$ a point in $E$ is denoted $\Pr_{x \leftarrow \mu}[x \in E]$, and is equal to $\sum_{x \in E} \mu(x)$.


Definition 2 (Marginals). Let $\mu$ be a distribution over a product space $C_1 \times C_2$.
The first (second) marginal of $\mu$ is another distribution $D(\pi_1)(\mu)$ ($D(\pi_2)(\mu)$)
over $C_1$ ($C_2$) defined as:

\[
D(\pi_1)(\mu)(x) = \sum_{y \in C_2} \mu(x,y)
\qquad
\Bigl( D(\pi_2)(\mu)(y) = \sum_{x \in C_1} \mu(x,y) \Bigr)
\]


Probabilistic Couplings. Probabilistic couplings are a fundamental tool in the 
analysis of Markov chains. When analyzing a relation between two probability 
distributions it is sometimes useful to consider instead a distribution over the 
product space that somehow “couples” the randomness in a convenient manner. 

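Consider for instance the case of the following Markov chain, which counts
the total number of tails observed when repeatedly tossing a biased coin with
probability of tails $p$:

\[
n_0 = 0, \qquad
n_{i+1} =
\begin{cases}
n_i + 1 & \text{with probability } p \\
n_i & \text{with probability } 1 - p
\end{cases}
\]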


If we have two biased coins with probabilities of tails $p$ and $q$ with $p \le q$ and
we respectively observe $\{n_i\}$ and $\{m_i\}$, we would expect that, in some sense,
$n_i \le m_i$ should hold for all $i$ (this property is known as stochastic dominance).
A formal proof of this fact using elementary tools from probability theory would
require computing the cumulative distribution functions for $n_i$ and $m_i$ and then
comparing them. The coupling method reduces this proof to showing a way to
pair the coin flips so that if the first coin shows tails, so does the second coin.
We now review the definition of couplings and state relevant properties.

Definition 3 (Couplings). Let $\mu_1 \in D(C_1)$ and $\mu_2 \in D(C_2)$, and $R \subseteq C_1 \times C_2$.


- A distribution $\mu \in D(C_1 \times C_2)$ is a coupling for $\mu_1$ and $\mu_2$ iff its first and
second marginals coincide with $\mu_1$ and $\mu_2$ respectively, i.e. $D(\pi_1)(\mu) = \mu_1$ and
$D(\pi_2)(\mu) = \mu_2$.

- A distribution $\mu \in D(C_1 \times C_2)$ is an $R$-coupling for $\mu_1$ and $\mu_2$ if it is a coupling
for $\mu_1$ and $\mu_2$ and, moreover, $\Pr_{(x_1,x_2) \leftarrow \mu}[R\,x_1\,x_2] = 1$, i.e., if the support of
the distribution $\mu$ is included in $R$.


Moreover, we write $\diamond_{\mu_1,\mu_2}.R$ iff there exists an $R$-coupling for $\mu_1$ and $\mu_2$.


Couplings always exist. For instance, the product distribution of two distribu- 
tions is always a coupling. Going back to the example about the two coins, it 
can be proven by computation that the following is a coupling that lifts the 
less-or-equal relation (0 indicating heads and 1 indicating tails): 


\[
\begin{cases}
(0,0) \text{ w/ prob } (1-q) & (0,1) \text{ w/ prob } (q-p) \\
(1,0) \text{ w/ prob } 0 & (1,1) \text{ w/ prob } p
\end{cases}
\]
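
The claim that this joint distribution is a coupling lifting the less-or-equal relation can be checked mechanically for concrete biases. The following Python sketch uses exact rationals; the dictionary representation of distributions and the helper names are assumptions made for illustration.

```python
from fractions import Fraction


def coin(p_tails):
    """Distribution of one biased coin: 0 is heads, 1 is tails."""
    return {0: 1 - p_tails, 1: p_tails}


def coin_coupling(p, q):
    """The joint distribution from the text, defined for p <= q."""
    return {(0, 0): 1 - q, (0, 1): q - p, (1, 0): Fraction(0), (1, 1): p}


def marginals(joint):
    """First and second marginals of a joint distribution over pairs."""
    first, second = {}, {}
    for (x, y), pr in joint.items():
        first[x] = first.get(x, 0) + pr
        second[y] = second.get(y, 0) + pr
    return first, second


def is_r_coupling(joint, mu1, mu2, rel):
    """Check both marginal conditions and that the support lies inside rel."""
    first, second = marginals(joint)
    support_ok = all(rel(x, y) for (x, y), pr in joint.items() if pr > 0)
    return first == mu1 and second == mu2 and support_ok
```

For instance, with `p = Fraction(1, 3)` and `q = Fraction(1, 2)`, the call `is_r_coupling(coin_coupling(p, q), coin(p), coin(q), lambda x, y: x <= y)` succeeds, whereas the product distribution of the two coins has the right marginals but puts positive mass on the pair (1,0).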


The following theorem in [7] gives a necessary and sufficient condition for the 
existence of R-couplings between two distributions. The theorem is remarkable in 
the sense that it proves an equivalence between an existential property (namely 
the existence of a particular coupling) and a universal property (checking, for 
each event, an inequality between probabilities). 


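Theorem 1 (Strassen’s theorem). Consider $\mu_1 \in D(C_1)$ and $\mu_2 \in D(C_2)$,
and $R \subseteq C_1 \times C_2$. Then $\diamond_{\mu_1,\mu_2}.R$ iff for every $X \subseteq C_1$,
$\Pr_{x_1 \leftarrow \mu_1}[x_1 \in X] \le \Pr_{x_2 \leftarrow \mu_2}[x_2 \in R(X)]$, where $R(X)$ is the image of $X$ under $R$,
i.e. $R(X) = \{ y \in C_2 \mid \exists x \in X.\ R\,x\,y \}$.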
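
For finite sets, the universal side of Strassen's theorem can be checked by enumerating all subsets. The Python sketch below does exactly that; it is exponential in the size of the first set and is meant only to illustrate the statement, with hypothetical helper names and distributions represented as dictionaries whose keys cover the whole (finite) carrier.

```python
from fractions import Fraction
from itertools import chain, combinations


def subsets(xs):
    """All subsets of xs, as tuples."""
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))


def strassen_condition(mu1, mu2, rel):
    """Check Pr_mu1[X] <= Pr_mu2[R(X)] for every subset X of the keys of mu1."""
    for X in subsets(mu1):
        image = {y for y in mu2 if any(rel(x, y) for x in X)}  # R(X)
        if sum(mu1[x] for x in X) > sum(mu2[y] for y in image):
            return False
    return True


def coin(p_tails):
    """Distribution of one biased coin: 0 is heads, 1 is tails."""
    return {0: 1 - p_tails, 1: p_tails}
```

With the less-or-equal relation, the condition holds for coins with tail probabilities 1/3 and 1/2 in that order and fails when the two coins are swapped, in line with the existence of the coupling discussed above.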


An important property of couplings is closure under sequential composition. 


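Lemma 1 (Sequential composition couplings). Let $\mu_1 \in D(C_1)$, $\mu_2 \in D(C_2)$,
$M_1 : C_1 \to D(D_1)$ and $M_2 : C_2 \to D(D_2)$. Moreover, let $R \subseteq C_1 \times C_2$
and $S \subseteq D_1 \times D_2$. Assume: (1) $\diamond_{\mu_1,\mu_2}.R$; and (2) for every $x_1 \in C_1$ and $x_2 \in C_2$
such that $R\,x_1\,x_2$, we have $\diamond_{M_1(x_1),M_2(x_2)}.S$. Then $\diamond_{(\mathsf{bind}\ \mu_1\ M_1),(\mathsf{bind}\ \mu_2\ M_2)}.S$,
where $\mathsf{bind}\ \mu\ M$ is defined as

\[ (\mathsf{bind}\ \mu\ M)(y) \;=\; \sum_{x} \mu(x) \cdot M(x)(y) \]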
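
The definition of bind translates directly into code for finitely supported distributions. The Python sketch below represents distributions as dictionaries and uses the tails-counting chain as an example; the function `step` and this representation are assumptions made for illustration.

```python
from fractions import Fraction


def bind(mu, M):
    """(bind mu M)(y) = sum over x of mu(x) * M(x)(y)."""
    out = {}
    for x, px in mu.items():
        for y, py in M(x).items():
            out[y] = out.get(y, 0) + px * py
    return out


def step(p):
    """One transition of the tails-counting chain with tail probability p."""
    return lambda n: {n + 1: p, n: 1 - p}


# Distribution of the number of tails after two tosses with p = 1/3.
two_tosses = bind(bind({0: Fraction(1)}, step(Fraction(1, 3))), step(Fraction(1, 3)))
```

Here `two_tosses` assigns 4/9 to zero tails, 4/9 to one tail and 1/9 to two tails, which is the binomial distribution one expects.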


We conclude this section with the following lemma, which follows from Strassen’s 
theorem: 


Lemma 2 (Fundamental lemma of couplings). Let $R \subseteq C_1 \times C_2$, $E_1 \subseteq C_1$
and $E_2 \subseteq C_2$ such that for every $x_1 \in E_1$ and $x_2 \in C_2$, $R\,x_1\,x_2$ implies $x_2 \in E_2$,
i.e. $R(E_1) \subseteq E_2$. Moreover, let $\mu_1 \in D(C_1)$ and $\mu_2 \in D(C_2)$ such that $\diamond_{\mu_1,\mu_2}.R$.
Then

\[ \Pr_{x_1 \leftarrow \mu_1}[x_1 \in E_1] \;\le\; \Pr_{x_2 \leftarrow \mu_2}[x_2 \in E_2] \]

This lemma can be used to prove probabilistic inequalities from the existence of
suitable couplings: 


Corollary 1. Let $\mu_1, \mu_2 \in D(C)$:


1. If $\diamond_{\mu_1,\mu_2}.(=)$, then for all $x \in C$, $\mu_1(x) = \mu_2(x)$.
2. If $C = \mathbb{N}$ and $\diamond_{\mu_1,\mu_2}.(\ge)$, then for all $n \in \mathbb{N}$, $\Pr_{x \leftarrow \mu_1}[x \ge n] \ge \Pr_{x \leftarrow \mu_2}[x \ge n]$.


In the example at the beginning of the section, the property we want to prove
is precisely that, for every $k$ and $i$, the following holds:

\[ \Pr_{x_1 \leftarrow n_i}[x_1 \ge k] \;\le\; \Pr_{x_2 \leftarrow m_i}[x_2 \ge k] \]

Since we have a $\le$-coupling, this proof is immediate. This example is formalized
in Subsect. 3.3.
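
For small parameters, the stochastic dominance inequality can also be verified by the elementary route that the coupling argument avoids, namely by computing the distributions and comparing tail probabilities. The Python sketch below does this with exact rationals; the function names and the chosen biases are hypothetical.

```python
from fractions import Fraction


def count_distribution(p, i):
    """Distribution of the number of tails after i tosses with tail probability p."""
    dist = {0: Fraction(1)}
    for _ in range(i):
        nxt = {}
        for n, pr in dist.items():
            nxt[n + 1] = nxt.get(n + 1, 0) + pr * p      # tails
            nxt[n] = nxt.get(n, 0) + pr * (1 - p)        # heads
        dist = nxt
    return dist


def tail(dist, k):
    """Probability that the count is at least k."""
    return sum(pr for n, pr in dist.items() if n >= k)


def dominated(p, q, max_i, max_k):
    """Check Pr[n_i >= k] <= Pr[m_i >= k] for all i <= max_i and k <= max_k."""
    return all(
        tail(count_distribution(p, i), k) <= tail(count_distribution(q, i), k)
        for i in range(max_i + 1)
        for k in range(max_k + 1)
    )
```

A call such as `dominated(Fraction(1, 3), Fraction(1, 2), 6, 7)` confirms the inequality on that finite range; the coupling proof establishes it for all `i` and `k` at once.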


3 Overview of the System 


In this section we give a high-level overview of our system; the details are given in
Sects. 4, 5 and 6. We start by presenting the base logic, and then we show how
to extend it with probabilities and how to build a relational reasoning system
on top of it.


3.1 Base Logic: Guarded Higher-Order Logic 


Our starting point is the Guarded Higher-Order Logic [1] (Guarded HOL)
inspired by the topos of trees. In addition to the usual constructs of HOL to
reason about lambda terms, this logic features the $\triangleright$ and $\square$ modalities to reason
about infinite terms, in particular streams. The $\triangleright$ modality is used to reason
about objects that will be available in the future, such as tails of streams. For
instance, suppose we want to define an $\mathit{All}(s, \phi)$ predicate, expressing that all
elements of a stream $s = n{::}xs$ satisfy a property $\phi$. This can be axiomatized as
follows:


\[ \forall (xs : \triangleright \mathit{Str}_{\mathbb{N}})(n : \mathbb{N}).\ \phi\,n \Rightarrow \triangleright[s \leftarrow xs].\,\mathit{All}(s, x.\phi) \Rightarrow \mathit{All}(n{::}xs, x.\phi) \]


We use $x.\phi$ to denote that the formula $\phi$ depends on a free variable $x$, which will
get replaced by the first argument of $\mathit{All}$. We have two antecedents. The first
one states that the head $n$ satisfies $\phi$. The second one, $\triangleright[s \leftarrow xs].\,\mathit{All}(s, x.\phi)$,
states that all elements of $xs$ satisfy $\phi$. Formally, $xs$ is the tail of the stream and
will be available in the future, so it has type $\triangleright \mathit{Str}_{\mathbb{N}}$. The delayed substitution
$\triangleright[s \leftarrow xs]$ replaces $s$ of type $\mathit{Str}_{\mathbb{N}}$ with $xs$ of type $\triangleright \mathit{Str}_{\mathbb{N}}$ inside $\mathit{All}$ and shifts the
whole formula one step into the future. In other words, $\triangleright[s \leftarrow xs].\,\mathit{All}(s, x.\phi)$
states that $\mathit{All}(-, x.\phi)$ will be satisfied by $xs$ in the future, once it is available.

3.2 A System for Relational Reasoning


When proving relational properties it is often convenient to build proofs guided
by the syntactic structure of the two expressions to be related. This style of
reasoning is particularly appealing when the two expressions have the same
structure and control-flow, and is appealingly close to the traditional style of
reasoning supported by refinement types. At the same time, a strict adherence to
the syntax-directed discipline is detrimental to the expressiveness of the system;
for instance, it makes it difficult or even impossible to reason about structurally
dissimilar terms. To achieve the best of both worlds, we present a relational proof
system built on top of Guarded HOL, which we call Guarded RHOL. Judgements
have the shape:

\[ \Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t_1 : A_1 \sim t_2 : A_2 \mid \phi \]


where ¢ is a logical formula that may contain two distinguished variables rı 
and r2 that respectively represent the expressions tı and tz. This judgement 
subsumes two typing judgements on tı and t2 and a relation @ on these two 
expressions. However, this form of judgement does not tie the logical property 
to the type of the expressions, and is key to achieving flexibility while supporting 
syntax-directed proofs whenever needed. The proof system combines rules of two 
different flavours: two-sided rules, which relate expressions with the same top- 
level constructs, and one-sided rules, which operate on a single expression. 

We then extend Guarded HOL with a modality $\diamond$ that lifts assertions over
discrete types $C_1$ and $C_2$ to assertions over $D(C_1)$ and $D(C_2)$. Concretely, we
define for every assertion $\phi$, variables $x_1$ and $x_2$ of type $C_1$ and $C_2$ respectively,
and expressions $t_1$ and $t_2$ of type $D(C_1)$ and $D(C_2)$ respectively, the modal
assertion $\diamond_{[x_1 \leftarrow t_1, x_2 \leftarrow t_2]}\phi$, which holds iff the interpretations of $t_1$ and $t_2$ are
related by the probabilistic lifting of the interpretation of $\phi$. We call this new
logic Probabilistic Guarded HOL.

We accordingly extend the relational proof system to support reasoning about 
probabilistic expressions by adding judgements of the form: 


$$\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t_1 : D(C_1) \sim t_2 : D(C_2) \mid \diamond_{[y_1 \leftarrow r_1, y_2 \leftarrow r_2]}\phi$$


expressing that $t_1$ and $t_2$ are distributions related by a $\phi$-coupling. We call
this proof system Probabilistic Guarded RHOL. These judgements can be built
by using the following rule, which lifts relational judgements over discrete types
$C_1$ and $C_2$ to judgements over distribution types $D(C_1)$ and $D(C_2)$ when the
premises of Strassen's theorem are satisfied.


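$$\frac{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \forall X_1 \subseteq C_1.\ \Pr_{y_1 \leftarrow t_1}[y_1 \in X_1] \leq \Pr_{y_2 \leftarrow t_2}[\exists y_1 \in X_1.\ \phi]}{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t_1 : D(C_1) \sim t_2 : D(C_2) \mid \diamond_{[y_1 \leftarrow r_1, y_2 \leftarrow r_2]}\phi}\ \textsc{Coupling}$$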
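For finite distributions, the premise of the [COUPLING] rule can be checked by brute force. The following Python sketch is only an illustration and not part of the proof system: the names `subsets` and `strassen_premise_holds` are hypothetical, distributions are assumed to be dictionaries from points to exact `Fraction` masses, and the relation is assumed to be a Python predicate. The check enumerates every subset of the support, so it is exponential and only suitable for tiny examples.

```python
from fractions import Fraction
from itertools import chain, combinations


def subsets(items):
    """All subsets of a finite collection, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


def strassen_premise_holds(mu1, mu2, related):
    """Check Pr_{mu1}[X] <= Pr_{mu2}[image of X] for every subset X of the support of mu1.

    Points outside the support add no mass to the left-hand side, so
    restricting X to the support loses nothing.
    """
    for xs in subsets(mu1):
        lhs = sum((mu1[x] for x in xs), Fraction(0))
        rhs = sum((p for y, p in mu2.items() if any(related(x, y) for x in xs)),
                  Fraction(0))
        if lhs > rhs:
            return False
    return True


# Illustrative inputs (assumptions, not taken from the paper).
fair = {0: Fraction(1, 2), 1: Fraction(1, 2)}
point0 = {0: Fraction(1)}
point1 = {1: Fraction(1)}
```

With these inputs, the fair coin is related to itself by the bit-flipping relation, while two distinct point distributions cannot be coupled by equality.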


Recall that (discrete time) Markov chains are "memoryless" probabilistic
processes, whose specification is given by a (discrete) set $C$ of states, an initial
state $s_0$ and a probabilistic transition function $step : C \to D(C)$, where $D(C)$
represents the set of discrete distributions over $C$. As explained in the
introduction, a convenient modelling of Markov chains is by means of probabilistic
streams, i.e. to model a Markov chain as an element of $D(Str_S)$, where $S$ is its
underlying state space. To model Markov chains, we introduce a markov
operator with type $C \to (C \to D(C)) \to D(Str_C)$ that, given an initial state and a
transition function, returns a Markov chain. We can reason about Markov chains
by the [Markov] rule (the context, omitted, does not change):


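$$\frac{\begin{array}{l}
\vdash t_1 : C_1 \sim t_2 : C_2 \mid \phi \\
\vdash h_1 : C_1 \to D(C_1) \sim h_2 : C_2 \to D(C_2) \mid \psi_3 \\
\vdash \psi_4
\end{array}}{\vdash markov(t_1, h_1) : D(Str_{C_1}) \sim markov(t_2, h_2) : D(Str_{C_2}) \mid \diamond_{[y_1 \leftarrow r_1, y_2 \leftarrow r_2]}\phi'}\ \textsc{Markov}$$

where

$$\begin{array}{l}
\psi_3 \equiv \forall x_1 x_2.\ \phi[x_1/r_1][x_2/r_2] \Rightarrow \diamond_{[y_1 \leftarrow r_1\,x_1,\ y_2 \leftarrow r_2\,x_2]}\phi[y_1/r_1][y_2/r_2] \\
\psi_4 \equiv \forall x_1 x_2\, xs_1\, xs_2.\ \phi[x_1/r_1][x_2/r_2] \Rightarrow \triangleright[y_1 \leftarrow xs_1, y_2 \leftarrow xs_2].\phi' \Rightarrow \phi'[x_1 :: xs_1/y_1][x_2 :: xs_2/y_2]
\end{array}$$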


Informally, the rule stipulates the existence of an invariant $\phi$ over states. The
first premise insists that the invariant hold on the initial states, the condition
$\psi_3$ states that the transition functions preserve the invariant, and $\psi_4$ states that
the invariant $\phi$ over pairs of states can be lifted to a stream property $\phi'$.
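To make the shape of the markov operator concrete, here is a minimal Python sketch. It is an operational analogue only: the paper's markov returns a distribution over streams, whereas this hypothetical generator draws a single sample stream lazily. The names `markov`, `walk_step` and the use of `random.Random` are assumptions made for the illustration.

```python
import random
from itertools import islice


def markov(init, step, rng):
    """Lazily yield one sampled trajectory: init, then repeated applications of step."""
    state = init
    while True:
        yield state
        state = step(state, rng)


def walk_step(x, rng):
    """Hypothetical transition function: move one position left or right."""
    return x + rng.choice((-1, 1))


# Only a finite prefix of the infinite stream is ever forced.
trajectory = list(islice(markov(0, walk_step, random.Random(0)), 10))
```

Each element of the stream depends only on the previous one, which is the "memoryless" property; the generator yields the current state before computing the next, mirroring the fact that the head of the stream is available one step before its tail.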

Other rules of the logic are given in Fig. 1. The language construct munit
creates a point distribution whose entire mass is at its argument. Accordingly,
the [UNIT] rule creates a straightforward coupling. The [MLET] rule internalizes
sequential composition of couplings (Lemma 1) into the proof system. The
construct let $x = t$ in $t'$ composes a distribution $t$ with a probabilistic computation
$t'$ with one free variable $x$ by sampling $x$ from $t$ and running $t'$. The [MLET-L]
rule supports one-sided reasoning about let $x = t$ in $t'$ and relies on the fact
that couplings are closed under convex combinations. Note that one premise of
the rule uses a unary judgement, with a non-relational modality $\diamond_{[x \leftarrow r]}\phi$ whose
informal meaning is that $\phi$ holds with probability 1 in the distribution $r$.

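The following table summarizes the different base logics we consider, the
relational systems we build on top of them, including the ones presented in [2],
and the equivalences between both sides:


Relational logic                    Equivalence    Base logic

RHOL [2]                            [2]            HOL [2]
$\Gamma \mid \Psi \vdash t_1 \sim t_2 \mid \phi$    $\iff$    $\Gamma \mid \Psi \vdash \phi[t_1/r_1][t_2/r_2]$

Guarded RHOL §6                     Thm 3          Guarded HOL [1]
$\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t_1 \sim t_2 \mid \phi$    $\iff$    $\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \phi[t_1/r_1][t_2/r_2]$

Probabilistic Guarded RHOL §6       Thm 3          Probabilistic Guarded HOL §5
$\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t_1 \sim t_2 \mid \diamond_{[y_1 \leftarrow r_1, y_2 \leftarrow r_2]}\phi$    $\iff$    $\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \diamond_{[y_1 \leftarrow t_1, y_2 \leftarrow t_2]}\phi$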


3.3 Examples 


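We formalize elementary examples from the literature on security and Markov
chains. None of these examples can be verified in prior systems. Uniformity of
one-time pad and lumping of random walks cannot even be stated in prior
systems because the two related expressions in these examples have different types.
The random walk vs lazy random walk (shift coupling) cannot be proved in prior
systems because it requires either asynchronous reasoning or code rewriting.
Finally, the biased coin example (stochastic dominance) cannot be proved in
prior work because it requires Strassen's formulation of the existence of coupling
(rather than a bijection-based formulation) or code rewriting. We give additional
details below.


$$\frac{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t_1 : C_1 \sim t_2 : C_2 \mid \phi[r_1/x_1, r_2/x_2]}{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash munit(t_1) : D(C_1) \sim munit(t_2) : D(C_2) \mid \diamond_{[x_1 \leftarrow r_1, x_2 \leftarrow r_2]}\phi}\ \textsc{Unit}$$

$$\frac{\begin{array}{l}
\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t_1 : D(C_1) \sim t_2 : D(C_2) \mid \diamond_{[x_1 \leftarrow r_1, x_2 \leftarrow r_2]}\phi \\
\Delta \mid \Sigma \mid \Gamma, x_1 : C_1, x_2 : C_2 \mid \Psi, \phi \vdash t_1' : D(D_1) \sim t_2' : D(D_2) \mid \diamond_{[y_1 \leftarrow r_1, y_2 \leftarrow r_2]}\psi
\end{array}}{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \mathsf{let}\ x_1 = t_1\ \mathsf{in}\ t_1' : D(D_1) \sim \mathsf{let}\ x_2 = t_2\ \mathsf{in}\ t_2' : D(D_2) \mid \diamond_{[y_1 \leftarrow r_1, y_2 \leftarrow r_2]}\psi}\ \textsc{MLet}$$

$$\frac{\begin{array}{l}
\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t_1 : D(C_1) \mid \diamond_{[x_1 \leftarrow r]}\phi \\
\Delta \mid \Sigma \mid \Gamma, x_1 : C_1 \mid \Psi, \phi \vdash t_1' : D(D_1) \sim t_2' : D(D_2) \mid \diamond_{[y_1 \leftarrow r_1, y_2 \leftarrow r_2]}\psi
\end{array}}{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \mathsf{let}\ x_1 = t_1\ \mathsf{in}\ t_1' : D(D_1) \sim t_2' : D(D_2) \mid \diamond_{[y_1 \leftarrow r_1, y_2 \leftarrow r_2]}\psi}\ \textsc{MLet-L}$$

Fig. 1. Proof rules for probabilistic constructs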


One-Time Pad/Probabilistic Non-interference. Non-interference [8] is a 
baseline information flow policy that is often used to model confidentiality of 
computations. In its simplest form, non-interference distinguishes between public 
(or low) and private (or high) variables and expressions, and requires that the 
result of a public expression not depend on the value of its private parameters. 
This definition naturally extends to probabilistic expressions, except that in this 
case the evaluation of an expression yields a distribution rather than a value. 
There are deep connections between probabilistic non-interference and several 
notions of (information-theoretic) security from cryptography. In this paragraph, 
we illustrate different flavours of security properties for one-time pad encryption. 
Similar reasoning can be carried out for proving (passive) security of secure 
multiparty computation algorithms in the 3-party or multi-party setting [9,10]. 

One-time pad is a perfectly secure symmetric encryption scheme. Its space
of plaintexts, ciphertexts and keys is the set $\{0,1\}^\ell$ of fixed-length bitstrings of
size $\ell$. The encryption algorithm is parametrized by a key $k$, sampled uniformly
over the set of bitstrings $\{0,1\}^\ell$, and maps every plaintext $m$ to the ciphertext
$c = k \oplus m$, where the operator $\oplus$ denotes bitwise exclusive-or on bitstrings. We
let otp denote the expression $\lambda m.\ \mathsf{let}\ k = U_{\{0,1\}^\ell}\ \mathsf{in}\ munit(k \oplus m)$, where $U_X$ is
the uniform distribution over a finite set $X$.

One-time pad achieves perfect security, i.e. the distribution of ciphertexts is
independent of the plaintext. Perfect security can be captured as a probabilistic
non-interference property:


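$$\vdash otp : \{0,1\}^\ell \to D(\{0,1\}^\ell) \sim otp : \{0,1\}^\ell \to D(\{0,1\}^\ell) \mid \forall m_1 m_2.\ r_1\,m_1 \stackrel{\diamond}{=} r_2\,m_2$$

where $e_1 \stackrel{\diamond}{=} e_2$ is used as a shorthand for $\diamond_{[y_1 \leftarrow e_1, y_2 \leftarrow e_2]}\, y_1 = y_2$. The crux of the
proof is to establish


$$m_1, m_2 : \{0,1\}^\ell \vdash U_{\{0,1\}^\ell} : D(\{0,1\}^\ell) \sim U_{\{0,1\}^\ell} : D(\{0,1\}^\ell) \mid r_1 \oplus m_2 \stackrel{\diamond}{=} r_2 \oplus m_1$$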


using the [COUPLING] rule. It suffices to observe that the assertion induces a 
bijection, so the image of an arbitrary set X under the relation has the same 
cardinality as X, and hence their probabilities w.r.t. the uniform distributions 
are equal. One can then conclude the proof by applying the rules for monadic
sequencing ([MLET]) and abstraction (rule [ABS] in appendix), using algebraic
properties of $\oplus$.
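The bijection argument can be replayed by exhaustive enumeration for a small key length. The Python sketch below is illustrative only: the key length `ELL = 3` and the helper names `bitstrings`, `xor`, `otp_distribution` and `coupled_key` are assumptions introduced here, and bitstrings are represented as tuples of bits.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

ELL = 3  # hypothetical small length so that enumeration stays tiny


def bitstrings(n):
    """All bitstrings of length n, as tuples of 0/1."""
    return [tuple(bits) for bits in product((0, 1), repeat=n)]


def xor(a, b):
    """Bitwise exclusive-or of two equal-length bitstrings."""
    return tuple(x ^ y for x, y in zip(a, b))


def otp_distribution(m):
    """Exact distribution of the ciphertext k XOR m for a uniformly random key k."""
    keys = bitstrings(len(m))
    counts = Counter(xor(k, m) for k in keys)
    return {c: Fraction(n, len(keys)) for c, n in counts.items()}


def coupled_key(k1, m1, m2):
    """Key k2 paired with k1 so that k1 XOR m1 equals k2 XOR m2."""
    return xor(xor(k1, m1), m2)
```

For any two messages, `coupled_key` is a permutation of the key space that makes the two ciphertexts coincide, which is why the two ciphertext distributions returned by `otp_distribution` are identical.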

Interestingly, one can prove a stronger property: rather than proving that the 
ciphertext is independent of the plaintext, one can prove that the distribution 
of ciphertexts is uniform. This is captured by the following judgement: 


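$$c_1, c_2 : \{0,1\}^\ell \vdash otp : \{0,1\}^\ell \to D(\{0,1\}^\ell) \sim otp : \{0,1\}^\ell \to D(\{0,1\}^\ell) \mid \psi$$


where $\psi \triangleq \forall m_1 m_2.\ m_1 = m_2 \Rightarrow \diamond_{[y_1 \leftarrow r_1\,m_1, y_2 \leftarrow r_2\,m_2]}\, y_1 = c_1 \Leftrightarrow y_2 = c_2$. This
style of modelling uniformity as a relational property is inspired by [11]. The
proof is similar to the previous one and omitted. However, it is arguably more
natural to model uniformity of the distribution of ciphertexts by the judgement:


$$\vdash otp : \{0,1\}^\ell \to D(\{0,1\}^\ell) \sim U_{\{0,1\}^\ell} : D(\{0,1\}^\ell) \mid \forall m.\ r_1\,m \stackrel{\diamond}{=} r_2$$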


This judgement is closer to the simulation-based notion of security that is used 
pervasively in cryptography, and notably in Universal Composability [12]. Specif- 
ically, the statement captures the fact that the one-time pad algorithm can 
be simulated without access to the message. It is interesting to note that the 
judgement above (and more generally simulation-based security) could not be 
expressed in prior works, since the two expressions of the judgement have differ- 
ent types—note that in this specific case, the right expression is a distribution 
but in the general case the right expression will also be a function, and its domain 
will be a projection of the domain of the left expression. 
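The proof proceeds as follows. First, we prove


$$\vdash U_{\{0,1\}^\ell} \sim U_{\{0,1\}^\ell} \mid \forall m.\ \diamond_{[y_1 \leftarrow r_1, y_2 \leftarrow r_2]}\, y_1 \oplus m = y_2$$

using the [COUPLING] rule. Then, we apply the [MLET] rule to obtain


$$\vdash \mathsf{let}\ k = U_{\{0,1\}^\ell}\ \mathsf{in}\ munit(k \oplus m) \sim \mathsf{let}\ k = U_{\{0,1\}^\ell}\ \mathsf{in}\ munit(k) \mid \diamond_{[y_1 \leftarrow r_1, y_2 \leftarrow r_2]}\, y_1 = y_2$$


We have $\mathsf{let}\ k = U_{\{0,1\}^\ell}\ \mathsf{in}\ munit(k) \equiv U_{\{0,1\}^\ell}$; hence by equivalence (rule [Equiv]
in appendix), this entails


$$\vdash \mathsf{let}\ k = U_{\{0,1\}^\ell}\ \mathsf{in}\ munit(k \oplus m) \sim U_{\{0,1\}^\ell} \mid \diamond_{[y_1 \leftarrow r_1, y_2 \leftarrow r_2]}\, y_1 = y_2$$


We conclude by applying the one-sided rule for abstraction.


Stochastic Dominance. Stochastic dominance defines a partial order between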
random variables whose underlying set is itself a partial order; it has many dif- 
ferent applications in statistical biology (e.g. in the analysis of the birth-and- 
death processes), statistical physics (e.g. in percolation theory), and economics. 
First-order stochastic dominance, which we define below, is also an important 
application of probabilistic couplings. We demonstrate how to use our proof sys- 
tem for proving (first-order) stochastic dominance for a simple Markov process 
which samples biased coins. While the example is elementary, the proof method 
extends to more complex examples of stochastic dominance, and illustrates the 
benefits of Strassen’s formulation of the coupling rule over alternative formula- 
tions stipulating the existence of bijections (explained later). 

We start by recalling the definition of (first-order) stochastic dominance for 
the N-valued case. The definition extends to arbitrary partial orders. 


Definition 4 (Stochastic dominance). Let $\mu_1, \mu_2 \in D(\mathbb{N})$. We say that $\mu_2$
stochastically dominates $\mu_1$, written $\mu_1 \leq_{SD} \mu_2$, iff for every $n \in \mathbb{N}$,

$$\Pr_{x \leftarrow \mu_1}[x \geq n] \leq \Pr_{x \leftarrow \mu_2}[x \geq n]$$

The following result, equivalent to Corollary 1, characterizes stochastic
dominance using probabilistic couplings.


Proposition 1. Let $\mu_1, \mu_2 \in D(\mathbb{N})$. Then $\mu_1 \leq_{SD} \mu_2$ iff $\diamond_{\mu_1, \mu_2}(\leq)$.

We now turn to the definition of the Markov chain. For $p \in [0,1]$, we consider
the parametric $\mathbb{N}$-valued Markov chain $coins \triangleq markov(0, h)$, with initial state 0
and (parametric) step function:


$$h \triangleq \lambda x.\ \mathsf{let}\ b = B(p)\ \mathsf{in}\ munit(x + b)$$


where, for $p \in [0,1]$, $B(p)$ is the Bernoulli distribution on $\{0,1\}$ with probability
$p$ for 1 and $1 - p$ for 0. Our goal is to establish that coins is monotonic, i.e. for
every $p_1, p_2 \in [0,1]$, $p_1 \leq p_2$ implies $coins\ p_1 \leq_{SD} coins\ p_2$. We formalize this
statement as


$$\vdash coins : [0,1] \to D(Str_{\mathbb{N}}) \sim coins : [0,1] \to D(Str_{\mathbb{N}}) \mid \psi$$


where $\psi \triangleq \forall p_1, p_2.\ p_1 \leq p_2 \Rightarrow \diamond_{[y_1 \leftarrow r_1\,p_1, y_2 \leftarrow r_2\,p_2]} All(y_1, y_2, z_1.z_2.\ z_1 \leq z_2)$. The crux
of the proof is to establish stochastic dominance for the Bernoulli distribution:


$$p_1 : [0,1], p_2 : [0,1] \mid p_1 \leq p_2 \vdash B(p_1) : D(\mathbb{N}) \sim B(p_2) : D(\mathbb{N}) \mid r_1 \stackrel{\diamond}{\leq} r_2$$


where we use $e_1 \stackrel{\diamond}{\leq} e_2$ as shorthand for $\diamond_{[y_1 \leftarrow e_1, y_2 \leftarrow e_2]}\, y_1 \leq y_2$. This is proved
directly by the [COUPLING] rule and checking by simple calculations that the
premise of the rule is valid.
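The coupling whose existence the rule guarantees can also be written down explicitly for Bernoulli distributions. The following Python sketch is an illustration under stated assumptions: the helper names `bernoulli`, `monotone_coupling` and `marginal` are hypothetical, and probabilities are exact `Fraction` values. It builds a joint distribution on pairs of bits whose marginals are $B(p_1)$ and $B(p_2)$ and whose support only contains pairs with first component at most the second.

```python
from fractions import Fraction


def bernoulli(p):
    """B(p) as a dictionary: probability p for 1 and 1 - p for 0."""
    return {0: 1 - p, 1: p}


def monotone_coupling(p1, p2):
    """Joint distribution with marginals B(p1) and B(p2), supported on y1 <= y2.

    Requires p1 <= p2; the pair (1, 0) is given no mass at all.
    """
    if p1 > p2:
        raise ValueError("monotone coupling needs p1 <= p2")
    return {(0, 0): 1 - p2, (0, 1): p2 - p1, (1, 1): p1}


def marginal(joint, index):
    """Project a joint distribution on pairs of bits onto one component."""
    out = {0: Fraction(0), 1: Fraction(0)}
    for pair, p in joint.items():
        out[pair[index]] += p
    return out
```

This is the usual "shared randomness" coupling: one draw decides both coins, and the coin with the larger bias shows 1 whenever the other one does.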

We briefly explain how to conclude the proof. Let $h_1$ and $h_2$ be the step
functions for $p_1$ and $p_2$ respectively. It is clear from the above that (context
omitted):


$$x_1 \leq x_2 \vdash h_1\,x_1 : D(\mathbb{N}) \sim h_2\,x_2 : D(\mathbb{N}) \mid \diamond_{[y_1 \leftarrow r_1, y_2 \leftarrow r_2]}\, y_1 \leq y_2$$

and by the definition of All:

$$x_1 \leq x_2 \Rightarrow All(xs_1, xs_2, z_1.z_2.\ z_1 \leq z_2) \Rightarrow All(x_1 :: \triangleright xs_1, x_2 :: \triangleright xs_2, z_1.z_2.\ z_1 \leq z_2)$$


So, we can conclude by applying the [Markov] rule.

It is instructive to compare our proof with prior formalizations, and in par- 
ticular with the proof in [5]. Their proof is carried out in the pRHL logic, whose 
[COUPLING] rule is based on the existence of a bijection that satisfies some prop- 
erty, rather than on our formalization based on Strassen’s Theorem. Their rule 
is motivated by applications in cryptography, and works well for many examples, 
but is inconvenient for our example at hand, which involves non-uniform proba- 
bilities. Indeed, their proof is based on code rewriting, and is done in two steps. 
First, they prove equivalence between sampling and returning $x_1$ from $B(p_1)$;
and sampling $z_1$ from $B(p_2)$, $z_2$ from $B(p_1/p_2)$ and returning $z = z_1 \wedge z_2$. Then,
they find a coupling between $z$ and $B(p_2)$.


Shift Coupling: Random Walk vs Lazy Random Walk. The previous 
example is an instance of a lockstep coupling, in that it relates the k-th element 
of the first chain with the k-th element of the second chain. Many examples from 
the literature follow this lockstep pattern; however, it is not always possible to 
establish lockstep couplings. Shift couplings are a relaxation of lockstep couplings 
where we relate elements of the first and second chains without the requirement 
that their positions coincide. 

We consider a simple example that motivates the use of shift couplings. Con- 
sider the random walk and lazy random walk (which, at each time step, either 
chooses to move or stay put), both defined as Markov chains over Z. For sim- 
plicity, assume that both walks start at position 0. It is not immediate to find a 
coupling between the two walks, since the two walks necessarily get desynchro- 
nized whenever the lazy walk stays put. Instead, the trick is to consider a lazy 
random walk that moves two steps instead of one. The random walk and the 
lazy random walk of step 2 are defined by the step functions: 


$$\begin{array}{l}
step \triangleq \lambda x.\ \mathsf{let}\ z = U_{\{-1,1\}}\ \mathsf{in}\ munit(z + x) \\
lstep2 \triangleq \lambda x.\ \mathsf{let}\ z = U_{\{-1,1\}}\ \mathsf{in}\ \mathsf{let}\ b = U_{\{0,1\}}\ \mathsf{in}\ munit(x + 2 * z * b)
\end{array}$$


After 2 iterations of step, the position has either changed two steps to the left or
to the right, or has returned to the initial position, which is the same behaviour
lstep2 has on every iteration. Therefore, the coupling we want to find should
equate the elements at position $2i$ in step with the elements at position $i$ in
lstep2. The details on how to prove the existence of this coupling are in Sect. 6.
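The observation that two iterations of step behave like one iteration of lstep2 can be checked by computing both distributions exactly. The Python sketch below is only a sanity check of that one-step claim, not a construction of the shift coupling itself; the helper `bind` (sequencing of finite distributions) and the dictionary representation are assumptions made for the illustration.

```python
from collections import defaultdict
from fractions import Fraction


def bind(dist, kernel):
    """Sequence a finite distribution with a probabilistic function (monadic let)."""
    out = defaultdict(Fraction)
    for x, p in dist.items():
        for y, q in kernel(x).items():
            out[y] += p * q
    return dict(out)


def step(x):
    """One step of the random walk: left or right with probability 1/2 each."""
    return {x - 1: Fraction(1, 2), x + 1: Fraction(1, 2)}


def lstep2(x):
    """One step of the lazy walk of step 2: x + 2 * z * b with z in {-1, 1}, b in {0, 1}."""
    out = defaultdict(Fraction)
    for z in (-1, 1):
        for b in (0, 1):
            out[x + 2 * z * b] += Fraction(1, 4)
    return dict(out)


two_steps = bind(step(0), step)
one_lazy_step = lstep2(0)
```

Both distributions put mass 1/4 on each of the positions -2 and 2 and mass 1/2 on the starting position.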


Lumped Coupling: Random Walks on 3 and 4 Dimensions. A Markov
chain is recurrent if it has probability 1 of returning to its initial state, and
transient otherwise. It is relatively easy to show that the random walk over
$\mathbb{Z}$ is recurrent. One can also show that the random walk over $\mathbb{Z}^2$ is recurrent.
However, the random walk over $\mathbb{Z}^3$ is transient.

For higher dimensions, we can use a coupling argument to prove transience.
Specifically, we can define a coupling between a lazy random walk in $n$ dimensions
and a random walk in $n + m$ dimensions, and derive transience of the latter from
transience of the former. We define the (lazy) random walks below, and sketch
the coupling arguments.

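Specifically, we show here the particular case of the transience of the
4-dimensional random walk from the transience of the 3-dimensional lazy random
walk. We start by defining the stepping functions:


$$\begin{array}{l}
step_4 : \mathbb{Z}^4 \to D(\mathbb{Z}^4) \triangleq \lambda z_1.\ \mathsf{let}\ x_1 = U_{U_4}\ \mathsf{in}\ munit(z_1 +_4 x_1) \\
lstep_3 : \mathbb{Z}^3 \to D(\mathbb{Z}^3) \triangleq \lambda z_2.\ \mathsf{let}\ x_2 = U_{U_3}\ \mathsf{in}\ \mathsf{let}\ b_2 = B(3/4)\ \mathsf{in}\ munit(z_2 +_3 b_2 * x_2)
\end{array}$$


where $U_i = \{(\pm 1, 0, \ldots, 0), \ldots, (0, \ldots, 0, \pm 1)\}$ are the vectors of the basis of $\mathbb{Z}^i$
and their opposites. Then, the random walk of dimension 4 is modelled by
$rwalk4 \triangleq markov(0, step_4)$, and the lazy walk of dimension 3 is modelled by
$lwalk3 \triangleq markov(0, lstep_3)$. We want to prove:


$$\vdash rwalk4 : D(Str_{\mathbb{Z}^4}) \sim lwalk3 : D(Str_{\mathbb{Z}^3}) \mid \diamond_{[y_1 \leftarrow r_1, y_2 \leftarrow r_2]} All(y_1, y_2, z_1.z_2.\ pr^4_3(z_1) = z_2)$$

where $pr^{n_2}_{n_1}$ denotes the standard projection from $\mathbb{Z}^{n_2}$ to $\mathbb{Z}^{n_1}$.

We apply the [Markov] rule. The only interesting premise requires proving
that the transition function preserves the coupling:

$$p_2 = pr^4_3(p_1) \vdash step_4 \sim lstep_3 \mid \forall x_1 x_2.\ x_2 = pr^4_3(x_1) \Rightarrow \diamond_{[y_1 \leftarrow r_1\,x_1, y_2 \leftarrow r_2\,x_2]}\, pr^4_3(y_1) = y_2$$
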
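To prove this, we need to find the appropriate coupling, i.e., one that
preserves the equality. The idea is that the step in $\mathbb{Z}^3$ must be the projection of the
step in $\mathbb{Z}^4$. This corresponds to the following judgement:


$$\begin{array}{l}
\lambda z_1.\ \mathsf{let}\ x_1 = U_{U_4}\ \mathsf{in}\ munit(z_1 +_4 x_1) \\
\quad \sim\ \lambda z_2.\ \mathsf{let}\ x_2 = U_{U_3}\ \mathsf{in}\ \mathsf{let}\ b_2 = B(3/4)\ \mathsf{in}\ munit(z_2 +_3 b_2 * x_2) \\
\quad \mid\ \forall z_1 z_2.\ pr^4_3(z_1) = z_2 \Rightarrow pr^4_3(r_1\,z_1) \stackrel{\diamond}{=} r_2\,z_2
\end{array}$$


which by simple equational reasoning is the same as


$$\begin{array}{l}
\lambda z_1.\ \mathsf{let}\ x_1 = U_{U_4}\ \mathsf{in}\ munit(z_1 +_4 x_1) \\
\quad \sim\ \lambda z_2.\ \mathsf{let}\ p_2 = U_{U_3} \times B(3/4)\ \mathsf{in}\ munit(z_2 +_3 \pi_1(p_2) * \pi_2(p_2)) \\
\quad \mid\ \forall z_1 z_2.\ pr^4_3(z_1) = z_2 \Rightarrow pr^4_3(r_1\,z_1) \stackrel{\diamond}{=} r_2\,z_2
\end{array}$$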


We want to build a coupling such that if we sample $(0,0,0,1)$ or $(0,0,0,-1)$
from $U_{U_4}$, then we sample 0 from $B(3/4)$, and otherwise if we sample $(x_1, x_2, x_3, 0)$
from $U_{U_4}$, we sample $(x_1, x_2, x_3)$ from $U_{U_3}$. Formally, we prove this with the
[COUPLING] rule. Given $X : U_4 \to \mathbb{B}$, by simple computation we show that:


$$\Pr_{z_1 \leftarrow U_{U_4}}[z_1 \in X] \leq \Pr_{z_2 \leftarrow U_{U_3} \times B(3/4)}[z_2 \in \{y \mid \exists x \in X.\ pr^4_3(x) = \pi_1(y) * \pi_2(y)\}]$$
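Because the 4-dimensional step set has only eight elements, the premise of the [COUPLING] rule for this example can be checked exhaustively over all of its 256 subsets. The Python sketch below is a sanity check under stated assumptions, not the paper's proof: the names `signed_basis`, `related` and `coupling_premise_holds` are hypothetical, a sample of the lazy step is represented as a pair of a direction and a bit, and the relation is read as "the projection of the 4-dimensional step equals the bit times the 3-dimensional direction".

```python
from fractions import Fraction
from itertools import combinations


def signed_basis(dim):
    """Unit vectors of Z^dim together with their opposites."""
    return [tuple(sign if j == i else 0 for j in range(dim))
            for i in range(dim) for sign in (1, -1)]


U4, U3 = signed_basis(4), signed_basis(3)

# Uniform step in 4 dimensions, and (direction, bit) pairs in 3 dimensions
# where the bit is 1 with probability 3/4.
mu1 = {x: Fraction(1, len(U4)) for x in U4}
mu2 = {(v, b): Fraction(1, len(U3)) * (Fraction(3, 4) if b else Fraction(1, 4))
       for v in U3 for b in (0, 1)}


def related(x, y):
    """x is related to y = (v, b) when dropping the 4th coordinate of x gives b * v."""
    v, b = y
    return x[:3] == tuple(b * c for c in v)


def coupling_premise_holds():
    """Check Pr_{mu1}[X] <= Pr_{mu2}[points related to some x in X] for every X."""
    points = list(mu1)
    for size in range(len(points) + 1):
        for xs in combinations(points, size):
            lhs = sum((mu1[x] for x in xs), Fraction(0))
            rhs = sum((p for y, p in mu2.items() if any(related(x, y) for x in xs)),
                      Fraction(0))
            if lhs > rhs:
                return False
    return True
```

The check succeeds because six of the eight 4-dimensional steps change the first three coordinates, matching the probability 3/4 with which the lazy walk moves.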
This concludes the proof. From the previous example, it follows that the 
lazy walk in 3 dimensions is transient, since the random walk in 3 dimensions 
is transient. By simple reasoning, we now conclude that the random walk in 4 
dimensions is also transient.


4 Probabilistic Guarded Lambda Calculus


To ensure that a function on infinite datatypes is well-defined, one must check 
that it is productive. This means that any finite prefix of the output can be 
computed in finite time. For instance, consider the following function on streams: 


letrec bad (x: xs) = x: tail(bad xs) 


This function is not productive since only the first element can be computed. 
We can argue this as follows: Suppose that the tail of a stream is available one 
unit of time after its head, and that x:xs is available at time 0. How much time 
does it take for bad to start outputting its tail? Assume it takes k units of time. 
This means that tail(bad xs) will be available at time k + 1, since xs is only 
available at time 1. But tail(bad xs) is exactly the tail of bad(x:xs), and 
this is a contradiction, since x:xs is available at time 0 and therefore the tail of 
bad(x:xs) should be available at time k. Therefore, the tail of bad will never 
be available. 
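The failure of productivity can be observed directly with lazy streams. The Python sketch below is an illustration under assumptions: streams are modelled as pairs of a head and a zero-argument function producing the tail, and the names `const_stream`, `tail`, `bad` and `good` are introduced here. Forcing the tail of `bad` triggers an unbounded chain of nested calls, which Python reports as a `RecursionError`, whereas the guarded function `good` produces each element after finitely many steps.

```python
def const_stream(x):
    """The stream x :: x :: x :: ..., as a (head, thunk-for-tail) pair."""
    return (x, lambda: const_stream(x))


def tail(stream):
    """Force and return the tail of a lazy stream."""
    return stream[1]()


def bad(stream):
    """bad (x:xs) = x : tail (bad xs); the recursive call is not guarded."""
    head, rest = stream
    return (head, lambda: tail(bad(rest())))


def good(stream):
    """Add one to every element; the recursive call sits directly under the constructor."""
    head, rest = stream
    return (head + 1, lambda: good(rest()))


zeros = const_stream(0)
first_of_bad = bad(zeros)[0]          # the head is available immediately
second_of_good = tail(good(zeros))[0]  # forcing one tail terminates
```

Calling `tail(bad(zeros))` never returns a value: computing the tail of `bad` requires the tail of another call to `bad`, and so on without end.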

The guarded lambda calculus solves the productivity problem by distinguish- 
ing at type level between data that is available now and data that will be avail- 
able in the future, and restricting when fixpoints can be defined. Specifically, 
the guarded lambda calculus extends the usual simply typed lambda calculus
with two modalities: $\triangleright$ (pronounced later) and $\square$ (constant). The later modality
represents data that will be available one step in the future, and is introduced
and removed by the term formers $\triangleright$ and prev respectively. This modality is used
to guard recursive occurrences, so for the calculus to remain productive, we must 
restrict when it can be eliminated. This is achieved via the constant modality, 
which expresses that all the data is available at all times. In the remainder of 
this section we present a probabilistic extension of this calculus. 


Syntax. Types of the calculus are defined by the grammar 


$$A, B ::= b \mid \mathbb{N} \mid A \times B \mid A + B \mid A \to B \mid Str_A \mid \square A \mid \triangleright A \mid D(C)$$


where $b$ ranges over a collection of base types. $Str_A$ is the type of guarded streams
of elements of type $A$. Formally, the type $Str_A$ is isomorphic to $A \times \triangleright Str_A$. This
isomorphism gives a way to introduce streams with the function
$(::) : A \to \triangleright Str_A \to Str_A$ and to eliminate them with the functions $hd : Str_A \to A$ and
$tl : Str_A \to \triangleright Str_A$. $D(C)$ is the type of distributions over discrete types $C$.
Discrete types are defined by the following grammar, where $b_0$ are discrete base
types, e.g., $\mathbb{Z}$.


$$C, D ::= b_0 \mid \mathbb{N} \mid C \times D \mid C + D \mid Str_C \mid \triangleright C$$


Note that, in particular, arrow types are not discrete but streams are. This is due
to the semantics of streams as sets of finite approximations, which we describe
in the next subsection. Also note that $\square Str_A$ is not discrete since it makes the
full infinite streams available.

We also need to distinguish between arbitrary types $A, B$ and constant types
$S, T$, which are defined by the following grammar


$$S, T ::= b_C \mid \mathbb{N} \mid S \times T \mid S + T \mid S \to T \mid \square A$$


where $b_C$ is a collection of constant base types. Note in particular that for any
type $A$ the type $\square A$ is constant.

The terms of the language $t$ are defined by the following grammar


$$\begin{array}{rl}
t ::= & x \mid c \mid 0 \mid S\,t \mid \mathsf{case}\ t\ \mathsf{of}\ 0 \mapsto t;\ S \mapsto t \mid \mu \mid munit(t) \mid \mathsf{let}\ x = t\ \mathsf{in}\ t \\
\mid & \langle t, t \rangle \mid \pi_1 t \mid \pi_2 t \mid inj_1 t \mid inj_2 t \mid \mathsf{case}\ t\ \mathsf{of}\ inj_1 x.t;\ inj_2 y.t \mid \lambda x.t \mid t\,t \mid \mathsf{fix}\ x.\ t \\
\mid & t :: ts \mid hd\ t \mid tl\ t \mid \mathsf{box}\ t \mid \mathsf{letb}\ x \leftarrow t\ \mathsf{in}\ t \mid \mathsf{letc}\ x \leftarrow t\ \mathsf{in}\ t \mid \triangleright\xi.t \mid \mathsf{prev}\ t
\end{array}$$


where $\xi$ is a delayed substitution, a sequence of bindings $[x_1 \leftarrow t_1, \ldots, x_n \leftarrow t_n]$.
The terms $c$ are constants corresponding to the base types used and munit($t$)
and let $x = t$ in $t$ are the introduction and sequencing construct for probability
distributions. The meta-variable $\mu$ stands for base distributions like $U_C$ and $B(p)$.
Delayed substitutions were introduced in [13] in a dependent type theory to
be able to work with types dependent on terms of type $\triangleright A$. In the setting of a
simple type theory, such as the one considered in this paper, delayed
substitutions are equivalent to having the applicative structure [14] $\circledast$ for the $\triangleright$ modality.
However, delayed substitutions extend uniformly to the level of propositions, and
thus we choose to use them in this paper in place of the applicative structure.


Denotational Semantics. The meaning of terms is given by a denotational model
in the category $\mathcal{S}$ of presheaves over $\omega$, the first infinite ordinal. This category
$\mathcal{S}$ is also known as the topos of trees [15]. In previous work [1], it was shown
how to model most of the constructions of the guarded lambda calculus and its 
internal logic, with the notable exception of the probabilistic features. Below we 
give an elementary presentation of the semantics. 

Informally, the idea behind the topos of trees is to represent (infinite) objects 
from their finite approximations, which we observe incrementally as time passes. 
Given an object $x$, we can consider a sequence $\{x_i\}$ of its finite approximations
observable at time $i$. These are trivial for finite objects, such as a natural number,
since for any number $n$, $n_i = n$ at every $i$. But for infinite objects such as streams,
the $i$th approximation is the prefix of length $i + 1$.

Concretely, the category $\mathcal{S}$ consists of:


- Objects $X$: families of sets $\{X_i\}_{i \in \mathbb{N}}$ together with restriction functions
  $r^X_n : X_{n+1} \to X_n$. We will write simply $r_n$ if $X$ is clear from the context.
- Morphisms $X \to Y$: families of functions $\alpha_n : X_n \to Y_n$ commuting with
  restriction functions in the sense of $r^Y_n \circ \alpha_{n+1} = \alpha_n \circ r^X_n$.


The full interpretation of types of the calculus can be found in Fig. 8 in the 
appendix. The main points we want to highlight are: 


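- Streams over a type $A$ are interpreted as sequences of finite prefixes of elements
  of $A$ with the restriction functions of $A$:

  $$\llbracket Str_A \rrbracket \triangleq \llbracket A \rrbracket_0 \times \{*\} \xleftarrow{r_0 \times !} \llbracket A \rrbracket_1 \times \llbracket Str_A \rrbracket_0 \xleftarrow{r_1 \times r_0} \llbracket A \rrbracket_2 \times \llbracket Str_A \rrbracket_1 \leftarrow \cdots$$

- Distributions over a discrete object $C$ are defined as a sequence of distributions
  over each $\llbracket C \rrbracket_i$:

  $$\llbracket D(C) \rrbracket \triangleq D(\llbracket C \rrbracket_0) \xleftarrow{D(r_0)} D(\llbracket C \rrbracket_1) \xleftarrow{D(r_1)} D(\llbracket C \rrbracket_2) \xleftarrow{D(r_2)} \cdots$$

  where $D(\llbracket C \rrbracket_i)$ is the set of (probability density) functions $\mu : \llbracket C \rrbracket_i \to [0,1]$
  such that $\sum_{x \in \llbracket C \rrbracket_i} \mu(x) = 1$, and $D(r_i)$ adds the probability density of all the
  points in $\llbracket C \rrbracket_{i+1}$ that are sent by $r_i$ to the same point in $\llbracket C \rrbracket_i$. In other
  words, $D(r_i)(\mu)(x) = \Pr_{y \leftarrow \mu}[r_i(y) = x]$.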
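The action of $D(r_i)$ is an ordinary pushforward of a finite distribution along the restriction map. The Python sketch below illustrates this for stream prefixes over a constant element type, under the assumption that the restriction simply forgets the last observed element; the names `restrict` and `push_forward` and the example distribution are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def restrict(prefix):
    """Restriction on stream prefixes: forget the last observed element."""
    return prefix[:-1]


def push_forward(mu, r):
    """D(r)(mu)(x) = Pr_{y <- mu}[r(y) = x]: add up the mass of all points sent to x."""
    out = defaultdict(Fraction)
    for y, p in mu.items():
        out[r(y)] += p
    return dict(out)


# An illustrative distribution over prefixes of length 2.
mu_1 = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 2)}
mu_0 = push_forward(mu_1, restrict)
```

Here the two prefixes starting with 0 are merged, so the distribution over prefixes of length 1 assigns probability 1/2 to each of `(0,)` and `(1,)`, and the total mass is preserved.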


An important property of the interpretation is that discrete types are
interpreted as objects $X$ such that $X_i$ is finite or countably infinite for every $i$. This
allows us to define distributions on these objects without the need for measure
theory. In particular, the type of guarded streams $Str_A$ is discrete provided $A$ is,
which is clear from the interpretation of the type $Str_A$. Conceptually this holds
because $\llbracket Str_A \rrbracket_i$ is an approximation of real streams, consisting of only the first
$i + 1$ elements.

An object $X$ of $\mathcal{S}$ is constant if all its restriction functions are bijections.
Constant types are interpreted as constant objects of $\mathcal{S}$ and for a constant type
$A$ the objects $\llbracket \square A \rrbracket$ and $\llbracket A \rrbracket$ are isomorphic in $\mathcal{S}$.


Typing Rules. Terms are typed under a dual context $\Delta \mid \Gamma$, where $\Gamma$ is a usual
context that binds variables to a type, and $\Delta$ is a constant context containing
variables bound to types that are constant. The term letc $x \leftarrow u$ in $t$ allows us
to shift variables between constant and non-constant contexts. The typing rules
can be found in Fig. 2.

The semantics of such a dual context $\Delta \mid \Gamma$ is given as the product of types
in $\Delta$ and $\Gamma$, except that we implicitly add $\square$ in front of every type in $\Delta$. In the
particular case when both contexts are empty, the semantics of the dual context
correspond to the terminal object 1, which is the singleton set $\{*\}$ at each time.

The interpretation of the well-typed term $\Delta \mid \Gamma \vdash t : A$ is defined by
induction on the typing derivation, and can be found in Fig. 9 in the appendix.


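Applicative Structure of the Later Modality. As in previous work we can define
the operator $\circledast$ satisfying the typing rule

$$\frac{\Delta \mid \Gamma \vdash t : \triangleright(A \to B) \qquad \Delta \mid \Gamma \vdash u : \triangleright A}{\Delta \mid \Gamma \vdash t \circledast u : \triangleright B}$$

and the equation $(\triangleright t) \circledast (\triangleright u) \equiv \triangleright(t\,u)$ as the term $t \circledast u \triangleq \triangleright[f \leftarrow t, x \leftarrow u].\, f\,x$.


Example: Modelling Markov Chains. As an application of $\circledast$ and an example
of how to use guardedness and probabilities together, we now give the precise
definition of the markov construct that we used to model Markov chains earlier: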


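$$\begin{array}{l}
markov : C \to (C \to D(C)) \to D(Str_C) \\
markov \triangleq \mathsf{fix}\ f.\ \lambda x. \lambda h.\ \mathsf{let}\ z = h\,x\ \mathsf{in}\ \mathsf{let}\ t = swap^{Str_C}_{\triangleright D}(f \circledast \triangleright z \circledast \triangleright h)\ \mathsf{in}\ munit(x :: t)
\end{array}$$


$$\frac{x : A \in \Gamma}{\Delta \mid \Gamma \vdash x : A}
\qquad
\frac{x : A \in \Delta}{\Delta \mid \Gamma \vdash x : A}
\qquad
\frac{\Delta \mid \Gamma, x : A \vdash t : B}{\Delta \mid \Gamma \vdash \lambda x.t : A \to B}$$

$$\frac{\Delta \mid \Gamma \vdash t : A \to B \qquad \Delta \mid \Gamma \vdash u : A}{\Delta \mid \Gamma \vdash t\,u : B}
\qquad
\frac{\Delta \mid \Gamma, f : \triangleright A \vdash t : A}{\Delta \mid \Gamma \vdash \mathsf{fix}\ f.\ t : A}
\qquad
\frac{\Delta \mid \cdot \vdash t : \triangleright A}{\Delta \mid \Gamma \vdash \mathsf{prev}\ t : A}$$

$$\frac{\Delta \mid \cdot \vdash t : A}{\Delta \mid \Gamma \vdash \mathsf{box}\ t : \square A}
\qquad
\frac{\Delta \mid \Gamma \vdash u : \square B \qquad \Delta, x : B \mid \Gamma \vdash t : A}{\Delta \mid \Gamma \vdash \mathsf{letb}\ x \leftarrow u\ \mathsf{in}\ t : A}$$

$$\frac{\Delta \mid \Gamma \vdash u : B \qquad \Delta, x : B \mid \Gamma \vdash t : A \qquad B\ \text{constant}}{\Delta \mid \Gamma \vdash \mathsf{letc}\ x \leftarrow u\ \mathsf{in}\ t : A}$$

$$\frac{\Delta \mid \Gamma, x_1 : A_1, \ldots, x_n : A_n \vdash t : A \qquad \Delta \mid \Gamma \vdash t_i : \triangleright A_i}{\Delta \mid \Gamma \vdash \triangleright[x_1 \leftarrow t_1, \ldots, x_n \leftarrow t_n].t : \triangleright A}
\qquad
\frac{\Delta \mid \Gamma \vdash t : A \qquad A\ \text{discrete}}{\Delta \mid \Gamma \vdash munit(t) : D(A)}$$

$$\frac{\Delta \mid \Gamma \vdash t : D(A) \qquad \Delta \mid \Gamma, x : A \vdash u : D(B)}{\Delta \mid \Gamma \vdash \mathsf{let}\ x = t\ \mathsf{in}\ u : D(B)}
\qquad
\frac{\mu\ \text{primitive distribution on type}\ A}{\Delta \mid \Gamma \vdash \mu : D(A)}$$

Fig. 2. A selection of the typing rules of the guarded lambda calculus. The rules for
products, sums, and natural numbers are standard.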


The guardedness condition gives $f$ the type $\triangleright(C \to (C \to D(C)) \to D(Str_C))$
in the body of the fixpoint. Therefore, it needs to be applied functorially (via
$\circledast$) to $\triangleright z$ and $\triangleright h$, which gives us a term of type $\triangleright D(Str_C)$. To complete the
definition we need to build a term of type $D(\triangleright Str_C)$ and then sequence it with $::$
to build a term of type $D(Str_C)$. To achieve this, we use the primitive operator
$swap^C_{\triangleright D} : \triangleright D(C) \to D(\triangleright C)$, which witnesses the isomorphism between $\triangleright D(C)$ and
$D(\triangleright C)$. For this isomorphism to exist, it is crucial that distributions be total
(i.e., we cannot use subdistributions). Indeed, the denotation for $\triangleright D(C)$ is the
sequence $\{*\} \leftarrow D(C_1) \leftarrow D(C_2) \leftarrow \ldots$, while the denotation for $D(\triangleright C)$ is the
sequence $D(\{*\}) \leftarrow D(C_1) \leftarrow D(C_2) \leftarrow \ldots$, and $\{*\}$ is isomorphic to $D(\{*\})$ in
Set only if $D$ considers only total distributions.


5 Guarded Higher-Order Logic 


We now introduce Guarded HOL (GHOL), which is a higher-order logic to reason 
about terms of the guarded lambda calculus. The logic is essentially that of [1], 
but presented with the dual context formulation analogous to the dual-context 
typing judgement of the guarded lambda calculus. Compared to standard intu- 
itionistic higher-order logic, the logic GHOL has two additional constructs, corre- 
sponding to additional constructs in the guarded lambda calculus. These are the 
later modality ($\triangleright$) on propositions, with delayed substitutions, which expresses
that a proposition holds one time unit into the future, and the "always" modality
$\square$, which expresses that a proposition holds at all times. Formulas are defined
by the grammar:


$$\phi, \psi ::= \top \mid \phi \wedge \psi \mid \phi \vee \psi \mid \neg\psi \mid \forall x.\phi \mid \exists x.\phi \mid \triangleright[x_1 \leftarrow t_1 \ldots x_n \leftarrow t_n].\phi \mid \square\phi$$

The basic judgement of the logic is $\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \phi$ where $\Sigma$ is a logical context
for $\Delta$ (that is, a list of formulas well-formed in $\Delta$) and $\Psi$ is another logical
context for the dual context $\Delta \mid \Gamma$. The formulas in context $\Sigma$ must be constant
propositions. We say that a proposition $\phi$ is constant if it is well-typed in context
$\Delta \mid \cdot$ and moreover if every occurrence of the later modality in $\phi$ is under the
$\square$ modality. Selected rules are displayed in Fig. 3. We highlight [Loeb] induction,
which is the key to reasoning about fixpoints: to prove that $\phi$ holds now, one can
assume that it holds in the future.

The interpretation of the formula $\Delta \mid \Gamma \vdash \phi$
is a subobject of the interpretation $\llbracket \Delta \mid \Gamma \rrbracket$. Concretely the interpretation $A$ of
$\Delta \mid \Gamma \vdash \phi$ is a family $\{A_i\}_{i=0}^{\infty}$ of sets such that $A_i \subseteq \llbracket \Delta \mid \Gamma \rrbracket_i$. This family must
satisfy the property that if $x \in A_{i+1}$ then $r_i(x) \in A_i$ where $r_i$ are the restriction
functions of $\llbracket \Delta \mid \Gamma \rrbracket$. The interpretation of formulas is defined by induction on
the typing derivation. In the interpretation of the context $\Delta \mid \Sigma \mid \Gamma \mid \Psi$ the
formulas in $\Sigma$ are interpreted with the added $\square$ modality. Moreover all formulas
$\phi$ in $\Sigma$ are typeable in the context $\Delta \mid \cdot \vdash \phi$ and thus their interpretations are
subsets of $\llbracket \square\Delta \rrbracket$. We treat these as subsets of $\llbracket \Delta \mid \Gamma \rrbracket$ in the obvious way.

The cases for the semantics of the judgement $\Delta \mid \Gamma \vdash \phi$ can be found in the
appendix. It can be shown that this logic is sound with respect to its model in
the topos of trees.


Theorem 2 (Soundness of the semantics). The semantics of guarded
higher-order logic is sound: if $\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \phi$ is derivable then for all
$n \in \mathbb{N}$, $\llbracket \square\Sigma \rrbracket_n \cap \llbracket \Psi \rrbracket_n \subseteq \llbracket \phi \rrbracket_n$.


In addition, Guarded HOL is expressive enough to axiomatize standard
probabilities over discrete sets. This axiomatization can be used to define the $\diamond$
modality directly in Guarded HOL (as opposed to our relational proof system, where
we use it as a primitive). Furthermore, we can derive from this axiomatization
additional rules to reason about couplings, which can be seen in Fig. 4. These
rules will be the key to proving the soundness of the probabilistic fragment of
the relational proof system, and can be shown to be sound themselves.


Proposition 2 (Soundness of derived rules). The additional rules are 
sound. 


6 Relational Proof System 


We complete the formal description of the system by describing the proof rules 
for the non-probabilistic fragment of the relational proof system (the rules of the 
probabilistic fragment were described in Sect. 3.2). 


6.1 Proof Rules 


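The rules for core $\lambda$-calculus constructs are identical to those of [2]; for
convenience, we present a selection of the main rules in Fig. 7 in the appendix.


$$\frac{\phi \in \Psi}{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \phi}\ \textsc{Ax}_U
\qquad
\frac{\phi \in \Sigma}{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \phi}\ \textsc{Ax}_C
\qquad
\frac{\Gamma \vdash t : \tau \qquad \Gamma \vdash t' : \tau \qquad t \equiv t'}{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t = t'}\ \textsc{Conv}$$

$$\frac{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \phi[t/x] \qquad \Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t = u}{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \phi[u/x]}\ \textsc{Subst}
\qquad
\frac{\Delta \mid \Sigma \mid \Gamma \mid \Psi, \triangleright\phi \vdash \phi}{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \phi}\ \textsc{Loeb}$$

$$\frac{\Delta \mid \Sigma \mid \Gamma, x_1 : A_1, \ldots, x_n : A_n \mid \Psi \vdash \phi \qquad \Delta \mid \Gamma \vdash t_1 : \triangleright A_1 \ \ldots\ \Delta \mid \Gamma \vdash t_n : \triangleright A_n}{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \triangleright[x_1 \leftarrow t_1, \ldots, x_n \leftarrow t_n].\phi}\ \triangleright_I$$

$$\frac{\Delta \mid \Sigma \mid \cdot \mid \cdot \vdash \triangleright[x_1 \leftarrow t_1, \ldots, x_n \leftarrow t_n].\phi \qquad \Delta \mid \cdot \vdash t_1 : \triangleright A_1 \ \ldots\ \Delta \mid \cdot \vdash t_n : \triangleright A_n}{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \phi[\mathsf{prev}\ t_1/x_1] \ldots [\mathsf{prev}\ t_n/x_n]}\ \triangleright_E$$

$$\frac{\begin{array}{l}
\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \triangleright[x_1 \leftarrow t_1, \ldots, x_n \leftarrow t_n].\psi \qquad \Delta \mid \Gamma \vdash t_1 : \triangleright A_1 \ \ldots\ \Delta \mid \Gamma \vdash t_n : \triangleright A_n \\
\Delta \mid \Sigma \mid \Gamma, x_1 : A_1, \ldots, x_n : A_n \mid \Psi, \psi \vdash \phi
\end{array}}{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \triangleright[x_1 \leftarrow t_1, \ldots, x_n \leftarrow t_n].\phi}\ \triangleright_{App}$$

$$\frac{\Delta \mid \Sigma \mid \cdot \mid \cdot \vdash \phi}{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \square\phi}\ \square_I
\qquad
\frac{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \square\psi \qquad \Delta \mid \Sigma, \psi \mid \Gamma \mid \Psi \vdash \phi}{\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \phi}\ \square_E$$

Fig. 3. Selected Guarded Higher-Order Logic rules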

[Figure: derived rules [Mono2], [Unit2], [MLet2], and [MLet-L]; rule bodies lost in extraction]

Fig. 4. Derived rules for probabilistic constructs


We briefly comment on the two-sided rules for the new constructs (Fig. 5).
The notation $\Omega$ abbreviates a context $\Delta \mid \Sigma \mid \Gamma \mid \Psi$. The rule
[Next] relates two terms that have a $\triangleright$ term constructor at the top level. We
require that both have one term in the delayed substitutions and that they are
related pairwise. Then this relation is used to prove another relation between the
main terms. This rule can be generalized to terms with more than one term in the
delayed substitution. The rule [Prev] proves a relation between terms from the same
delayed relation by applying prev to both terms. The rule [Box] proves a relation
between two boxed terms if the same relation can be proven in a constant context.
Dually, [LetBox] uses a relation between two boxed terms to prove a relation between
their unboxings. [LetConst] is similar to [LetBox], but it requires a relation
between two constant terms, rather than explicitly $\square$-ed terms. The rule [Fix]
relates two fixpoints following the [Loeb] rule from Guarded HOL. Notice that in
the premise, the fixpoints need to appear in the delayed substitution so that the
inductive hypothesis is well-formed. The rule [Cons] proves relations on streams
from relations between their heads and tails, while [Head] and [Tail] behave as
converses of [Cons].

Figure 6 contains the one-sided versions of the rules. We only present the
left-sided versions as the right-sided versions are completely symmetric. The
rule [Next-L] relates at $\phi$ a term that has a $\triangleright$ with a term that does not
have a $\triangleright$. First, a unary property $\phi'$ is proven on the term $u$ in the delayed
substitution, and it is then used as a premise to prove $\phi$ on the terms with
delays removed. Rules for proving unary judgements can be found in the appendix.
Similarly, [LetBox-L] proves a unary property on the term that gets unboxed and
then uses it as a precondition. The rule [Fix-L] builds a fixpoint just on the left,
and relates it with an arbitrary term $t_2$ at a property $\phi$. Since $\phi$ may contain
the variable $r_2$, which is not in the context, it has to be replaced when adding
$\triangleright\phi$ to the logical context in the premise of the rule. The remaining rules are
similar to their two-sided counterparts.


6.2 Metatheory 


We review some of the most interesting metatheoretical properties of our
relational proof system, highlighting the equivalence with Guarded HOL.


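Theorem 3 (Equivalence with Guarded HOL). For all contexts $\Delta$, $\Gamma$;
types $\sigma_1, \sigma_2$; terms $t_1, t_2$; sets of assertions $\Sigma$, $\Psi$; and assertions $\phi$:


$$\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t_1 : \sigma_1 \sim t_2 : \sigma_2 \mid \phi \iff \Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash \phi[t_1/r_1][t_2/r_2]$$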


The forward implication follows by induction on the given derivation. The reverse
implication is immediate from the rule that allows falling back on Guarded
HOL in relational proofs (rule [SUB] in the appendix). The full proof is in the
appendix. The consequence of this theorem is that the syntax-directed, relational
proof system we have built on top of Guarded HOL does not lose expressiveness.

The intended semantics of a judgement
$\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t_1 : A_1 \sim t_2 : A_2 \mid \phi$
is that, for every valuation $\delta \models \Delta$, $\gamma \models \Gamma$, if
$\llbracket \Sigma \rrbracket(\delta)$ and $\llbracket \Psi \rrbracket(\delta, \gamma)$, then


$$\llbracket \phi \rrbracket(\delta, \gamma[r_1 \mapsto \llbracket t_1 \rrbracket(\delta, \gamma), r_2 \mapsto \llbracket t_2 \rrbracket(\delta, \gamma)])$$


Since Guarded HOL is sound with respect to its semantics in the topos of trees, 
and our relational proof system is equivalent to Guarded HOL, we obtain that 
our relational proof system is also sound in the topos of trees. 


Corollary 2 (Soundness and consistency). If
$\Delta \mid \Sigma \mid \Gamma \mid \Psi \vdash t_1 : \sigma_1 \sim t_2 : \sigma_2 \mid \phi$,
then for every valuation $\delta \models \Delta$, $\gamma \models \Gamma$:


$$\llbracket \Delta \vdash \square\Sigma \rrbracket(\delta) \wedge \llbracket \Delta \mid \Gamma \vdash \Psi \rrbracket(\delta, \gamma) \Rightarrow$$
$$\llbracket \Delta \mid \Gamma, r_1 : \sigma_1, r_2 : \sigma_2 \vdash \phi \rrbracket(\delta, \gamma[r_1 \mapsto \llbracket \Delta \mid \Gamma \vdash t_1 \rrbracket(\delta, \gamma), r_2 \mapsto \llbracket \Delta \mid \Gamma \vdash t_2 \rrbracket(\delta, \gamma)])$$


In particular, there is no proof of
$\Delta \mid \varnothing \mid \Gamma \mid \varnothing \vdash t_1 : \sigma_1 \sim t_2 : \sigma_2 \mid \bot$.

[Figure: two-sided rules [Next], [Prev], [Box], [LetBox], [LetConst], [Fix], [Cons], [Head], and [Tail]; rule bodies lost in extraction]

Fig. 5. Two-sided rules for Guarded RHOL


6.3 Shift Couplings Revisited 


We give further details on how to prove the example with shift couplings
from Sect. 3.3. (Additional examples of relational reasoning on non-probabilistic
streams can be found in the appendix.) Recall the step functions:


$$\mathsf{step} \triangleq \lambda x.\ \mathsf{let}\ z = \mathcal{U}_{\{-1,1\}}\ \mathsf{in}\ \mathsf{munit}(z + x)$$
$$\mathsf{lstep2} \triangleq \lambda x.\ \mathsf{let}\ z = \mathcal{U}_{\{-1,1\}}\ \mathsf{in}\ \mathsf{let}\ b = \mathcal{U}_{\{0,1\}}\ \mathsf{in}\ \mathsf{munit}(x + 2 * z * b)$$


We axiomatize the predicate $\mathsf{All}_{2,1}$, which relates the element at position $2i$ in
one stream to the element at position $i$ in another stream, as follows.


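$$\forall x_1\, x_2\, xs_1\, xs_2\, y_1.\ \phi[x_1/z_1][x_2/z_2] \Rightarrow$$
$$\triangleright[ys_1 \leftarrow xs_1].\ \triangleright[zs_1 \leftarrow ys_1, ys_2 \leftarrow xs_2].\ \mathsf{All}_{2,1}(zs_1, ys_2, z_1.z_2.\phi) \Rightarrow$$
$$\mathsf{All}_{2,1}(x_1 :: y_1 :: xs_1,\ x_2 :: xs_2,\ z_1.z_2.\phi)$$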


In fact, we can assume that, in general, we have a family of $\mathsf{All}_{m_1,m_2}$
predicates relating two streams at positions $m_1 \cdot i$ and $m_2 \cdot i$ for every $i$.

[Figure: one-sided rules [Next-L], [Prev-L], [Box-L], [LetBox-L], [LetConst-L], [Fix-L], [Cons-L], [Head-L], and [Tail-L]; rule bodies lost in extraction]

Fig. 6. One-sided rules for Guarded RHOL
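
The family of $\mathsf{All}_{m_1,m_2}$ predicates can be made concrete on finite prefixes of streams. The sketch below is an assumption-laden illustration rather than the guarded definition from the text: streams are replaced by Python lists, the predicate is only checked at indices where both positions exist, and the name `all_m` is hypothetical.

```python
def all_m(m1, m2, xs, ys, phi):
    """Finite-prefix reading of All_{m1,m2}.

    Checks that phi relates xs[m1 * i] and ys[m2 * i] for every i at which
    both positions are available in the given prefixes.
    """
    if m1 < 1 or m2 < 1:
        raise ValueError("strides must be positive")
    i = 0
    while m1 * i < len(xs) and m2 * i < len(ys):
        if not phi(xs[m1 * i], ys[m2 * i]):
            return False
        i += 1
    return True


def equal(a, b):
    return a == b


# A walk sampled at every step, and a walk that moves two positions at once.
fast = [0, 1, 2, 1, 0, -1, -2]
slow = [0, 2, 0, -2]

print(all_m(2, 1, fast, slow, equal))  # positions 0, 2, 4, 6 against 0, 1, 2, 3
print(all_m(1, 1, fast, slow, equal))  # position 1 already differs
```

With strides (2, 1) the two prefixes agree, which is the shape of relation a shift coupling is meant to establish; with strides (1, 1) they do not.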


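We can now express the existence of a shift coupling by the statement:

$$p_1 = p_2 \vdash \mathsf{markov}(p_1, \mathsf{step}) \sim \mathsf{markov}(p_2, \mathsf{lstep2}) \mid \diamond_{[y_1 \leftarrow r_1,\ y_2 \leftarrow r_2]}\ \mathsf{All}_{2,1}(y_1, y_2, z_1.z_2.z_1 = z_2)$$


For the proof, we need to introduce an asynchronous rule for Markov chains:


$$\frac{\begin{array}{c}
\Omega \vdash t_1 : C_1 \sim t_2 : C_2 \mid \phi \\
\Omega \vdash (\lambda x_1.\ \mathsf{let}\ x_1' = h_1\, x_1\ \mathsf{in}\ h_1\, x_1') : C_1 \to \mathsf{D}(C_1) \sim h_2 : C_2 \to \mathsf{D}(C_2) \mid {} \\
\forall x_1 x_2.\ \phi[x_1/z_1][x_2/z_2] \Rightarrow \diamond_{[z_1 \leftarrow r_1\, x_1,\ z_2 \leftarrow r_2\, x_2]}\ \phi
\end{array}}{\begin{array}{c}
\Omega \vdash \mathsf{markov}(t_1, h_1) : \mathsf{D}(\mathsf{Str}_{C_1}) \sim \mathsf{markov}(t_2, h_2) : \mathsf{D}(\mathsf{Str}_{C_2}) \mid {} \\
\diamond_{[y_1 \leftarrow r_1,\ y_2 \leftarrow r_2]}\ \mathsf{All}_{2,1}(y_1, y_2, z_1.z_2.\phi)
\end{array}}\ \text{[Markov-2-1]}$$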


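This asynchronous rule for Markov chains shares the motivations of the rule for
loops proposed in [6]. Note that one can define a rule [Markov-m-n] for arbitrary
$m$ and $n$ to prove a judgement of the form $\mathsf{All}_{m,n}$ on two Markov chains.

We show the proof of the shift coupling. By equational reasoning, we get:


$$\begin{array}{rl}
& \lambda x_1.\ \mathsf{let}\ x_1' = h_1\, x_1\ \mathsf{in}\ h_1\, x_1' \\
= & \lambda x_1.\ \mathsf{let}\ z_1 = \mathcal{U}_{\{-1,1\}}\ \mathsf{in}\ h_1\, (z_1 + x_1) \\
= & \lambda x_1.\ \mathsf{let}\ z_1 = \mathcal{U}_{\{-1,1\}}\ \mathsf{in}\ \mathsf{let}\ z_1' = \mathcal{U}_{\{-1,1\}}\ \mathsf{in}\ \mathsf{munit}(z_1' + z_1 + x_1)
\end{array}$$


and the only interesting premise of [Markov-2-1] is:

$$\begin{array}{c}
\lambda x_1.\ \mathsf{let}\ z_1 = \mathcal{U}_{\{-1,1\}}\ \mathsf{in}\ \mathsf{let}\ z_1' = \mathcal{U}_{\{-1,1\}}\ \mathsf{in}\ \mathsf{munit}(z_1' + z_1 + x_1) \\
\sim \\
\lambda x_2.\ \mathsf{let}\ z_2 = \mathcal{U}_{\{-1,1\}}\ \mathsf{in}\ \mathsf{let}\ b_2 = \mathcal{U}_{\{0,1\}}\ \mathsf{in}\ \mathsf{munit}(x_2 + 2 * b_2 * z_2) \\
\mid\ \forall x_1 x_2.\ x_1 = x_2 \Rightarrow \diamond_{[z_1 \leftarrow r_1\, x_1,\ z_2 \leftarrow r_2\, x_2]}\ z_1 = z_2
\end{array}$$


Couplings between $z_1$ and $z_2$ and between $z_1'$ and $b_2$ can be found by simple
computations. This completes the proof.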
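
The "simple computations" can be illustrated by exact enumeration. The following Python sketch is not the formal proof: it assumes a common integer start position `x`, computes the exact laws of two ordinary steps and of one step of lstep2, and exhibits one possible coupling of the samples, which is not necessarily the one the authors have in mind. The names `law` and `couple` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def law(f, *supports):
    """Exact law of f(z_1, ..., z_k), each z_j uniform on supports[j], independent."""
    outcomes = list(product(*supports))
    weight = Fraction(1, len(outcomes))
    dist = Counter()
    for zs in outcomes:
        dist[f(*zs)] += weight
    return dict(dist)


x = 0  # common start position (an arbitrary choice for the illustration)

# Two consecutive applications of step: x + z1 + z1'.
two_steps = law(lambda z1, z1p: z1p + z1 + x, (-1, 1), (-1, 1))

# One application of lstep2: x + 2 * b2 * z2.
one_lazy_step = law(lambda z2, b2: x + 2 * b2 * z2, (-1, 1), (0, 1))


def couple(z1, z1p):
    """One possible coupling: send left samples (z1, z1') to right samples (z2, b2)."""
    return (z1, 1) if z1 == z1p else (z1, 0)


print(two_steps == one_lazy_step)
for z1, z1p in product((-1, 1), repeat=2):
    z2, b2 = couple(z1, z1p)
    print((z1, z1p), "->", (z2, b2), z1p + z1 + x == x + 2 * b2 * z2)
```

Because `couple` is a bijection between equally likely sample pairs and preserves the resulting position, it witnesses that the two laws coincide, which is what the premise of [Markov-2-1] asks for in this instance.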


7 Related Work 


Our probabilistic guarded $\lambda$-calculus and the associated logic Guarded HOL
build on top of the guarded $\lambda$-calculus and its internal logic [1]. The guarded
$\lambda$-calculus has been extended to guarded dependent type theory [13], which can
be understood as a theory of guarded refinement types and as a foundation for
proof assistants based on guarded type theory. These systems do not reason
about probabilities, and do not support syntax-directed (relational) reasoning,
both of which we support.

Relational models for higher-order programming languages are often defined 
using logical relations. [16] showed how to use second-order logic to define and 
reason about logical relations for the second-order lambda calculus. Recent work 
has extended this approach to logical relations for higher-order programming 
languages with computational effects such as nontermination, general references, 
and concurrency [17-20]. The logics used in loc. cit. are related to our work in 
two ways: (1) the logics in loc. cit. make use of the later modality for reasoning 
about recursion, and (2) the models of the logics in loc. cit. can in fact be defined 
using guarded type theory. Our work is more closely related to Relational Higher 
Order Logic [2], which applies the idea of logic-enriched type theories [21,22] 
to a relational setting. There exist alternative approaches for reasoning about 
relational properties of higher-order programs; for instance, [23] have recently 
proposed to use monadic reification for reducing relational verification of F* to 
proof obligations in higher-order logic. 

A series of works develops reasoning methods for probabilistic higher-order
programs for different variations of the lambda calculus. One line of work has
focused on operationally-based techniques for reasoning about contextual
equivalence of programs. The methods are based on probabilistic bisimulations
[24, 25] or on logical relations [26]. Most of these approaches have been developed
for languages with discrete distributions, but recently there has also been work
on languages with continuous distributions [27,28]. Another line of work has
focused on denotational models, starting with the seminal work in [29]. Recent
work includes support for relational reasoning about equivalence of programs
with continuous distributions for a total programming language [30]. Our
approach is most closely related to prior work based on relational refinement types
for higher-order probabilistic programs. These were initially considered by [31]
for a stateful fragment of F*, and later by [32,33] for a pure language. Both
systems are specialized to building probabilistic couplings; however, the latter
support approximate probabilistic couplings, which yield a natural
interpretation of differential privacy [34], both in its vanilla and approximate
forms (i.e. $\epsilon$- and $(\epsilon, \delta)$-privacy). Technically, approximate couplings are
modelled as a graded monad, where the index of the monad tracks the privacy
budget ($\epsilon$ or $(\epsilon, \delta)$). Both systems are strictly syntax-directed, and cannot
reason about computations that have different types or syntactic structures, while
our system can.


8 Conclusion 


We have developed a probabilistic extension of the (simply typed) guarded
$\lambda$-calculus, and proposed a syntax-directed proof system for relational verification.
Moreover, we have verified a series of examples that are beyond the reach of prior
work. Finally, we have proved the soundness of the proof system with respect to
the topos of trees.

There are several natural directions for future work. One first direction is 
to enhance the expressiveness of the underlying simply typed language. For 
instance, it would be interesting to introduce clock variables and some type 
dependency as in [13], and extend the proof system accordingly. This would 
allow us, for example, to type the function taking the n-th element of a guarded 
stream, which cannot be done in the current system. Another exciting direction 
is to consider approximate couplings, as in [32,33], and to develop differential 
privacy for infinite streams—preliminary work in this direction, such as [35], 
considers very large lists, but not arbitrary streams. A final direction would be 
to extend our approach to continuous distributions to support other application
domains.


Abstract. We define the exceptional translation, a syntactic translation
of the Calculus of Inductive Constructions (CIC) into itself, that covers
full dependent elimination. The new resulting type theory features
call-by-name exceptions with decidable type-checking and canonicity, but
at the price of inconsistency. Then, noticing parametricity amounts to
Kreisel’s realizability in this setting, we provide an additional layer on top
of the exceptional translation in order to tame exceptions and ensure that
all exceptions used locally are caught, leading to the parametric
exceptional translation which fully preserves consistency. This way, we can
consistently extend the logical expressivity of CIC with independence of
premises, Markov’s rule, and the negation of function extensionality while
retaining $\eta$-expansion. As a byproduct, we also show that Markov’s
principle is not provable in CIC. Both translations have been implemented
in a Coq plugin, which we use to formalize the examples.


1 Introduction 


Monadic translations constitute a canonical way to add effects to pure
functional languages [1]. Until recently, this technique was not available for type
theories such as CIC because of complex interactions with dependency. In a
recent paper [2], we have presented a generic way to extend the monadic
translation to dependent types, using the weaning translation, as soon as the monad
under consideration satisfies a crucial property: being self-algebraic. Indeed, in
the same way that the universe of types $\square_i$ is itself a type (of a higher universe)
in type theory, the type of algebras of a monad $T$


$$\Sigma A : \square_i.\ T\, A \to A$$


needs to be itself an algebra of the monad to allow a correct translation of the
universe. However, in general, the weaning translation does not interpret all of
CIC because dependent elimination needs to be restricted to linear predicates,
that is, those that are intuitively call-by-value [3]. In this paper, we study the
particular case of the error monad, and show that its weaning translation can
be simplified and tweaked so that full dependent elimination is valid.


© The Author(s) 2018
A. Ahmed (Ed.): ESOP 2018, LNCS 10801, pp. 245-271, 2018.
https://doi.org/10.1007/978-3-319-89884-1_9

This exceptional translation gives rise to a novel extension of CIC with new
computational behaviours, namely call-by-name exceptions.$^1$ That is, the type
theory induced by the exceptional translation features new operations to raise
and catch exceptions. This new logical expressivity comes at a cost, as the
resulting theory is not consistent anymore, although it is still computationally
relevant. This means that it is possible to prove a contradiction, but, thanks to a
weak form of canonicity, only because of an unhandled exception. Furthermore,
the translation allows us to reason directly in CIC on terms of the exceptional
theory, letting us prove, e.g., that assuming some properties on its input, an
exceptional function actually never raises an exception. We thus have a sound
logical framework to prove safety properties about impure dependently-typed
programs.

We then push this technique further by noticing that parametricity provides 
a systematic way to describe that a term is not allowed to produce uncaught 
exceptions, bridging the gap between Kreisel’s modified realizability [4] and para- 
metricity inside type theory [5]. This parametric exceptional translation ensures 
that no exception reaches toplevel, thus ensuring consistency of the resulting 
theory. Pure terms are automatically handled, while it is necessary to show 
parametricity manually for terms internally using exceptions. We exploit this 
computational extension of CIC to show various logical results over CIC. 


Contributions 


— We describe the exceptional translation, the first monadic translation for the
error monad for CIC, including strong elimination of inductive types,
resulting in a sound logical framework to reason about impure dependently-typed
programs.

— We use parametricity to extend the exceptional translation, getting a
consistent variant dubbed the parametric exceptional translation.

— We show that Markov’s rule is admissible in CIC.

— We show that definitional $\eta$-expansion together with the negation of function
extensionality is admissible in CIC.

— We show that there exists a syntactical model of CIC that validates the
independence of premises (which is known to be generally not valid in intuitionistic
logic [6]) and use it to recover the recent result of Coquand and Mannaa [7],
i.e., that Markov’s principle is not provable in CIC.

— We provide a Coq plugin$^2$ that implements both translations and with which
we have formalized all the examples.


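Plan of the Paper. In Sect. 2, we describe the exceptional translation and the
new computational principles arising from it. In Sect. 3, we present the
parametric variant of the exceptional translation. Section 4 is devoted to the
various logical results resulting from the parametric exceptional translation. In
Sect. 5, we discuss possible extensions of the translation with negative records
and an impredicative universe. Section 6 describes the Coq plugin and illustrates
its use on a concrete example. We discuss related work in Sect. 7 and conclude
in Sect. 8.


$^1$ The fact that the resulting exceptions are call-by-name is explained in detail in [2]
using a call-by-push-value decomposition. Intuitively, it comes from the fact that
CIC is naturally call-by-name.

$^2$ The plugin is available at https://github.com/CoqHott/exceptional-tt.


$$A, B, M, N ::= \square_i \mid x \mid M\, N \mid \lambda x : A.\, M \mid \Pi x : A.\, B$$

$$\Gamma, \Delta ::= \cdot \mid \Gamma, x : A$$

$$\frac{\vdash \Gamma \quad i < j}{\Gamma \vdash \square_i : \square_j} \qquad \frac{\Gamma \vdash M : B \quad \Gamma \vdash A : \square_i}{\Gamma, x : A \vdash M : B}$$

$$\frac{\Gamma \vdash A : \square_i \quad \Gamma, x : A \vdash B : \square_j}{\Gamma \vdash \Pi x : A.\, B : \square_{\max(i,j)}} \qquad \frac{\Gamma \vdash M : B \quad \Gamma \vdash A : \square_i \quad A \equiv B}{\Gamma \vdash M : A}$$

$$\frac{\Gamma, x : A \vdash M : B \quad \Gamma \vdash \Pi x : A.\, B : \square_i}{\Gamma \vdash \lambda x : A.\, M : \Pi x : A.\, B} \qquad \frac{\Gamma \vdash M : \Pi x : A.\, B \quad \Gamma \vdash N : A}{\Gamma \vdash M\, N : B\{x := N\}}$$

$$\frac{}{\vdash \cdot} \qquad \frac{\Gamma \vdash A : \square_i}{\vdash \Gamma, x : A} \qquad \frac{\Gamma \vdash A : \square_i}{\Gamma, x : A \vdash x : A}$$

$$(\lambda x : A.\, M)\, N \equiv M\{x := N\} \quad \text{(congruence rules omitted)}$$


Fig. 1. Typing rules of CC$_\omega$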


2 The Exceptional Translation 


We define in this section the exceptional translation as a syntactic translation
between type theories. We call the target theory $\mathcal{T}$, upon which we will make
various assumptions depending on the objects we want to translate.


2.1 Adding Exceptions to CC$_\omega$


In this section, we describe the exceptional translation over a purely negative
theory, i.e., featuring only universes and dependent functions, called CC$_\omega$, which
is presented in Fig. 1. This theory is a predicative version of the Calculus of
Constructions [8], with an infinite hierarchy of universes $\square_i$ instead of one
impredicative sort. We assume from now on that $\mathcal{T}$ contains at least CC$_\omega$ itself.

The exceptional translation is a simplification of the weaning translation [2]
applied to the error monad. Because it is specifically tailored for
exceptions, it admits a more compact presentation.

Let $\mathbb{E} : \square_0$ be a fixed type of exceptions in $\mathcal{T}$. The weaning translation for
the error monad amounts to interpreting types as algebras, i.e., as inhabitants of
the dependent sum $\Sigma A : \square_i.\ (A + \mathbb{E}) \to A$. In this paper, we take advantage of
the fact that the algebra morphism restricted to $A$ is always the identity. Thus
every type just comes with a way to interpret failure on this type, i.e. types
are intuitively interpreted as a pair of an $A : \square_i$ with a default (raise) function
$A_\varnothing : \mathbb{E} \to A$. In practice, it is slightly more complicated as the universe of types
itself is a type, so its interpretation must come with a default function. We
overcome this issue by assuming a term $\mathsf{type}_i$ representing types that can raise
exceptions. This type comes with two constructors: TypeVal, which allows one to
construct a $\mathsf{type}_i$ from a type and a default function on this type; and another
constructor TypeErr, which represents the default function at the level of $\mathsf{type}_i$.
Furthermore, $\mathsf{type}_i$ is equipped with an eliminator type_elim, and thus can be
thought of as an inductive definition. For simplicity, we axiomatize it instead of
requiring inductive types in the target of the translation.


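Definition 1. We assume that $\mathcal{T}$ features the data below, where the indices $i, j$
stand for universe polymorphism.

— $\Omega_i : \mathbb{E} \to \square_i$

— $\omega_i : \Pi e : \mathbb{E}.\ \Omega_i\, e$

— $\mathsf{type}_i : \square_j$, where $i < j$

— $\mathsf{TypeVal}_i : \Pi A : \square_i.\ (\mathbb{E} \to A) \to \mathsf{type}_i$

— $\mathsf{TypeErr}_i : \mathbb{E} \to \mathsf{type}_i$

— $\mathsf{type\_elim}_{i,j} : \Pi P : \mathsf{type}_i \to \square_j.$
$(\Pi (A : \square_i)\, (A_\varnothing : \mathbb{E} \to A).\ P\, (\mathsf{TypeVal}_i\, A\, A_\varnothing)) \to$
$(\Pi e : \mathbb{E}.\ P\, (\mathsf{TypeErr}_i\, e)) \to \Pi T : \mathsf{type}_i.\ P\, T$


subject to the following definitional equations:


$$\mathsf{type\_elim}_{i,j}\ P\ p_v\ p_\varnothing\ (\mathsf{TypeVal}_i\, A\, A_\varnothing) \equiv p_v\, A\, A_\varnothing$$
$$\mathsf{type\_elim}_{i,j}\ P\ p_v\ p_\varnothing\ (\mathsf{TypeErr}_i\, e) \equiv p_\varnothing\, e$$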


The $\Omega$ term describes what it means for a type to fail, i.e. it ascribes a
meaning to sequents of the form $\Gamma \vdash M : \mathsf{fail}\ e$. In practice, it is irrelevant and
can be chosen to be degenerate, e.g. $\Omega := \lambda\_ : \mathbb{E}.\ \mathsf{unit}$.

In what follows, we often leave the universe indices implicit although they
can be retrieved at the cost of more explicit annotations.

Before defining the exceptional translation we need to derive a term El$^3$ that
recovers the underlying type from an inhabitant of type and a term Err that lifts
the default function to this underlying type.


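Definition 2. From the data of Definition 1, we derive the following terms.


$$\begin{array}{rcl}
\mathsf{El}_i & : & \mathsf{type}_i \to \square_i \\
& := & \lambda A : \mathsf{type}_i.\ \mathsf{type\_elim}\ (\lambda T : \mathsf{type}_i.\ \square_i)\ (\lambda (A_0 : \square_i)\, (A_\varnothing : \mathbb{E} \to A_0).\ A_0)\ \Omega_i\ A \\
\mathsf{Err}_i & : & \Pi A : \mathsf{type}_i.\ \mathbb{E} \to \mathsf{El}_i\, A \\
& := & \lambda (A : \mathsf{type}_i)\, (e : \mathbb{E}).\ \mathsf{type\_elim}\ \mathsf{El}_i\ (\lambda (A_0 : \square_i)\, (A_\varnothing : \mathbb{E} \to A_0).\ A_\varnothing\, e)\ \omega_i\ A
\end{array}$$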
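
To make Definitions 1 and 2 more tangible, here is a small untyped Python model. It is only a sketch under loud assumptions: Python cannot express the dependent types involved, exceptions are modelled as strings, $\Omega$ is the degenerate choice returning the unit type, and all names (`TypeVal`, `TypeErr`, `type_elim`, `Omega`, `omega`, `El`, `Err`, `universe`, `text`) are illustrative stand-ins for the terms of the same name.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class TypeVal:
    carrier: Any                   # the underlying type A (any Python object here)
    default: Callable[[str], Any]  # the default function from exceptions to A


@dataclass(frozen=True)
class TypeErr:
    exn: str                       # an exception e (exceptions are modelled as str)


def type_elim(p_val, p_err, t):
    """Eliminator: dispatch on the constructor of t, as in the two equations."""
    if isinstance(t, TypeVal):
        return p_val(t.carrier, t.default)
    if isinstance(t, TypeErr):
        return p_err(t.exn)
    raise TypeError(f"not an exceptional type: {t!r}")


def Omega(e):
    """Degenerate choice: a failed type is interpreted as the unit type."""
    return type(None)


def omega(e):
    """The inhabitant of Omega(e)."""
    return None


def El(a):
    """Recover the underlying type of an exceptional type."""
    return type_elim(lambda a0, a_default: a0, Omega, a)


def Err(a, e):
    """Raise the exception e at the underlying type of a."""
    return type_elim(lambda a0, a_default: a_default(e), omega, a)


# Model of the translated universe: TypeVal applied to type and TypeErr.
universe = TypeVal(carrier=(TypeVal, TypeErr), default=TypeErr)

# A hypothetical exceptional type of strings whose failures are tagged strings.
text = TypeVal(carrier=str, default=lambda e: "raised: " + e)

print(El(text), Err(text, "boom"), Err(universe, "boom"))
```

In this model, raising an exception at the universe yields `TypeErr("boom")`, which mirrors the role of TypeErr as the default function at the level of types.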


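$^3$ The notation El refers to universes à la Tarski in Martin-Löf type theory.

$$\begin{array}{rcl}
[\square_i] & := & \mathsf{TypeVal}\ \mathsf{type}_i\ \mathsf{TypeErr}_i \\
[x] & := & x \\
[\lambda x : A.\, M] & := & \lambda x : \llbracket A \rrbracket.\ [M] \\
[M\, N] & := & [M]\ [N] \\
[\Pi x : A.\, B] & := & \mathsf{TypeVal}\ (\Pi x : \llbracket A \rrbracket.\ \llbracket B \rrbracket)\ (\lambda (e : \mathbb{E})\, (x : \llbracket A \rrbracket).\ [B]_\varnothing\, e) \\
[A]_\varnothing & := & \mathsf{Err}\ [A] \\
\llbracket A \rrbracket & := & \mathsf{El}\ [A] \\
\llbracket \cdot \rrbracket & := & \cdot \\
\llbracket \Gamma, x : A \rrbracket & := & \llbracket \Gamma \rrbracket, x : \llbracket A \rrbracket
\end{array}$$


Fig. 2. Exceptional translation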


The exceptional translation is defined in Fig. 2. As usual for syntactic
translations [9], the term translation is given by $[\cdot]$ and the type translation, written
$\llbracket \cdot \rrbracket$, is derived from it using the function El. There is an additional macro
$[\cdot]_\varnothing$, defined using Err, which corresponds to the way to inhabit a given type
from an exception.

Note that we will often slightly abuse the translation and use the $\llbracket \cdot \rrbracket$ and $[\cdot]$
notation as macros acting on the target theory. This is merely for readability
purposes, and the corresponding uses are easily expanded to the actual term.

The following lemma makes explicit how $\llbracket \cdot \rrbracket$ and $[\cdot]_\varnothing$ behave on universes and
on the dependent function space.


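Lemma 3 (Unfoldings). The following definitional equations hold:


— $\llbracket \Pi x : A.\, B \rrbracket \equiv \Pi x : \llbracket A \rrbracket.\ \llbracket B \rrbracket$
— $\llbracket \square_i \rrbracket \equiv \mathsf{type}_i$
— $[\square_i]_\varnothing\, e \equiv \mathsf{TypeErr}_i\, e$
— $[\Pi x : A.\, B]_\varnothing\, e \equiv \lambda x : \llbracket A \rrbracket.\ [B]_\varnothing\, e$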

Proof. By unfolding and straightforward reductions. 


The soundness of the translation follows from the following properties, which 
are fundamental but straightforward to prove. 


Theorem 4 (Soundness). The following properties hold.


— $[M\{x := N\}] \equiv [M]\{x := [N]\}$ (substitution lemma).

— If $M \equiv N$ then $[M] \equiv [N]$ (conversion lemma).

— If $\Gamma \vdash M : A$ then $\llbracket \Gamma \rrbracket \vdash [M] : \llbracket A \rrbracket$ (typing soundness).

— If $\Gamma \vdash A : \square_i$ then $\llbracket \Gamma \rrbracket \vdash [A]_\varnothing : \mathbb{E} \to \llbracket A \rrbracket$ (exception soundness).


Proof. The first property is by routine induction on $M$, the second is direct
by induction on the conversion derivation. The third is by induction on the
typing derivation, the most important rule being $\square_i : \square_j$, which holds because
$[\square_i] \equiv \mathsf{TypeVal}\ \mathsf{type}_i\ \mathsf{TypeErr}_i$ has type $\mathsf{type}_j$, which is convertible to
$\llbracket \square_j \rrbracket$ by Lemma 3. The last property is a direct application of typing
soundness and unfolding of Lemma 3 for universes.


We call $\mathcal{T}_\mathbb{E}$ the theory arising from this interpretation, which is formally
defined in a way similar to standard categorical constructions over dependent
type theory. Terms and contexts of $\mathcal{T}_\mathbb{E}$ are simply terms and contexts of $\mathcal{T}$. A
context $\Gamma$ is valid in $\mathcal{T}_\mathbb{E}$ whenever its translation $\llbracket \Gamma \rrbracket$ is valid in $\mathcal{T}$. Two
terms $M$ and $N$ are convertible in $\mathcal{T}_\mathbb{E}$ whenever their translations $[M]$ and $[N]$
are convertible in $\mathcal{T}$. Finally, $\Gamma \vdash_{\mathcal{T}_\mathbb{E}} M : A$ whenever
$\llbracket \Gamma \rrbracket \vdash_{\mathcal{T}} [M] : \llbracket A \rrbracket$.

That is, it is possible to extend $\mathcal{T}_\mathbb{E}$ with a new constant $c$ of a given type $A$
by providing an inhabitant $c_\mathbb{E}$ of the translated type $\llbracket A \rrbracket$. Then the translation
is extended with $[c] := c_\mathbb{E}$. The potential computational rules satisfied by this new
constant are directly given by the computational rules satisfied by its translation.
In some sense, the new constant $c$ is just syntactic sugar for $c_\mathbb{E}$. Using $\mathcal{T}_\mathbb{E}$,
Theorem 4 can be rephrased in the following way.


Theorem 5. If $\mathcal{T}$ interprets CC$_\omega$ then so does $\mathcal{T}_\mathbb{E}$, that is, the exceptional
translation is a syntactic model of CC$_\omega$.


2.2 Exceptional Inductive Types 


The fact that the only effect we consider is raising exceptions does not really 
affect the negative fragment when compared to our previous work [2], but 
it sure shines when it comes to interpreting inductive datatypes. Indeed, as 
explained in the introduction, the weaning translation only interprets a subset 
of CIC, restricting dependent elimination to linear predicates. Furthermore, it 
also requires a few syntactic properties of the underlying monad ensuring that 
positivity criteria are preserved through the translation, which can be sometimes 
hard to obtain. 

The exceptional translation diverges from the weaning translation precisely
on inductive types. It allows a more compact translation of the latter, while at
the same time providing a complete interpretation of CIC, that is, including full
dependent elimination.

From now on, we assume that the target theory is a predicative restriction
of CIC, i.e. that we can construct in it new inductive datatypes as we do in
e.g. Coq [10], but without considering an impredicative universe. That is, all
the inductive types we consider in this section live in $\square$. As a matter of fact,
we slightly abuse the usual nomenclature and simply call CIC this predicative
fragment in the remainder of the paper. We refrain from describing the generic
typing rules that extend CC$_\omega$ into CIC, as they are fairly standard and would
take up too much space. See for instance Werner’s thesis for a comprehensive
presentation [11].

$$[\mathcal{I}] := \lambda (p_1 : \llbracket P_1 \rrbracket) \ldots (p_n : \llbracket P_n \rrbracket)\, (i_1 : \llbracket I_1 \rrbracket) \ldots (i_m : \llbracket I_m \rrbracket).$$
$$\mathsf{TypeVal}\ (\mathcal{I}^\bullet\ p_1 \ldots p_n\ i_1 \ldots i_m)\ (\mathcal{I}_\varnothing\ p_1 \ldots p_n\ i_1 \ldots i_m)$$


Fig. 3. Inductive type translation


Type and Constructor Translation. As explained before, the intuitive
interpretation of a type through the exceptional translation is a pair of a type and a
default function from exceptions into that type. In particular, when translating
some inductive type $\mathcal{I}$, we must come up with a type $\llbracket \mathcal{I} \rrbracket$ together with a default
function $\mathbb{E} \to \llbracket \mathcal{I} \rrbracket$. As soon as $\mathbb{E}$ is inhabited, that means that we need $\llbracket \mathcal{I} \rrbracket$
to be inhabited, preferably in a canonical way. The solution is simple: just as for
types where we freely added the exceptional case by means of the TypeErr
constructor, we freely add exceptions to every inductive type.

In practice, there is an elegant and simple way to do this. It just consists
in translating constructors pointwise, while adding a new dedicated constructor
standing for the exceptional case. We now turn to the formal construction.


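Definition 6. Let $\mathcal{I}$ be an inductive datatype with
— parameters $p_1 : P_1, \ldots, p_n : P_n$;
— indices $i_1 : I_1, \ldots, i_m : I_m$;
— constructors

$$\begin{array}{l}
c_1 : \Pi (a_{1,1} : A_{1,1}) \ldots (a_{1,l_1} : A_{1,l_1}).\ \mathcal{I}\ p_1 \ldots p_n\ V_{1,1} \ldots V_{1,m} \\
\quad\vdots \\
c_k : \Pi (a_{k,1} : A_{k,1}) \ldots (a_{k,l_k} : A_{k,l_k}).\ \mathcal{I}\ p_1 \ldots p_n\ V_{k,1} \ldots V_{k,m}
\end{array}$$

We define the exceptional translation of $\mathcal{I}$ and its constructors in Fig. 3,
where $\mathcal{I}^\bullet$ is the inductive type defined by

— parameters $p_1 : \llbracket P_1 \rrbracket, \ldots, p_n : \llbracket P_n \rrbracket$;
— indices $i_1 : \llbracket I_1 \rrbracket, \ldots, i_m : \llbracket I_m \rrbracket$;
— constructors

$$\begin{array}{l}
c_1^\bullet : \Pi (a_{1,1} : \llbracket A_{1,1} \rrbracket) \ldots (a_{1,l_1} : \llbracket A_{1,l_1} \rrbracket).\ \mathcal{I}^\bullet\ p_1 \ldots p_n\ [V_{1,1}] \ldots [V_{1,m}] \\
\quad\vdots \\
c_k^\bullet : \Pi (a_{k,1} : \llbracket A_{k,1} \rrbracket) \ldots (a_{k,l_k} : \llbracket A_{k,l_k} \rrbracket).\ \mathcal{I}^\bullet\ p_1 \ldots p_n\ [V_{k,1}] \ldots [V_{k,m}] \\
\mathcal{I}_\varnothing : \Pi (i_1 : \llbracket I_1 \rrbracket) \ldots (i_m : \llbracket I_m \rrbracket).\ \mathbb{E} \to \mathcal{I}^\bullet\ p_1 \ldots p_n\ i_1 \ldots i_m
\end{array}$$

where in the recursive calls in the various $A_{i,j}$, we locally set

$$\llbracket \mathcal{I}\ M_1 \ldots M_n\ N_1 \ldots N_m \rrbracket := \mathcal{I}^\bullet\ [M_1] \ldots [M_n]\ [N_1] \ldots [N_m].$$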

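Example 7. We give a few representative examples of the inductive translation in
Fig. 4 in a Coq-like syntax. They were chosen because they are simple instances
of inductive types featuring parameters, indices and recursion in an orthogonal
way. For convenience, we write $\Sigma\ A\ (\lambda x : A.\, B)$ as $\Sigma x : A.\, B$.

$\mathsf{Ind}\ \mathsf{bool} : \square :=$
$\quad \mathsf{true} : \mathsf{bool}$
$\quad \mathsf{false} : \mathsf{bool}$

$\mathsf{Ind}\ \mathsf{bool}^\bullet : \square :=$
$\quad \mathsf{true}^\bullet : \mathsf{bool}^\bullet$
$\quad \mathsf{false}^\bullet : \mathsf{bool}^\bullet$
$\quad \mathsf{bool}_\varnothing : \mathbb{E} \to \mathsf{bool}^\bullet$

$\mathsf{Ind}\ \mathsf{list}\ (A : \square) : \square :=$
$\quad \mathsf{nil} : \mathsf{list}\ A$
$\quad \mathsf{cons} : A \to \mathsf{list}\ A \to \mathsf{list}\ A$

$\mathsf{Ind}\ \mathsf{list}^\bullet\ (A : \llbracket \square \rrbracket) : \square :=$
$\quad \mathsf{nil}^\bullet : \mathsf{list}^\bullet\ A$
$\quad \mathsf{cons}^\bullet : \llbracket A \rrbracket \to \mathsf{list}^\bullet\ A \to \mathsf{list}^\bullet\ A$
$\quad \mathsf{list}_\varnothing : \mathbb{E} \to \mathsf{list}^\bullet\ A$

$\mathsf{Ind}\ \Sigma\ (A : \square)\ (B : A \to \square) : \square :=$
$\quad \mathsf{ex} : \Pi (x : A)\, (y : B\, x).\ \Sigma\ A\ B$

$\mathsf{Ind}\ \Sigma^\bullet\ (A : \llbracket \square \rrbracket)\ (B : \llbracket A \to \square \rrbracket) : \square :=$
$\quad \mathsf{ex}^\bullet : \Pi (x : \llbracket A \rrbracket)\, (y : \llbracket B\, x \rrbracket).\ \Sigma^\bullet\ A\ B$
$\quad \Sigma_\varnothing : \mathbb{E} \to \Sigma^\bullet\ A\ B$

$\mathsf{Ind}\ \mathsf{eq}\ (A : \square)\ (x : A) : A \to \square :=$
$\quad \mathsf{refl} : \mathsf{eq}\ A\ x\ x$

$\mathsf{Ind}\ \mathsf{eq}^\bullet\ (A : \llbracket \square \rrbracket)\ (x : \llbracket A \rrbracket) : \llbracket A \rrbracket \to \square :=$
$\quad \mathsf{refl}^\bullet : \mathsf{eq}^\bullet\ A\ x\ x$
$\quad \mathsf{eq}_\varnothing : \Pi y : \llbracket A \rrbracket.\ \mathbb{E} \to \mathsf{eq}^\bullet\ A\ x\ y$


Fig. 4. Examples of translations of inductive types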
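
The translated list type can be mimicked in an untyped setting to see what the extra constructor buys. The following Python sketch is an illustration only: it assumes exceptions are strings, ignores the type parameter, and uses hypothetical names (`Nil`, `Cons`, `ListErr`, `from_python`, `raised_in_spine`) for the three constructors and two helpers.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Nil:
    """Counterpart of the translated nil constructor."""


@dataclass(frozen=True)
class Cons:
    """Counterpart of the translated cons constructor."""
    head: Any
    tail: Any


@dataclass(frozen=True)
class ListErr:
    """Counterpart of the added default constructor: a list that is an exception."""
    exn: str


def from_python(items):
    """Build an exception-free list from a Python sequence."""
    xs = Nil()
    for item in reversed(items):
        xs = Cons(item, xs)
    return xs


def raised_in_spine(xs):
    """Return the exception ending the spine of xs, or None if it ends in nil."""
    while isinstance(xs, Cons):
        xs = xs.tail
    return xs.exn if isinstance(xs, ListErr) else None


pure = from_python([1, 2, 3])
partial = Cons(1, Cons(2, ListErr("boom")))

print(raised_in_spine(pure), raised_in_spine(partial))
```

The value `partial` has no counterpart in the source type: it is a list whose tail failed after two cells, which is exactly the kind of inhabitant the additional constructor introduces.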


Remark 8. The fact that we locally override the $\llbracket \cdot \rrbracket$ translation for recursive
calls on the type being defined means that we cannot handle
cases where the translation of the type of a constructor actually contains an
instance of $[\mathcal{I}]$. Because of the syntactic positivity criterion, the only possibility
for such a situation to occur in CIC is in the so-called nested inductive definitions.
However, nested inductive types are essentially a programming convenience, as
most nested types can be rewritten in an isomorphic way that is not nested.


Lemma 9. If $\mathcal{I}$ is given as in Definition 6, we have for any terms $M$, $N$

$$\llbracket \mathcal{I}\ M_1 \ldots M_n\ N_1 \ldots N_m \rrbracket \equiv \mathcal{I}^\bullet\ [M_1] \ldots [M_n]\ [N_1] \ldots [N_m].$$


This justifies a posteriori the simplified local definition we used in the recur- 
sive calls of the translation of the constructors. 


Theorem 10. For any inductive type $\mathcal{I}$ not using nested inductive types, the translation from Definition 6 is well-typed and satisfies the positivity criterion.


Proof. Preservation of typing is a consequence of Theorem 4. The restriction on nested types, which is slightly stronger than the usual positivity criterion of CIC, is due to the fact that $\mathcal{I}_\varnothing$ is not available in the recursive calls and thus cannot be used to build a term of type $\mathsf{type}$ via the $\mathsf{TypeVal}$ constructor.

Preservation of the positivity criterion is straightforward, as the shape of every constructor $c_i$ is preserved, and furthermore by Lemma 3 the structure of every argument type is preserved by $\llbracket\cdot\rrbracket$ as well. The only additional constructor $\mathcal{I}_\varnothing$ does not mention the recursive type and is thus automatically positive.


Corollary 11. Type soundness holds for the translation of inductive types and their constructors.

Pattern-Matching Translation. We now turn to the translation of the elimination of inductive terms, that is, pattern matching. Once again, its definition originates from the fact that we are working with call-by-name exceptions. It is well-known that in call-by-name, pattern matching implements a delimited form of call-by-value, by forcing its scrutinee before proceeding, at least up to the head constructor. Therefore, as soon as the matched term (re-)raises an exception, the whole pattern-matching reraises the same exception. A little care has to be taken in order to accommodate the fact that the return type of the pattern-matching depends on the scrutinee, in particular when it is the default constructor of the inductive type.

In what follows, we use the $i_1 \ldots i_n$ notation for clarity, but compact it to $\vec{\imath}$ for space reasons, when appropriate.


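Definition 12. Assume an inductive $\mathcal{I}$ as given in Definition 6. Let $Q$ be the well-typed pattern-matching defined as

\[
\begin{array}{l}
\mathtt{match}\ M\ \mathtt{return}\ \lambda (i_1 : I_1) \ldots (i_m : I_m)\ (x : \mathcal{I}\ X_1 \ldots X_n\ i_1 \ldots i_m).\ R\ \mathtt{with} \\
\mid c_1\ a_{1,1} \ldots a_{1,l_1} \Rightarrow N_1 \\
\quad \vdots \\
\mid c_k\ a_{k,1} \ldots a_{k,l_k} \Rightarrow N_k \\
\mathtt{end}
\end{array}
\]

where

\[
\begin{array}{l}
\Gamma \vdash \vec{X} : \vec{P} \qquad \Gamma \vdash \vec{Y} : \vec{I}\{\vec{p} := \vec{X}\} \qquad \Gamma \vdash M : \mathcal{I}\ X_1 \ldots X_n\ Y_1 \ldots Y_m \\
\Gamma, \vec{\imath} : \vec{I}\{\vec{p} := \vec{X}\}, x : \mathcal{I}\ \vec{X}\ \vec{\imath} \vdash R : \square \\
\Gamma, \vec{a}_p : \vec{A}_p \vdash N_p : R\{\vec{\imath} := \vec{V}_p\{\vec{p} := \vec{X}\}, x := c_p\ \vec{X}\ \vec{a}_p\}
\end{array}
\]

then we pose $[Q]$ to be the following pattern-matching.

\[
\begin{array}{l}
\mathtt{match}\ [M]\ \mathtt{return}\ \lambda (i_1 : \llbracket I_1 \rrbracket) \ldots (i_m : \llbracket I_m \rrbracket)\ (x : \mathcal{I}^\bullet\ [X_1] \ldots [X_n]\ i_1 \ldots i_m).\ \llbracket R \rrbracket\ \mathtt{with} \\
\mid c_1^\bullet\ a_{1,1} \ldots a_{1,l_1} \Rightarrow [N_1] \\
\quad \vdots \\
\mid c_k^\bullet\ a_{k,1} \ldots a_{k,l_k} \Rightarrow [N_k] \\
\mid \mathcal{I}_\varnothing\ i_1 \ldots i_m\ e \Rightarrow [R]_\varnothing\{x := \mathcal{I}_\varnothing\ X_1 \ldots X_n\ i_1 \ldots i_m\ e\}\ e \\
\mathtt{end}
\end{array}
\]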
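
As an illustration of the extra branch, here is a hand-written Coq sketch of what translating boolean negation could look like. It assumes an arbitrary exception type `E` and a non-dependent return type, so the default branch simply re-raises; the names `bool_e`, `negb_e` and `negb_e_reraises` are hypothetical.

```coq
Section MatchTranslation.
  (* Assumption: an arbitrary type of exceptions. *)
  Variable E : Type.

  Inductive bool_e : Type :=
  | true_e : bool_e
  | false_e : bool_e
  | bool_err : E -> bool_e.

  (* The two original branches are kept; the added default branch
     propagates the exception found in the scrutinee. *)
  Definition negb_e (b : bool_e) : bool_e :=
    match b with
    | true_e => false_e
    | false_e => true_e
    | bool_err e => bool_err e
    end.

  (* An exceptional scrutinee makes the whole match raise the same exception. *)
  Lemma negb_e_reraises (e : E) : negb_e (bool_err e) = bool_err e.
  Proof. reflexivity. Qed.
End MatchTranslation.
```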


Lemma 13. With notations and typing assumptions from Definition 12, we have

\[ \llbracket\Gamma\rrbracket \vdash [Q] : \llbracket R \rrbracket\{\vec{\imath} := [\vec{Y}], x := [M]\}. \]

Proof. Mostly a consequence of Theorem 4 applied to all of the premises of the pattern-matching rule. The only thing we have to check specifically is that the branch for the default constructor $\mathcal{I}_\varnothing$ is well-typed as

\[ \llbracket\Gamma\rrbracket, \vec{\imath} : \llbracket\vec{I}\rrbracket\{\vec{p} := \vec{X}\}, e : \mathbb{E} \vdash [R]_\varnothing\{x := \mathcal{I}_\varnothing\ \vec{X}\ \vec{\imath}\ e\}\ e : \llbracket R \rrbracket\{x := \mathcal{I}_\varnothing\ \vec{X}\ \vec{\imath}\ e\} \]

which is also due to Theorem 4 applied to $R$.

Lemma 14. The translation preserves $\iota$-rules.

Proof. Immediate, as the translation preserves the structure of the patterns.


The translation is also applicable to fixpoints, but for the sake of readability we do not fully spell it out; it is simply defined by congruence (commutation with the syntax). As such, it trivially preserves typing and reduction rules. Note that the Coq plugin presented in Sect. 6 features a complete translation of inductive types, pattern-matching and fixpoints, so the interested reader may experiment with the plugin to see how fixpoints are translated.

Therefore, by summarizing all of the previous properties, we have the following result.


Theorem 15. If $\mathcal{T}$ interprets CIC, then so does $\mathcal{T}_{\mathbb{E}}$, and thus the exceptional translation is a syntactic model of CIC.


2.3 Flirting with Inconsistency 


It is now time to point at the elephant in the room. The exceptional translation 
has a lot of nice properties, but it has one grave defect. 


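Theorem 16. If $\mathbb{E}$ is inhabited, then $\mathcal{T}_{\mathbb{E}}$ is logically inconsistent.


Proof. The empty type is translated as

\[ \mathsf{Ind}\ \mathsf{empty}^\bullet : \square := \mathsf{empty}_\varnothing : \mathbb{E} \to \mathsf{empty}^\bullet \]

which is inhabited as soon as $\mathbb{E}$ is.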
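
The argument is short enough to replay in Coq. The sketch below assumes an arbitrary exception type `E`; the names `empty_e`, `empty_err`, `empty_e_inhabited` and `empty_e_exception` are hypothetical and only mirror the translated empty type.

```coq
Section EmptyTranslation.
  (* Assumption: an arbitrary type of exceptions. *)
  Variable E : Type.

  (* Translation of the empty type: only the default constructor remains. *)
  Inductive empty_e : Type :=
  | empty_err : E -> empty_e.

  (* Any exception yields an inhabitant, hence the inconsistency. *)
  Definition empty_e_inhabited (e : E) : empty_e := empty_err e.

  (* Conversely, every inhabitant carries an exception. *)
  Definition empty_e_exception (m : empty_e) : E :=
    match m with
    | empty_err e => e
    end.
End EmptyTranslation.
```

The second definition illustrates why a proof of the translated empty type can only be an exception.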


Note that when $\mathbb{E}$ is empty, the situation is hardly better, as the translation is essentially the identity. However, when $\mathcal{T}$ satisfies canonicity, the situation is not totally desperate as $\mathcal{T}_{\mathbb{E}}$ enjoys the following weaker canonicity lemma.


Lemma 17 (Exceptional Canonicity). Let $\mathcal{I}$ be an inductive type with constructors $c_1, \ldots, c_n$ and assume that $\mathcal{T}$ satisfies canonicity. The translation of any closed term $\vdash_{\mathcal{T}_{\mathbb{E}}} M : \mathcal{I}$ evaluates either to a constructor of the form $c_i^\bullet\ N_1 \ldots N_{l_i}$, or to the default constructor $\mathcal{I}_\varnothing\ e$ for some $e : \mathbb{E}$.


Proof. Direct application of Theorem 4 and canonicity of T. 


A direct consequence of Lemma 17 is that any proof of the empty type is an exception. As we will see in Sect. 4.1, for some types it is also possible to dynamically check whether a term of this type is a correct proof, in the sense that it does not raise an uncaught exception. This means that while $\mathcal{T}_{\mathbb{E}}$ is logically unsound, it is computationally relevant and can still be used as a dependently-typed programming language with exceptions, a shift into a realm where we would have called the weaker canonicity Lemma 17 a progress lemma.

This is not the end of the story, though. Recall that $\mathcal{T}_{\mathbb{E}}$ only exists through its embedding $[\cdot]$ into $\mathcal{T}$. In particular, if $\mathcal{T}$ is consistent, this means that one can reason about terms of $\mathcal{T}_{\mathbb{E}}$ directly in $\mathcal{T}$. For instance, it is possible to prove in $\mathcal{T}$ that, assuming some properties about its input, a function in $\mathcal{T}_{\mathbb{E}}$ never raises an exception. Hence not only do we have an effectful programming language, but we also have a sound logical framework allowing one to transparently prove safety properties about impure programs.

It is actually even better than that. We will show in Sect. 3 that safety properties can be derived automatically for pure programs, allowing one to recover a consistent type theory as long as $\mathcal{T}$ is consistent itself.


2.4 Living in an Exceptional World 


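We describe here what $\mathcal{T}_{\mathbb{E}}$ feels like in direct style. The exceptional theory features a new type $\mathbf{E}$ which reifies the underlying type $\mathbb{E}$ of exceptions in $\mathcal{T}_{\mathbb{E}}$. It uses the fact that for $\mathbb{E}$, the default function (here of type $\mathbb{E} \to \mathbb{E}$) can simply be defined as the identity function. Its translation is given by

\[ [\mathbf{E}] : \llbracket\square\rrbracket := \mathsf{TypeVal}\ \mathbb{E}\ (\lambda e : \mathbb{E}.\ e). \]

Then, it is possible to define in $\mathcal{T}_{\mathbb{E}}$ a function $\mathsf{raise} : \Pi A : \square.\ \mathbf{E} \to A$ that raises the provided exception at any type as

\[ [\mathsf{raise}] := \lambda (A : \mathsf{type})\ (e : \mathbb{E}).\ \mathsf{Err}\ A\ e. \]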


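As we have already mentioned, the reader should be aware that the exceptions arising from this translation are call-by-name. This means that they do not behave like their usual call-by-value counterpart. In particular, we have in $\mathcal{T}_{\mathbb{E}}$

\[ \mathsf{raise}\ (\Pi x : A.\ B)\ e \equiv \lambda x : A.\ \mathsf{raise}\ B\ e \]

which means that exceptions cannot be caught on $\Pi$-types. We can catch them on universes and inductive types though, because in those cases they are freely added through an extra constructor which one can pattern-match on. For instance, there exists in $\mathcal{T}_{\mathbb{E}}$ a term

\[
\begin{array}{l}
\mathsf{catch}_{\mathsf{bool}} : \Pi P : \mathsf{bool} \to \square.\ P\ \mathsf{true} \to P\ \mathsf{false} \to \\
\qquad (\Pi e : \mathbf{E}.\ P\ (\mathsf{raise}\ \mathsf{bool}\ e)) \to \Pi b : \mathsf{bool}.\ P\ b
\end{array}
\]

defined by

\[
\begin{array}{l}
[\mathsf{catch}_{\mathsf{bool}}] := \lambda P\ p_t\ p_f\ p_e\ b.\ \mathtt{match}\ b\ \mathtt{return}\ \lambda b.\ \mathsf{El}\ (P\ b)\ \mathtt{with} \\
\qquad \mid \mathsf{true}^\bullet \Rightarrow p_t \\
\qquad \mid \mathsf{false}^\bullet \Rightarrow p_f \\
\qquad \mid \mathsf{bool}_\varnothing\ e \Rightarrow p_e\ e \\
\qquad \mathtt{end}
\end{array}
\]

satisfying the expected reduction rules on all three cases.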
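
Seen from the target theory, the handler is just the dependent eliminator of the translated booleans. The Coq sketch below spells this out under the assumption of an arbitrary exception type `E`; it drops the `El` decoding by letting the motive land directly in `Type`, and the names `bool_e`, `catch_bool` and `catch_bool_err` are hypothetical.

```coq
Section Catch.
  (* Assumption: an arbitrary type of exceptions. *)
  Variable E : Type.

  Inductive bool_e : Type :=
  | true_e : bool_e
  | false_e : bool_e
  | bool_err : E -> bool_e.

  (* Dependent eliminator with one extra case for exceptions. *)
  Definition catch_bool (P : bool_e -> Type)
      (pt : P true_e) (pf : P false_e)
      (pe : forall e : E, P (bool_err e)) (b : bool_e) : P b :=
    match b as b0 return P b0 with
    | true_e => pt
    | false_e => pf
    | bool_err e => pe e
    end.

  (* Third reduction rule: an exception is routed to the handler. *)
  Lemma catch_bool_err (P : bool_e -> Type)
      (pt : P true_e) (pf : P false_e)
      (pe : forall e : E, P (bool_err e)) (e : E) :
    catch_bool P pt pf pe (bool_err e) = pe e.
  Proof. reflexivity. Qed.
End Catch.
```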
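In Sect. 6, we illustrate the use of the exceptional theory using the Coq plugin to define a simple cast framework as in [12].

\[
\begin{array}{lcl}
[\square_i]_\varepsilon & := & \lambda A : \llbracket\square_i\rrbracket.\ \llbracket A \rrbracket \to \square_i \\
[x]_\varepsilon & := & x_\varepsilon \\
[\lambda x : A.\ M]_\varepsilon & := & \lambda (x : \llbracket A \rrbracket)\ (x_\varepsilon : \llbracket A \rrbracket_\varepsilon\ x).\ [M]_\varepsilon \\
[M\ N]_\varepsilon & := & [M]_\varepsilon\ [N]\ [N]_\varepsilon \\
[\Pi x : A.\ B]_\varepsilon & := & \lambda (f : \Pi x : \llbracket A \rrbracket.\ \llbracket B \rrbracket).\ \Pi (x : \llbracket A \rrbracket)\ (x_\varepsilon : \llbracket A \rrbracket_\varepsilon\ x).\ \llbracket B \rrbracket_\varepsilon\ (f\ x) \\
\llbracket A \rrbracket_\varepsilon & := & [A]_\varepsilon \\
\llbracket \cdot \rrbracket_\varepsilon & := & \cdot \\
\llbracket \Gamma, x : A \rrbracket_\varepsilon & := & \llbracket\Gamma\rrbracket_\varepsilon, x : \llbracket A \rrbracket, x_\varepsilon : \llbracket A \rrbracket_\varepsilon\ x
\end{array}
\]

Fig. 5. Parametricity over exceptional translation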


3 Kreisel Meets Martin-Löf


It is well-known that Reynolds’ parametricity [13] and Kreisel’s modified realiz- 
ability [4] are two instances of the broader logical relation techniques. Usually, 
parametricity is used to derive theorems for free, while realizability constrains 
programs. In a surprising turn of events, we use Bernardy’s variant of parametricity on CIC [5] as a realizability trick to evict undesirable behaviours of $\mathcal{T}_{\mathbb{E}}$. This leads to the parametric exceptional translation, which can be seen as
the embodiment of Kreisel’s realizability in type theory. In this section, we first 
present this translation on the negative fragment, then extend it to CIC and 
finally discuss its meta-theoretical properties. 


3.1 Exceptional Parametricity in a Negative World 


The exceptional parametricity translation for terms of $\mathrm{CC}_\omega$ is defined in Fig. 5. Intuitively, any type $A$ in $\mathcal{T}_{\mathbb{E}}$ is turned into a validity predicate $A_\varepsilon : A \to \square$ which encodes the fact that an inhabitant of $A$ is not allowed to generate unhandled exceptions. For instance, a function is valid if its application to a valid term produces a valid answer. It does not say anything about the application to invalid terms though, which amounts to a garbage in, garbage out policy. The translation then states that every pure term is automatically valid.

This translation is exactly standard parametricity for type theory [5] but parametrized by the exceptional translation. This means that any occurrence of a term of the original theory used in the parametricity translation is replaced by its exceptional translation, using $[\cdot]$ or $\llbracket\cdot\rrbracket$ depending on whether it is used as a term or as a type. For instance, the translation of an application $[M\ N]_\varepsilon$ is given by $[M]_\varepsilon\ [N]\ [N]_\varepsilon$ instead of just $[M]_\varepsilon\ N\ [N]_\varepsilon$.


Lemma 18 (Substitution lemma). The translation satisfies the following conversion: $[M\{x := N\}]_\varepsilon \equiv [M]_\varepsilon\{x := [N], x_\varepsilon := [N]_\varepsilon\}$.


Theorem 19 (Soundness). The two following properties hold. 


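- If $M \equiv N$ then $[M]_\varepsilon \equiv [N]_\varepsilon$.
- If $\Gamma \vdash M : A$ then $\llbracket\Gamma\rrbracket_\varepsilon \vdash [M]_\varepsilon : \llbracket A \rrbracket_\varepsilon\ [M]$.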


Proof. By induction on the derivation. 


We can use this result to construct another syntactic model of $\mathrm{CC}_\omega$. Contrary to usual syntactic models where sequents are straightforwardly translated to sequents, this model is slightly more subtle as sequents are translated to pairs of sequents instead. This is similar to the usual parametricity translation.


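Definition 20. The theory $\mathcal{T}^p_{\mathbb{E}}$ is defined by the following data.

- Terms of $\mathcal{T}^p_{\mathbb{E}}$ are pairs of terms of $\mathcal{T}$.
- Contexts of $\mathcal{T}^p_{\mathbb{E}}$ are pairs of contexts of $\mathcal{T}$.
- $\vdash_{\mathcal{T}^p_{\mathbb{E}}} \Gamma$ whenever $\vdash_{\mathcal{T}} \llbracket\Gamma\rrbracket$ and $\vdash_{\mathcal{T}} \llbracket\Gamma\rrbracket_\varepsilon$.
- $M \equiv_{\mathcal{T}^p_{\mathbb{E}}} N$ whenever $[M] \equiv_{\mathcal{T}} [N]$ and $[M]_\varepsilon \equiv_{\mathcal{T}} [N]_\varepsilon$.
- $\Gamma \vdash_{\mathcal{T}^p_{\mathbb{E}}} M : A$ whenever $\llbracket\Gamma\rrbracket \vdash_{\mathcal{T}} [M] : \llbracket A \rrbracket$ and $\llbracket\Gamma\rrbracket_\varepsilon \vdash_{\mathcal{T}} [M]_\varepsilon : \llbracket A \rrbracket_\varepsilon\ [M]$.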


Once again, Theorem 19 can be rephrased in terms of preservation of theories 
and syntactic models. 


Theorem 21. If $\mathcal{T}$ interprets $\mathrm{CC}_\omega$ then so does $\mathcal{T}^p_{\mathbb{E}}$. That is, the parametric exceptional translation is a syntactic model of $\mathrm{CC}_\omega$.


This construction preserves definitional $\eta$-expansion, as functions are mapped to (slightly more complicated) functions.


Lemma 22. If $\mathcal{T}$ satisfies definitional $\eta$-expansion, then so does $\mathcal{T}^p_{\mathbb{E}}$.


Proof. The first component of the translation preserves definitional $\eta$-expansion because functions are mapped to functions. It remains to show that

\[ [\lambda x : A.\ M\ x]_\varepsilon := \lambda (x : \llbracket A \rrbracket)\ (x_\varepsilon : \llbracket A \rrbracket_\varepsilon\ x).\ [M]_\varepsilon\ x\ x_\varepsilon \equiv [M]_\varepsilon \]

which holds by applying $\eta$-expansion twice.


It is interesting to remark that Bernardy-style unary parametricity also leads to a syntactic model $\mathcal{T}^p$ that interprets $\mathrm{CC}_\omega$ (as well as CIC), using the same kind of glueing construction. Nonetheless, this model is somewhat degenerate from the logical point of view. Namely, it is a conservative extension of the target theory. Indeed, if $\Gamma \vdash_{\mathcal{T}^p} M : A$ for some $\Gamma$, $M$ and $A$ from $\mathcal{T}$, then we also have $\Gamma \vdash_{\mathcal{T}} M : A$, because the first component of the model is the identity, and the original sequent can be retrieved by the first projection.

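This is definitely not the case with the $\mathcal{T}^p_{\mathbb{E}}$ theory, because the first projection is not the identity. In particular, because of Theorem 16, every sequent in the first projection is inhabited, although it is not the case in $\mathcal{T}$ itself if it is consistent. This means that parametricity can actually bring additional expressivity when it applies to a theory which is not pure, as is the case here.

\[
\begin{array}{l}
\mathsf{Ind}\ \mathsf{bool}_\varepsilon : \mathsf{bool}^\bullet \to \square := \\
\mid \mathsf{true}_\varepsilon : \mathsf{bool}_\varepsilon\ \mathsf{true}^\bullet \\
\mid \mathsf{false}_\varepsilon : \mathsf{bool}_\varepsilon\ \mathsf{false}^\bullet \\[1ex]
\mathsf{Ind}\ \mathsf{list}_\varepsilon\ (A : \mathsf{type})\ (A_\varepsilon : \llbracket A \rrbracket \to \square) : \mathsf{list}^\bullet\ A \to \square := \\
\mid \mathsf{nil}_\varepsilon : \mathsf{list}_\varepsilon\ A\ A_\varepsilon\ (\mathsf{nil}^\bullet\ A) \\
\mid \mathsf{cons}_\varepsilon : \Pi (x : \llbracket A \rrbracket)\ (x_\varepsilon : A_\varepsilon\ x)\ (l : \mathsf{list}^\bullet\ A)\ (l_\varepsilon : \mathsf{list}_\varepsilon\ A\ A_\varepsilon\ l). \\
\qquad \mathsf{list}_\varepsilon\ A\ A_\varepsilon\ (\mathsf{cons}^\bullet\ A\ x\ l) \\[1ex]
\mathsf{Ind}\ \mathsf{eq}_\varepsilon\ (A : \mathsf{type})\ (A_\varepsilon : \llbracket A \rrbracket \to \square)\ (x : \llbracket A \rrbracket)\ (x_\varepsilon : A_\varepsilon\ x) : \\
\qquad \Pi (y : \llbracket A \rrbracket)\ (y_\varepsilon : A_\varepsilon\ y).\ \mathsf{eq}^\bullet\ A\ x\ y \to \square := \\
\mid \mathsf{refl}_\varepsilon : \mathsf{eq}_\varepsilon\ A\ A_\varepsilon\ x\ x_\varepsilon\ x\ x_\varepsilon\ (\mathsf{refl}^\bullet\ A\ x)
\end{array}
\]

Fig. 6. Examples of parametric translation of inductive types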


3.2 Exceptional Parametric Translation of CIC 


We now describe the parametricity translation of the positive fragment. The 
intuition is that as it stands for an exception, the default constructor is always 
invalid, while all other constructors are valid, assuming their arguments are. 


Type and Constructor Translation 


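Definition 23. Let $\mathcal{I}$ be an inductive type as given in Definition 6. We define the exceptional parametricity translation $\mathcal{I}_\varepsilon$ of $\mathcal{I}$ as the inductive type defined by:

- parameters $\llbracket p_1 : P_1, \ldots, p_n : P_n \rrbracket_\varepsilon$;
- indices $\llbracket i_1 : I_1, \ldots, i_m : I_m \rrbracket_\varepsilon, x : \mathcal{I}^\bullet\ p_1 \ldots p_n\ i_1 \ldots i_m$;
- constructors

\[
\begin{array}{l}
c_{1\varepsilon} : \Pi \llbracket \vec{a}_1 : \vec{A}_1 \rrbracket_\varepsilon. \\
\qquad \mathcal{I}_\varepsilon\ p_1\ p_{1\varepsilon} \ldots p_n\ p_{n\varepsilon}\ [V_{1,1}]\ [V_{1,1}]_\varepsilon \ldots [V_{1,m}]\ [V_{1,m}]_\varepsilon\ (c_1^\bullet\ \vec{p}\ \vec{a}_1) \\
\quad \vdots \\
c_{k\varepsilon} : \Pi \llbracket \vec{a}_k : \vec{A}_k \rrbracket_\varepsilon. \\
\qquad \mathcal{I}_\varepsilon\ p_1\ p_{1\varepsilon} \ldots p_n\ p_{n\varepsilon}\ [V_{k,1}]\ [V_{k,1}]_\varepsilon \ldots [V_{k,m}]\ [V_{k,m}]_\varepsilon\ (c_k^\bullet\ \vec{p}\ \vec{a}_k)
\end{array}
\]

and we extend the translation as

\[ [\mathcal{I}]_\varepsilon := \mathcal{I}_\varepsilon \qquad [c_1]_\varepsilon := c_{1\varepsilon} \quad \ldots \quad [c_k]_\varepsilon := c_{k\varepsilon}. \]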


Example 24. We give the exceptional parametric inductive translation of our 
running examples in Fig. 6. 
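
To make the shape of these predicates concrete, here is a Coq sketch of the boolean and list cases. It is written by hand under simplifying assumptions (an arbitrary exception type `E`, and plain `Type` parameters instead of translated ones); all names are hypothetical.

```coq
Section Validity.
  (* Assumption: an arbitrary type of exceptions. *)
  Variable E : Type.

  Inductive bool_e : Type :=
  | true_e : bool_e
  | false_e : bool_e
  | bool_err : E -> bool_e.

  (* One case per original constructor, none for bool_err. *)
  Inductive bool_valid : bool_e -> Type :=
  | true_valid : bool_valid true_e
  | false_valid : bool_valid false_e.

  (* The default constructor can never be shown valid. *)
  Lemma bool_err_invalid (e : E) : bool_valid (bool_err e) -> False.
  Proof. intro H. inversion H. Qed.

  Inductive list_e (A : Type) : Type :=
  | nil_e : list_e A
  | cons_e : A -> list_e A -> list_e A
  | list_err : E -> list_e A.

  (* A list is valid when it is built from valid elements and valid tails. *)
  Inductive list_valid (A : Type) (A_valid : A -> Type) : list_e A -> Type :=
  | nil_valid : list_valid A A_valid (nil_e A)
  | cons_valid : forall (x : A) (xv : A_valid x)
                        (l : list_e A) (lv : list_valid A A_valid l),
      list_valid A A_valid (cons_e A x l).
End Validity.
```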


Note that contrary to the negative case, the exceptional parametricity translation on inductive types is not the same thing as the composition of Bernardy’s parametricity together with the exceptional translation. Indeed, the latter would also have produced a constructor for the default case from the exceptional inductive translation, whereas our goal is precisely to rule this case out via the additional realizability-like interpretation.

It is also very different from our previous parametric weaning translation [2], which relies on internal parametricity to recover dependent elimination, enforcing by construction that no effectful term exists. Here, effectful terms may be used in the first component, but they are required after the fact to have no inconsistent behaviour. Intuitively, parametric weaning produces one pure sequent, while exceptional parametricity produces two, with the first one being potentially impure and the second one ensuring the first one is harmless.


Pattern-Matching Translation 


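Definition 25. Let $Q$ be the pattern-matching defined in Definition 12. We pose $[Q]_\varepsilon$ to be the pattern-matching

\[
\begin{array}{l}
\mathtt{match}\ [M]_\varepsilon\ \mathtt{return}\ \lambda \llbracket \vec{\imath} : \vec{I} \rrbracket_\varepsilon\ (x : \mathcal{I}^\bullet\ [X_1] \ldots [X_n]\ i_1 \ldots i_m) \\
\qquad (x_\varepsilon : \mathcal{I}_\varepsilon\ [X_1]\ [X_1]_\varepsilon \ldots [X_n]\ [X_n]_\varepsilon\ i_1\ i_{1\varepsilon} \ldots i_m\ i_{m\varepsilon}\ x). \\
\qquad \llbracket R \rrbracket_\varepsilon\ [Q_x] \\
\mathtt{with} \\
\mid c_{1\varepsilon}\ a_{1,1}\ a_{1,1\varepsilon} \ldots a_{1,l_1}\ a_{1,l_1\varepsilon} \Rightarrow [N_1]_\varepsilon \\
\quad \vdots \\
\mid c_{k\varepsilon}\ a_{k,1}\ a_{k,1\varepsilon} \ldots a_{k,l_k}\ a_{k,l_k\varepsilon} \Rightarrow [N_k]_\varepsilon \\
\mathtt{end}
\end{array}
\]

where $Q_x$ is the following pattern-matching

\[
\begin{array}{l}
\mathtt{match}\ x\ \mathtt{return}\ \lambda (i_1 : I_1) \ldots (i_m : I_m)\ (x : \mathcal{I}\ X_1 \ldots X_n\ i_1 \ldots i_m).\ R\ \mathtt{with} \\
\mid c_1\ a_{1,1} \ldots a_{1,l_1} \Rightarrow N_1 \\
\quad \vdots \\
\mid c_k\ a_{k,1} \ldots a_{k,l_k} \Rightarrow N_k \\
\mathtt{end}
\end{array}
\]

that is, $Q$ where the scrutinee has been turned into the index variable of the parametricity predicate.


Lemma 26. With notations and typing assumptions from Definition 12, we have

\[ \llbracket\Gamma\rrbracket_\varepsilon \vdash [Q]_\varepsilon : \llbracket R\{\vec{\imath} := \vec{Y}, x := M\} \rrbracket_\varepsilon\ [Q]. \]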


The exceptional parametricity translation can be extended to handle fixpoints as well, with a few limitations. Translating generic fixpoints uniformly is indeed an open problem in standard parametricity, and our variant faces the same issue. In practice, standard recursors can be automatically translated, and fancy fixpoints may require hand-writing the parametricity proof. We do not describe the recursor translation here though, as it is essentially the same as standard parametricity. Again, the interested reader may test the Coq plugin presented in Sect. 6 to see how recursors are translated.

Packing everything together allows us to state the following result.


Theorem 27. If $\mathcal{T}$ interprets CIC, then so does $\mathcal{T}^p_{\mathbb{E}}$, and thus the exceptional parametricity translation is a syntactic model of CIC.

3.3 Meta-Theoretical Properties of $\mathcal{T}^p_{\mathbb{E}}$


Being built as a syntactic model, $\mathcal{T}^p_{\mathbb{E}}$ inherits a lot of meta-theoretical properties of $\mathcal{T}$. We list a few of interest below.


Theorem 28. If $\mathcal{T}$ is consistent, then so is $\mathcal{T}^p_{\mathbb{E}}$.


Proof. Assume $\vdash_{\mathcal{T}^p_{\mathbb{E}}} M_0 : \mathsf{empty}$ for some $M_0$. Then by definition, there exist two terms $M$ and $M_\varepsilon$ such that $\vdash_{\mathcal{T}} M : \mathsf{empty}^\bullet$ and $\vdash_{\mathcal{T}} M_\varepsilon : \mathsf{empty}_\varepsilon\ M$. But $\mathsf{empty}_\varepsilon$ has no constructor, so $\mathcal{T}$ is inconsistent.
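
The key step of this proof can be checked in Coq. The sketch below assumes an arbitrary exception type `E` and uses hypothetical names (`empty_e`, `empty_valid`, `no_valid_empty`); it only shows that no translated proof of the empty type can be paired with a validity proof.

```coq
Section Consistency.
  (* Assumption: an arbitrary, possibly inhabited, type of exceptions. *)
  Variable E : Type.

  (* First component: the translated empty type, inhabited by exceptions. *)
  Inductive empty_e : Type :=
  | empty_err : E -> empty_e.

  (* Second component: the validity predicate, with no constructor at all. *)
  Inductive empty_valid : empty_e -> Type := .

  (* A closed proof in the pair model would give some m together with a
     validity proof for m, which is impossible. *)
  Lemma no_valid_empty (m : empty_e) (mv : empty_valid m) : False.
  Proof. destruct mv. Qed.
End Consistency.
```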


More generally, the same argument holds for any inductive type. 


Theorem 29. If $\mathcal{T}$ enjoys canonicity, then so does $\mathcal{T}^p_{\mathbb{E}}$.


Proof. The exceptional parametricity translation for inductive types has the same structure as the original type, so any normal form in $\mathcal{T}^p_{\mathbb{E}}$ can be mapped back to a normal form in $\mathcal{T}$.


4 Effectively Extending CIC 


The parametric exceptional translation allows us to extend the logical expressivity of CIC in the following ways, which we develop in the remainder of this section.

We show in Sect. 4.1 that Markov’s rule is admissible in CIC. We already sketched this result in our previous paper [2], but we come back to it in more detail. More generally, we show a form of conservativity of double-negation elimination over the type-theoretic version of $\Pi^0_2$ formulae.

In Sect. 4.2, we exhibit a syntactic model of CIC which satisfies definitional $\eta$-expansion for functions but which negates function extensionality. As far as we know, this was not known.

Finally, in Sect. 4.3, we show that there exists a model of CIC which validates the independence of premises. This is a new result, that shows that CIC can feature traces of classical reasoning while staying computational. We use this result in Sect. 4.4 to give an alternative proof of the recent result of Coquand and Mannaa [7] that Markov’s principle is not provable in CIC.


4.1 Markov’s Rule 


We show in this section that CIC is closed under a generalized Markov’s rule. The technique used here is no more than a dependently-typed variant of Friedman’s trick [14]. Indeed, Friedman’s A-translation amounts to adding exceptions to intuitionistic logic, which is precisely what $\mathcal{T}_{\mathbb{E}}$ does for CIC.


Definition 30. An inductive type in CIC is said to be first-order if all the types of the arguments of its constructors, in its parameters and in its indices are recursively first-order.

Example 31. The empty, unit and $\mathbb{N}$ types are first-order. If $P$ and $Q$ are first-order then so are $\Sigma p : P.\ Q$, $P + Q$ and $\mathsf{eq}\ P\ p_0\ p_1$. Consequently, the CIC equivalent of $\Sigma^0_1$ formulae are in particular first-order.


First-order types enjoy uncommon properties, like the fact that they can be injected into effectful terms and purified away. This is then used to prove the generalized Markov’s rule.


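Lemma 32. For every first-order type $\vec{p} : \vec{P} \vdash Q : \square$ where all $\vec{P}$ are first-order, there are retractions $\iota_P$, $\iota_Q$ and $\theta_P$, $\theta_Q$ s.t.:

\[
\begin{array}{l}
\vec{p} : \vec{P} \vdash \iota_Q : Q \to \llbracket Q \rrbracket\{\vec{p} := \iota_P\ \vec{p}\} \\
\vec{p} : \vec{P} \vdash \theta_Q : \llbracket Q \rrbracket\{\vec{p} := \iota_P\ \vec{p}\} \to Q + \mathbb{E}.
\end{array}
\]

Proof. The $\iota$ terms exist because effectful inductive types are a semantical superset of their pure equivalent, and the $\theta$ terms are implemented by recursively forcing the corresponding impure inductive term. One relies on decidability of equality of first-order types to fix the indices.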


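Theorem 33 (Generalized Markov’s Rule). For any first-order type $P$ and first-order predicate $Q$ over $P$, if $\vdash_{\mathrm{CIC}} \Pi p : P.\ \neg\neg (Q\ p)$ then $\vdash_{\mathrm{CIC}} \Pi p : P.\ Q\ p$.


Proof. Let $M : \Pi p : P.\ \neg\neg (Q\ p)$. By taking $\mathbb{E} := Q\ p$ and applying the soundness theorem, one gets a proof

\[ p : P \vdash [M] : \Pi p : \llbracket P \rrbracket.\ (\llbracket Q\ p \rrbracket \to \mathsf{empty}^\bullet) \to \mathsf{empty}^\bullet. \]

But $\mathsf{empty}^\bullet \cong \mathbb{E} = Q\ p$, so we can derive from $[M]$ a term $M^\bullet$ s.t.

\[ p : P \vdash M^\bullet : \Pi p : \llbracket P \rrbracket.\ (\llbracket Q\ p \rrbracket \to Q\ p) \to Q\ p. \]

The proof term we were looking for is thus no more than $\lambda p : P.\ M^\bullet\ (\iota_P\ p)\ \theta_Q$.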
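
The heart of the argument is Friedman's trick, which can be isolated in a propositional Coq sketch. This is a deliberate simplification and not the dependent construction used in the proof: `Q` is a fixed proposition, `Q_e` stands for its exceptional reading when exceptions are themselves proofs of `Q`, and all names are hypothetical.

```coq
Section FriedmanTrick.
  Variable Q : Prop.

  (* Exceptional reading of Q with E := Q: a pure proof or an exception. *)
  Inductive Q_e : Prop :=
  | Q_pure : Q -> Q_e
  | Q_err : Q -> Q_e.

  (* Purification: both a pure proof and an exception yield Q. *)
  Definition theta (x : Q_e) : Q :=
    match x with
    | Q_pure q => q
    | Q_err q => q
    end.

  (* Once empty is replaced by Q, double negation reads (Q_e -> Q) -> Q,
     and feeding it the purification function gives a proof of Q. *)
  Lemma markov_core (M : (Q_e -> Q) -> Q) : Q.
  Proof. exact (M theta). Qed.
End FriedmanTrick.
```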


4.2 Function Intensionality with n-expansion 


In a previous paper [9], we already showed that there existed a syntactic model of CIC that allowed one to internally disprove function extensionality. Yet, this model was clearly not preserving definitional $\eta$-expansion on functions, as it was adding additional structure to abstraction and application (namely a boolean). Thanks to our new model, we can now demonstrate that, counterintuitively, it is possible to have a consistent type theory that enjoys definitional $\eta$-expansion while internally negating function extensionality. In this section we suppose that $\mathbb{E} := \mathsf{unit}$, although any inhabited type of exceptions would work.

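By Lemma 22, we know that the parametric exceptional translation preserves definitional $\eta$-expansion. It is thus sufficient to find two functions that are extensionally equal but intensionally distinct in the model. Let us consider to this end the $\mathsf{unit} \to \mathsf{unit}$ functions

\[ \mathsf{id}_1 := \lambda u : \mathsf{unit}.\ u \qquad \mathsf{id}_{\mathsf{tt}} := \lambda u : \mathsf{unit}.\ \mathsf{tt}. \]

Theorem 34. The following sequents are derivable:

\[ \vdash_{\mathcal{T}^p_{\mathbb{E}}} \Pi u : \mathsf{unit}.\ \mathsf{id}_1\ u = \mathsf{id}_{\mathsf{tt}}\ u \qquad \vdash_{\mathcal{T}^p_{\mathbb{E}}} \mathsf{id}_1 = \mathsf{id}_{\mathsf{tt}} \to \mathsf{empty}. \]


Proof. The main difference between the two functions is that $\mathsf{id}_1$ preserves exceptions while $\mathsf{id}_{\mathsf{tt}}$ does not, which we exploit.

The first sequent is provable in CIC by dependent elimination and thus is derivable in $\mathcal{T}^p_{\mathbb{E}}$ by applying the soundness theorem.

To prove the first component of the second sequent, we exhibit a property that discriminates $[\mathsf{id}_1]$ and $[\mathsf{id}_{\mathsf{tt}}]$, which is, as explained, their evaluation on the term $\mathsf{unit}_\varnothing\ \mathsf{tt}$. Showing then that this proof is parametric is equivalent to showing $\Pi (p : \llbracket \mathsf{id}_1 = \mathsf{id}_{\mathsf{tt}} \rrbracket)\ (p_\varepsilon : \llbracket \mathsf{id}_1 = \mathsf{id}_{\mathsf{tt}} \rrbracket_\varepsilon\ p).\ \mathsf{empty}$. But $p_\varepsilon$ actually implies $[\mathsf{id}_1] = [\mathsf{id}_{\mathsf{tt}}]$, which we just showed was absurd.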
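
The discriminating observation can be replayed on the first components alone. The Coq sketch below fixes the exception type to `unit` as in this section; the names `unit_e`, `id_keep` and `id_const` are hypothetical stand-ins for the translated unit type and the translations of the two functions.

```coq
(* Translated unit with E := unit: a pure value and a default constructor. *)
Inductive unit_e : Type :=
| tt_e : unit_e
| unit_err : unit -> unit_e.

(* Preserves exceptions. *)
Definition id_keep (u : unit_e) : unit_e := u.
(* Swallows exceptions. *)
Definition id_const (u : unit_e) : unit_e := tt_e.

(* On the only pure input the two functions agree. *)
Lemma agree_on_pure : id_keep tt_e = id_const tt_e.
Proof. reflexivity. Qed.

(* They are nonetheless distinct: an exceptional input tells them apart. *)
Lemma id_keep_neq_id_const : id_keep <> id_const.
Proof.
  intro H.
  pose proof (f_equal (fun f : unit_e -> unit_e => f (unit_err tt)) H) as Hx.
  cbv in Hx.
  discriminate Hx.
Qed.
```

Extensional equality only quantifies over valid arguments, here `tt_e`, whereas an equality between the functions themselves also constrains their behaviour on `unit_err tt`.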


4.3 Independence of Premise 


Independence of premise (IP) is a semi-classical principle from first-order logic whose CIC equivalent can be stated as follows.

\[ \Pi (A : \square)\ (B : \mathbb{N} \to \square).\ (\neg A \to \Sigma n : \mathbb{N}.\ B\ n) \to \Sigma n : \mathbb{N}.\ \neg A \to B\ n \qquad \text{(IP)} \]

Although not derivable in intuitionistic logic, it is an admissible rule of HA. The standard proof of this property is to go through Kreisel’s modified realizability interpretation of HA [4]. In a nutshell, the interpretation goes as follows: by induction over a formula $A$, define a simple type $\tau(A)$ of realizers of $A$ together with a realizability predicate $\cdot \Vdash A$ over $\tau(A)$. Then show that whenever $\vdash_{\mathrm{HA}} A$, there exists some simply-typed term $t : \tau(A)$ s.t. $t \Vdash A$. As the interpretation also implies that there is no $t$ s.t. $t \Vdash \bot$, this gives a sound model of HA, which contains more than the latter. Most notably, there is for instance a term $\mathsf{ip}$ s.t.

\[ \mathsf{ip} \Vdash (\neg A \to \exists n.\ B) \to \exists n.\ \neg A \to B \]

for any $A$, $B$. Intriguingly, the computational content of $\mathsf{ip}$ did not seem to receive a fair treatment in the literature. To the best of our knowledge, it has never been explicitly stated that IP was realizable because of the following “bug” of Kreisel’s modified realizability.


Lemma 35 (Kreisel’s bug). For every formula $A$, $\tau(A)$ is inhabited. In particular, $\tau(\bot) := \mathsf{unit}$.


We show that this is actually not a bug, but a hidden feature of Kreisel’s modified realizability, which secretly allows one to encode exceptions in the realizers. To this end, we implement IP in $\mathcal{T}^p_{\mathbb{E}}$ by relying internally on paraproofs, i.e. terms raising exceptions, while ensuring these exceptions never escape outside of the locally unsafe boundary. The resulting $\mathcal{T}^p_{\mathbb{E}}$ term has essentially the same computational content as its Kreisel’s realizability counterpart. In this section we suppose that $\mathbb{E} := \mathsf{unit}$, although assuming $\mathbb{E}$ to be inhabited is sufficient.

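To ease the understanding of the definition, we rely on effectful combinators that can be defined in $\mathcal{T}_{\mathbb{E}}$.

Definition 36. We define in $\mathcal{T}_{\mathbb{E}}$ the following terms.

\[
\begin{array}{l}
\mathsf{fail} : \Pi A : \square.\ A \\
[\mathsf{fail}] := \lambda A : \llbracket\square\rrbracket.\ [A]_\varnothing\ \mathsf{tt}
\end{array}
\]

\[
\begin{array}{ll}
\mathsf{is}_\Sigma : \Pi A\ B.\ (\Sigma x : A.\ B) \to \mathsf{bool} & \mathsf{is}_{\mathbb{N}} : \mathbb{N} \to \mathsf{bool} \\
[\mathsf{is}_\Sigma] := \lambda A\ B\ p.\ \mathtt{match}\ p\ \mathtt{with} & [\mathsf{is}_{\mathbb{N}}] := \mathtt{fix}\ \mathsf{is}_{\mathbb{N}}\ n := \mathtt{match}\ n\ \mathtt{with} \\
\mid \mathsf{ex}^\bullet\ \_\ \_ \Rightarrow \mathsf{true}^\bullet & \mid \mathsf{O}^\bullet \Rightarrow \mathsf{true}^\bullet \\
\mid \Sigma_\varnothing\ \_ \Rightarrow \mathsf{false}^\bullet & \mid \mathsf{S}^\bullet\ n \Rightarrow \mathsf{is}_{\mathbb{N}}\ n \\
\mathtt{end} & \mid \mathbb{N}_\varnothing\ \_ \Rightarrow \mathsf{false}^\bullet \\
 & \mathtt{end}
\end{array}
\]

It is worth insisting that these combinators are not necessarily parametric. While it can be shown that $\mathsf{is}_\Sigma$ and $\mathsf{is}_{\mathbb{N}}$ actually are, $\mathsf{fail}$ is luckily not. The $\mathsf{is}_\Sigma$ and $\mathsf{is}_{\mathbb{N}}$ functions are used in order to check that a value is actually pure and does not contain exceptions.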
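
The purity check on integers is an ordinary structural recursion in the target theory. The Coq sketch below is an approximation written by hand: it assumes an arbitrary exception type `E`, returns a plain `bool` instead of a translated boolean, and uses the hypothetical names `nat_e`, `is_nat` and `is_nat_detects`.

```coq
Section PurityCheck.
  (* Assumption: an arbitrary type of exceptions. *)
  Variable E : Type.

  (* Translated natural numbers with their default constructor. *)
  Inductive nat_e : Type :=
  | O_e : nat_e
  | S_e : nat_e -> nat_e
  | nat_err : E -> nat_e.

  (* Returns true exactly when no default constructor occurs in n. *)
  Fixpoint is_nat (n : nat_e) : bool :=
    match n with
    | O_e => true
    | S_e m => is_nat m
    | nat_err _ => false
    end.

  (* An exception hidden under successors is still detected. *)
  Lemma is_nat_detects (e : E) : is_nat (S_e (S_e (nat_err e))) = false.
  Proof. reflexivity. Qed.
End PurityCheck.
```

Because the check traverses the whole term, an integer built as a successor of an exception is rejected even though its head constructor is pure.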


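Definition 37. We define $\mathsf{ip}$ in $\mathcal{T}_{\mathbb{E}}$ in direct style below, using the available combinators from Definition 36 and a bit of syntactic sugar.

\[
\begin{array}{l}
\mathsf{ip} : \mathrm{IP} \\
\mathsf{ip} := \lambda (A : \square)\ (B : \mathbb{N} \to \square)\ (f : \neg A \to \Sigma n : \mathbb{N}.\ B\ n). \\
\quad \mathtt{let}\ p := f\ (\mathsf{fail}\ (\neg A))\ \mathtt{in} \\
\quad \mathtt{if}\ \mathsf{is}_\Sigma\ \mathbb{N}\ B\ p\ \mathtt{then}\ \mathtt{match}\ p\ \mathtt{with} \\
\quad \mid \mathsf{ex}\ n\ b \Rightarrow \mathtt{if}\ \mathsf{is}_{\mathbb{N}}\ n\ \mathtt{then}\ \mathsf{ex}\ \_\ \_\ n\ (\lambda \_ : \neg A.\ b) \\
\quad \qquad\qquad\quad \mathtt{else}\ \mathsf{ex}\ \_\ \_\ 0\ (\mathsf{fail}\ (\neg A \to B\ 0)) \\
\quad \mathtt{end}\ \mathtt{else}\ \mathsf{ex}\ \_\ \_\ 0\ (\mathsf{fail}\ (\neg A \to B\ 0))
\end{array}
\]

The intuition behind this term is the following. Given $f : \neg A \to \Sigma n : \mathbb{N}.\ B\ n$,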
we apply it to a dummy function which fails whenever it is used. Owing to the 
semantics of negation, we know in the parametricity layer that the only way 
for this application to return an exception is that f actually contained a proof 
of A and applied fail to it. Therefore, given a true proof of ~A, we are in an 
inconsistent setting and thus we are able to do whatever pleases us. The issue 
is that we do not have access to such a proof yet, and we do have to provide 
a valid integer now. Therefore, we check whether f actually provided us with a 
valid pair containing a valid integer. If so, this is our answer, otherwise we stuff 
a dummy integer value and we postpone the contradiction. 

This is essentially the same realizer as the one from Kreisel’s modified real- 
izability, except that we have a fancy type system for realizers. In particular, 
because we have dependent types, integers also exist in the logical layer, so that 
they need to be checked for exceptions as well. The only thing that remains to 
be proved is that $\mathsf{ip}$ also lives in $\mathcal{T}^p_{\mathbb{E}}$.


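Theorem 38. There is a proof of $\vdash_{\mathcal{T}} \llbracket \mathrm{IP} \rrbracket_\varepsilon\ [\mathsf{ip}]$.


Proof. The proof is straightforward but tedious, so we do not give the full details. The file IPc.v of the companion Coq plugin contains an explicit proof. The essential properties that make it go through are the following.

- $\vdash_{\mathcal{T}} \Pi (n : \mathbb{N}^\bullet)\ (p_1\ p_2 : \mathbb{N}_\varepsilon\ n).\ p_1 = p_2$
- $\vdash_{\mathcal{T}} \Pi n : \mathbb{N}^\bullet.\ [\mathsf{is}_{\mathbb{N}}]\ n = \mathsf{true}^\bullet \leftrightarrow \mathbb{N}_\varepsilon\ n$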
~ Fr Ip: [A]. 4]. p > FA. a 


Corollary 39. We have $\vdash_{\mathcal{T}^p_{\mathbb{E}}} \mathrm{IP}$.


4.4 Non-provability of Markov’s Principle 


From this result, one can get a very easy syntactic proof of the independence result of Markov’s principle from CIC. Markov’s principle is usually stated as

\[ \Pi P : \mathbb{N} \to \mathsf{bool}.\ \neg\neg (\Sigma n : \mathbb{N}.\ P\ n = \mathsf{true}) \to \Sigma n : \mathbb{N}.\ P\ n = \mathsf{true} \qquad \text{(MP)} \]

An independence result was recently proved by Coquand and Mannaa by a semantic argument [7]. We leverage instead a property from realizability [15] that has been applied to type theory the other way around by Herbelin [16].


Lemma 40. If $\mathcal{S}$ is a computable theory containing CIC and enjoying canonicity, then one cannot have both $\vdash_{\mathcal{S}} \mathrm{IP}$ and $\vdash_{\mathcal{S}} \mathrm{MP}$.


Proof. By applying IP to MP, one easily obtains that

\[ \vdash_{\mathcal{S}} \Pi P : \mathbb{N} \to \mathsf{bool}.\ \Sigma n : \mathbb{N}.\ \Pi m : \mathbb{N}.\ P\ m = \mathsf{true} \to P\ n = \mathsf{true}. \]

Thus, for every closed $P : \mathbb{N} \to \mathsf{bool}$, by canonicity there exists a closed $n_P : \mathbb{N}$ s.t. $\vdash_{\mathcal{S}} \Pi m : \mathbb{N}.\ P\ m = \mathsf{true} \to P\ n_P = \mathsf{true}$. But then one can decide whether $P$ holds for some $n$ by just computing $P\ n_P$, so that we effectively obtained an oracle deciding the halting problem (which is expressible in CIC).


Corollary 41. We have $\nvdash_{\mathcal{T}^p_{\mathbb{E}}} \mathrm{MP}$ and thus also $\nvdash_{\mathrm{CIC}} \mathrm{MP}$.


5 Possible Extensions 


5.1 Negative Records 


Interestingly, the fact that the translation introduces effects has unintended consequences on a few properties of type theory that are often taken for granted. Namely, because type theory is pure, there is a widespread confusion amongst type theorists between positive tuples and negative records.


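- Positive tuples are defined as a one-constructor inductive type, introduced by this constructor and eliminated by pattern-matching. They do not (and in general cannot, for typing reasons) satisfy definitional $\eta$-laws, also known as surjective pairing.

- Negative records are defined as a record type, introduced by primitive packing and eliminated by projections. They naturally obey definitional $\eta$-laws.

\[ A, B, M, N ::= \ldots \mid \&x : A.\ B \mid (M, N) \mid M.\pi_1 \mid M.\pi_2 \]

\[
\frac{\Gamma \vdash A : \square_i \quad \Gamma, x : A \vdash B : \square_j}{\Gamma \vdash \&x : A.\ B : \square_{\max(i,j)}}
\qquad
\frac{\Gamma \vdash M : \&x : A.\ B}{\Gamma \vdash M.\pi_1 : A}
\qquad
\frac{\Gamma \vdash M : \&x : A.\ B}{\Gamma \vdash M.\pi_2 : B\{x := M.\pi_1\}}
\]

\[
\frac{\Gamma \vdash M : A \quad \Gamma, x : A \vdash B : \square \quad \Gamma \vdash N : B\{x := M\}}{\Gamma \vdash (M, N) : \&x : A.\ B}
\]

\[ (M.\pi_1, M.\pi_2) \equiv M \qquad (M, N).\pi_1 \equiv M \qquad (M, N).\pi_2 \equiv N \]

Fig. 7. Negative pairs

\[
\begin{array}{lcl}
[\&x : A.\ B] & := & \mathsf{TypeVal}\ (\&x : \llbracket A \rrbracket.\ \llbracket B \rrbracket)\ (\lambda e : \mathbb{E}.\ ([A]_\varnothing\ e,\ [B]_\varnothing\{x := [A]_\varnothing\ e\}\ e)) \\
[(M, N)] & := & ([M], [N]) \\
[M.\pi_i] & := & [M].\pi_i
\end{array}
\]

Fig. 8. Exceptional translation of negative pairs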


In the remainder of this section, we will focus on the specific case of pairs, but the same arguments are generalizable to arbitrary records. Positive pairs $\Sigma x : A.\ B$ are defined by the inductive type from Fig. 4. Negative pairs $\&x : A.\ B$ are defined as a primitive structure in Fig. 7. We use the ampersand notation as a reference to linear logic.

In CIC, it is possible to show that negative and positive pairs are propositionally isomorphic, because positive pairs enjoy dependent elimination. Nonetheless, it is a well-known fact in the programming folklore that in a call-by-name language with effects, the two are sharply distinct. For instance, in the presence of exceptions, assuming $\vdash M : \Sigma x : A.\ B$, one does not have in general

\[ M \equiv \mathsf{ex}\ A\ B\ (\mathsf{fst}\ A\ B\ M)\ (\mathsf{snd}\ A\ B\ M) \]


where fst and snd are defined by pattern-matching. Indeed, if M is itself an 
exception, the two sides can be discriminated by a pattern-matching. Match- 
ing on the left-hand side results in immediate reraising of the exception, while 
matching on the right-hand side succeeds as long as the arguments of the con- 
structor are not forced. Forcefully equating those two terms would then result 
in a trivial equational theory. 

Such a phenomenon is at work in the exceptional translation. It is actually 
possible to interpret negative pairs through the translation, but in a way that 
significantly differs from the translation of positive pairs. In this section, we
assume that $\mathcal{T}$ contains negative pairs.


Definition 42. The translation of negative pairs is given in Fig. 8.

It is straightforward to check that the definitions of Fig. 8 preserve the conversion and typing rules from Fig. 7. The same translation can be extended to any record. We thus have the following theorem.


Theorem 43. If $\mathcal{T}$ has negative records, then so has $\mathcal{T}_{\mathbb{E}}$.


It is enlightening to look at the difference between negative and positive pairs through the translation, because now we have effects that allow us to separate them clearly. Indeed, compare

\[ \llbracket \&x : A.\ B \rrbracket \equiv \&x : \llbracket A \rrbracket.\ \llbracket B \rrbracket \quad \text{with} \quad \llbracket \Sigma x : A.\ B \rrbracket \cong \mathbb{E} + \Sigma x : \llbracket A \rrbracket.\ \llbracket B \rrbracket. \]

Clearly, if $\mathbb{E}$ is inhabited, then the two types do not even have the same cardinality, assuming $A$ and $B$ are finite. Furthermore, their default inhabitant is not the same at all. It is defined pointwise for negative pairs, while it is a special constructor for positive ones. Finally, there is obviously no chance that $\llbracket \Sigma x : A.\ B \rrbracket$ satisfies definitional surjective pairing in vanilla CIC, as it has two constructors. The trick is that the two types are externally distinguishable, but are not internally so, because $\mathcal{T}_{\mathbb{E}}$ is a model of CIC + $\&$ and thus proves that they are propositionally isomorphic.
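
The contrast between the two translations can be sketched in Coq as follows. This is an illustration under stated assumptions rather than the plugin's output: `E` is an arbitrary exception type, the default functions of the components are passed explicitly as `errA` and `errB`, and the names `sig_e`, `npair` and `npair_err` are hypothetical.

```coq
Set Primitive Projections.

Section Pairs.
  (* Assumption: an arbitrary type of exceptions. *)
  Variable E : Type.

  (* Positive pair: an inductive type, so the translation adds a constructor. *)
  Inductive sig_e (A : Type) (B : A -> Type) : Type :=
  | ex_e : forall a : A, B a -> sig_e A B
  | sig_err : E -> sig_e A B.

  (* Negative pair: a record, translated to a record of the same shape. *)
  Record npair (A : Type) (B : A -> Type) : Type :=
    mk_npair { fst_n : A; snd_n : B fst_n }.

  (* Its default inhabitant is built pointwise from the defaults of the
     two components, mirroring the second argument of TypeVal in Fig. 8. *)
  Definition npair_err (A : Type) (B : A -> Type)
      (errA : E -> A) (errB : forall a : A, E -> B a) (e : E) : npair A B :=
    mk_npair A B (errA e) (errB (errA e) e).
End Pairs.
```

Here `sig_e` gains an inhabitant that is not a pair at all, whereas every inhabitant of `npair`, including the default one, is determined by its two projections.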

It is possible to equip negative pairs with a parametricity relation defined 
as a primitive record which is the pointwise parametricity relation of each field, 
which naturally preserves typing and conversion rules.


Theorem 44. If $\mathcal{T}$ has negative records, then so does $\mathcal{T}_{\mathbb{E}}^{p}$.


5.2 Impredicative Universe 


All the systems we have considered so far are predicative. It is nonetheless possible
to implement an impredicative universe $*$ in $\mathcal{T}_{\mathbb{E}}$ if $\mathcal{T}$ features one.

Intuitively, it is sufficient to ask for an inductive type prop living in $\square_i$
for all $i$, which is defined just as type, except that its constructor PropVal
corresponding to TypeVal contains elements of $*$ rather than $\square_i$. Then one can
similarly define $\mathsf{El}_*$ and $\mathsf{Err}_*$ acting on prop rather than type. One then slightly
tweaks the $[\![\cdot]\!]$ macro from Fig. 2 by defining it instead as


$$[\![A]\!] := \begin{cases} \mathsf{El}_*\ [A] & \text{if } A : * \\ \mathsf{El}\ [A] & \text{otherwise} \end{cases}$$


and similarly for type constructors. With this modified translation, one obtains 
a soundness theorem for $\mathrm{CC}_\omega$.


Theorem 45. The exceptional translation is a syntactic model of $\mathrm{CC}_\omega + *$.


Likewise, the inductive translation is amenable to interpret an impredicative 
universe, with one major restriction though. 


Theorem 46. The exceptional translation is a syntactic model of CIC + * with- 
out the singleton elimination rule. 

Indeed, the addition of the default constructor disrupts the singleton elimination
criterion for all inductive types. Actually, this criterion is very fragile, and
even if $\mathcal{T}_{\mathbb{E}}$ satisfied it, Keller and Lasson showed that the parametricity translation
could not interpret inductive types in $*$ for similar reasons [17], and $\mathcal{T}_{\mathbb{E}}^{p}$
would face the same issue.


6 The Exceptional Translation in Practice 


6.1 Implementation as a Coq Plugin 


The (parametric) exceptional translation is a translation of CIC into itself, which
means that we can directly implement it as a Coq plugin. This way, we can
use the translation to safely extend Coq with new logical principles, so that
typechecking remains decidable.

Such a Coq plugin is simply a program that, given a Coq proof term M,
produces the translations $[M]$ and $[M]_\varepsilon$ as Coq terms. For instance, the translations
of type list, given in Figs. 4 and 6, are obtained by typing the following
commands, each of which defines a new inductive type in Coq.


Effect Translate list. 
Parametricity Translate list. 


The first command produces only $[\mathtt{list}]$, while the second produces $[\mathtt{list}]_\varepsilon$. But
the main interest of the translation is that we can exhibit new constructors. For 
instance, the raise operation described in Sect. 2.4 is defined as 


Effect Definition Exception : Type := fun E => TypeVal E E id.
Effect Definition raise : forall A, Exception -> A := fun E (A : type E) => Err A.


6.2 Use Case: A Cast Framework


We can use the ability to raise exceptions to define partial functions in the exceptional
layer. For instance, given a decidable property (described by the type
class below), it is then possible to define a cast function from A to Σ (a : A). P a
returning the converted value if the property is satisfied and raising an exception
otherwise (using an inhabitant cast_failed of Exception).


Class Decidable (A : Type) := dec : A + (not A).
Definition cast A (P : A -> Type) (a : A) {Hdec : Decidable (P a)} : Σ (a : A). P a
:= match dec (P a) with
| inl p => (a; p)
| inr _ => raise cast_failed
end.


Using this cast mechanism, it is easy to define a function list_to_pair from
lists to pairs by first converting the list into a list of size two, using the impure function
cast (list A) (fun l => List.length l = 2), and then recovering a pair from a
list of size two using a pure function.
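
As a loose, untyped analogue of this cast mechanism (not the Coq development itself), the following Python sketch shows the same control flow: a check on a decidable property, an exception on failure, and a pure step afterwards. The names CastFailed, cast and list_to_pair are chosen here for illustration, and the proof component of the dependent pair has no counterpart in Python, so it is simply dropped.

```python
class CastFailed(Exception):
    """Plays the role of the inhabitant cast_failed of Exception."""


def cast(prop, a):
    """Return a if the decidable property prop holds for it, else raise.

    In the Coq version the result is a dependent pair (a; p); the proof
    component p is dropped here (an assumption of this sketch).
    """
    if prop(a):
        return a
    raise CastFailed(f"property does not hold for {a!r}")


def list_to_pair(xs):
    """Impure step: cast to a list of size two. Pure step: build the pair."""
    checked = cast(lambda l: len(l) == 2, xs)
    return (checked[0], checked[1])
```

Calling list_to_pair on a two-element list returns the corresponding pair, while any other length raises CastFailed.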

In the exceptional layer, it is possible to prove the following property


Definition list_to_pair_prop A (x y : A) : list_to_pair [x ; y] = (x, y).


in at least two ways. One can prove it by simply raising an exception
at top level, or by reflexivity, using the fact that list_to_pair [x ; y] actually
reduces to (x, y).

However, there is a way to distinguish between those two proofs in the target
theory, here Coq, by stating the following lemma, which can only be proven for the
proof that does not raise an exception.


Definition list_to_pair_prop_soundness A x y :
list_to_pair_propᵉ A x y = eq_reflᵉ _ _ _ := eq_refl _.


where underscores represent arguments inferred by Coq.


7 Related Work 


Adding Dependency to an Effectful Language. There are numerous works on
adding dependent types to mainstream effectful programming languages. They
mostly focus on how to appropriately restrict effectful terms from appearing
in types. Indeed, if types only depend on pure terms, the problem of having
two different evaluations of the effect of the term (at the level of types and
at the level of terms) disappears. This is the case for instance for Dependent
ML of Xi and Pfenning [18], or more recently for the work of Casinghino et al. [19] on
how to combine proofs and programs when programs can be non-terminating.
The F* programming language of Swamy et al. [20] uses a notion of primitive
effects including state, exceptions, divergence and IO. Each effect is described
through a monadic predicate transformer semantics, which makes it possible to have a pure
core dependent language to reason about those effects. On a more foundational
side, there are two recent and overlapping lines of work on the description of
a dependent call-by-push-value (CBPV) by Ahman et al. [21] and Vákár [22].
Those works also use a purity restriction for dependency but, by using the CBPV
language, deal with any effect described in monadic style. In another line of
work, Brady advocates the use of algebraic effects as an elegant way to
combine effects more smoothly than with a monadic approach and gives an
implementation in Idris [23].


Adding Effects to a Dependently-Typed Language. Nanevski et al. [24] have developed
Hoare type theory (HTT) to extend Coq with monadic style effects. To
this end, they provide an axiomatic extension of Coq with a monad in which
to encapsulate imperative code. Important tools have been developed on HTT,
most notably the Ynot project [25]. Apart from being axiomatic, their monadic
approach does not allow mixing effectful programs and dependency but is rather
made for proving, inside Coq, properties of simply typed imperative programs.


Internal Translation of Type Theory. A non-axiomatic way to extend type theory
with new features is to use internal translations, that is, translations of type theory
into itself, as advocated by Boulier et al. [9]. The presentation of parametricity
for type theory given by Bernardy and Lasson [5] can be seen as one of the
first internal translations of type theory. However, this one does not add any new
power to type theory as it is a conservative extension. Barthe et al. [26] have
described a CPS translation for $\mathrm{CC}_\omega$ featuring call-cc, but without dealing
with inductive types and relying on a form of type stratification. A variant of
this translation has been extended recently by Bowman et al. [27] to dependent
sums using answer-type polymorphism $\Pi \alpha : \square.\ (A \to \alpha) \to \alpha$. A generic class of
internal translations has been defined by Jaber et al. [28] using forcing, which can
be seen as a type theoretic version of the presheaf construction used in categorical
logic. This class of translations works on all of CIC but for a restricted version of
dependent elimination, identical to the Baclofen type theory [2]. Therefore, to the
best of our knowledge, the exceptional translation is the first complete internal
translation of CIC adding a particular notion of effect.


8 Conclusion and Future Work 


In this paper, we have defined the exceptional translation, the first syntactic
translation of the Calculus of Inductive Constructions into itself that adds effects
and covers full dependent elimination. This results in a new type theory,
which features call-by-name exceptions with decidable type-checking and
a weaker form of canonicity. We have shown that although the resulting theory
is inconsistent, it is possible to reason about exceptional programs and show that
some of them actually never raise an exception by relying on the target theory.
This provides a sound logical framework in which one can transparently prove safety
properties about impure dependently-typed programs. Then, using parametricity,
we have given an additional layer on top of the exceptional translation
in order to tame exceptions and preserve consistency. This way, we have consistently
extended the logical expressivity of CIC with independence of premises,
Markov’s rule, and the negation of function extensionality while retaining
η-expansion. Both translations have been implemented in a Coq plugin, which we
use to formalize the examples.

One of the main directions of future work is to investigate whether other kinds
of effects can give rise to an internal translation of CIC. To that end, it seems
promising to look at algebraic presentations of effects. Indeed, the recent work on
the non-necessity of the value restriction policy for algebraic effects and handlers 
of Kammar and Pretnar [29] suggests that we should be able to perform similar 
translations on CIC with full dependent elimination for other algebraic effects 
and handlers than exceptions. 


Abstract. Bi-directional type checking has proved to be an extremely
useful and versatile tool for type checking and type inference. The con- 
ventional presentation of bi-directional type checking consists of two 
modes: inference mode and checked mode. In traditional bi-directional 
type-checking, type annotations are used to guide (via the checked 
mode) the type inference/checking procedure to determine the type of 
an expression, and type information flows from functions to arguments. 
This paper presents a variant of bi-directional type checking where the 
type information flows from arguments to functions. This variant retains 
the inference mode, but adds a so-called application mode. Such a design can
remove annotations that basic bi-directional type checking cannot, and is 
useful when type information from arguments is required to type-check 
the functions being applied. We present two applications and develop the 
meta-theory (mostly verified in Coq) of the application mode. 


1 Introduction 


Bi-directional type checking has been known in the folklore of type systems for 
a long time. It was popularized by Pierce and Turner’s work on local type inference
[29]. Local type inference was introduced as an alternative to Hindley-Milner
type systems [11,17] (henceforth the HM system), one that could easily deal with
polymorphic languages with subtyping. Bi-directional type checking is one component
of local type inference that, aided by some type annotations, enables type infer- 
ence in an expressive language with polymorphism and subtyping. Since Pierce and 
Turner’s work, various other authors have proved the effectiveness of bi-directional 
type checking in several other settings, including many different systems with sub- 
typing [12,14, 15], systems with dependent types [2,3, 10, 21,37], and various other 
works [1,7, 13, 22,28]. Furthermore, bi-directional type checking has also been com- 
bined with HM-style techniques for providing type inference in the presence of 
higher-ranked types [14, 27]. 

The key idea in bi-directional type checking is simple. In its basic form, typing is
split into inference and checked modes. The most salient feature of a bi-directional
type-checker is that information deduced in inference mode is used to guide
checking of an expression in checked mode. One such interaction between modes
happens in the typing rule for function applications:


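$$\frac{\Gamma \vdash e_1 \Rightarrow A \to B \qquad \Gamma \vdash e_2 \Leftarrow A}{\Gamma \vdash e_1\ e_2 \Rightarrow B}\ \textsc{App}$$
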

In the above rule, which is a standard bi-directional rule for checking applications,
the two modes are used. First we synthesize (⇒) the type A → B from e1,
and then check (⇐) e2 against A, returning B as the type for the application.

This paper presents a variant of bi-directional type checking that employs a 
so-called application mode. With the application mode the design of the appli- 
cation rule (for a simply typed calculus) is as follows: 


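$$\frac{\Gamma \vdash e_2 \Rightarrow A \qquad \Gamma \mid \Psi, A \vdash e_1 \Rightarrow A \to B}{\Gamma \mid \Psi \vdash e_1\ e_2 \Rightarrow B}\ \textsc{App}$$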


In this rule, there are two kinds of judgments. The first judgment is just the 
usual inference mode, which is used to infer the type of the argument e2. The 
second judgment, the application mode, is similar to the inference mode, but it 
has an additional context Ψ. The context Ψ is a stack that tracks the types of
the arguments of outer applications. In the rule for application, the type of the
argument e2 is inferred first, and then pushed into Ψ for inferring the type of e1.
Applications are themselves in the application mode, since they can be in the
context of an outer application. With the application mode it is possible to infer
the type for expressions such as (λx. x) 1 without additional annotations.

Bi-directional type checking with an application mode may still require type
annotations, and it offers different trade-offs with respect to the checked mode
in terms of type annotations. However, the different trade-offs open paths to
different designs of type checking/inference algorithms. To illustrate the utility 
of the application mode, we present two different calculi as applications. The 
first calculus is a higher-ranked implicit polymorphic type system, which infers
higher-ranked types, generalizes the HM type system, and has polymorphic let 
as syntactic sugar. As far as we are aware, no previous work enables an HM-style 
let construct to be expressed as syntactic sugar. For this calculus many results 
are proved using the Coq proof assistant [9], including type-safety. Moreover a 
sound and complete algorithmic system, inspired by Peyton Jones et al. [27], 
is also developed. A second calculus with explicit polymorphism illustrates how 
the application mode is compatible with type applications, and how it adds 
expressiveness by enabling an encoding of type declarations in a System-F-like 
calculus. For this calculus, all proofs (including type soundness), are mechanized 
in Coq. 

We believe that, similarly to standard bi-directional type checking, bi- 
directional type checking with an application mode can be applied to a wide 
range of type systems. Our work shows two particular and non-trivial applica- 
tions. Other potential areas of applications are other type systems with subtyp- 
ing, static overloading, implicit parameters or dependent types. 

In summary, the contributions of this paper are¹:


— A variant of bi-directional type checking where the inference mode 
is combined with a new, so-called, application mode. The application mode 
naturally propagates type information from arguments to the functions. 

— A new design for type inference of higher-ranked types which general- 
izes the HM type system, supports a polymorphic let as syntactic sugar, and 
infers higher rank types. We present a syntax-directed specification, an elab- 
oration semantics to System F, some meta-theory in Coq, and an algorithmic 
type system with completeness and soundness proofs. 

— A System-F-like calculus as a theoretical response to the challenge noted 
by Pierce and Turner [29]. It shows that the application mode is compatible 
with type applications, which also enables encoding type declarations. We 
present a type system and meta-theory, including proofs of type safety and 
uniqueness of typing in Coq. 


2 Overview 


2.1 Background: Bi-directional Type Checking 


Traditional type checking rules can be heavyweight on annotations, in the sense 
that lambda-bound variables always need explicit annotations. Bi-directional type 
checking [29] provides an alternative, which allows types to propagate downward
through the syntax tree. For example, in the expression (λf : Int → Int. f) (λy. y), the type
of y is provided by the type annotation on f. This is supported by the bi-directional 
typing rule for applications: 


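$$\frac{\Gamma \vdash e_1 \Rightarrow A \to B \qquad \Gamma \vdash e_2 \Leftarrow A}{\Gamma \vdash e_1\ e_2 \Rightarrow B}\ \textsc{App}$$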


Specifically, if we know that the type of e1 is a function type A → B, we can check
that e2 has type A. Notice that here the type information flows from functions 
to arguments. 

One guideline for designing bi-directional type checking rules [15] is to dis- 
tinguish introduction rules from elimination rules. Constructs which correspond 
to introduction forms are checked against a given type, while constructs cor- 
responding to elimination forms infer (or synthesize) their types. For instance, 
under this design principle, the introduction rule for pairs is supposed to be in 
checked mode, as in the rule PAIR-C. 

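$$\frac{\Gamma \vdash e_1 \Leftarrow A \qquad \Gamma \vdash e_2 \Leftarrow B}{\Gamma \vdash (e_1, e_2) \Leftarrow (A, B)}\ \textsc{Pair-C} \qquad\qquad \frac{\Gamma \vdash e_1 \Rightarrow A \qquad \Gamma \vdash e_2 \Rightarrow B}{\Gamma \vdash (e_1, e_2) \Rightarrow (A, B)}\ \textsc{Pair-I}$$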


¹ All supplementary materials are available at https://bitbucket.org/ningningxie/let-arguments-go-first.

Unfortunately, this means that the trivial program (1, 2) cannot type-check,
which in this case has to be rewritten to (1, 2) : (Int, Int).

In this particular case, bi-directional type checking goes against its original 
intention of removing burden from programmers, since a seemingly unnecessary 
annotation is needed. Therefore, in practice, bi-directional type systems do not 
strictly follow the guideline, and usually have additional inference rules for the 
introduction form of constructs. For pairs, the corresponding rule is PAIR-I. 

Now we can type check (1, 2), but the price to pay is that two typing rules 
for pairs are needed. Worse still, the same criticism applies to other constructs. 
This shows one drawback of bi-directional type checking: to minimize annotations,
many rules are often duplicated so that a construct has both an inference-mode and
a checked-mode rule, and this duplication grows with the number of typing rules in
the type system.
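
The duplication can be made concrete with a small sketch. The Python code below is an illustrative toy for literals, pairs and annotations only, not the system of any cited work; the flag pair_i, which switches the extra inference rule PAIR-I on or off, and all class names are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TInt:
    """The base type Int."""


@dataclass(frozen=True)
class TPair:
    """The pair type (fst, snd)."""
    fst: object
    snd: object


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Pair:
    fst: object
    snd: object


@dataclass(frozen=True)
class Ann:
    """An annotated expression e : ty."""
    expr: object
    ty: object


def check(expr, ty, pair_i):
    """Checked mode: check expr against the known type ty."""
    if isinstance(expr, Pair):
        # PAIR-C: the introduction form is checked componentwise.
        if not isinstance(ty, TPair):
            raise TypeError("a pair was checked against a non-pair type")
        check(expr.fst, ty.fst, pair_i)
        check(expr.snd, ty.snd, pair_i)
        return
    # Otherwise switch to inference mode and compare the result.
    if infer(expr, pair_i) != ty:
        raise TypeError("inferred type does not match the expected type")


def infer(expr, pair_i):
    """Inference mode: synthesize the type of expr."""
    if isinstance(expr, Lit):
        return TInt()
    if isinstance(expr, Ann):
        check(expr.expr, expr.ty, pair_i)
        return expr.ty
    if isinstance(expr, Pair):
        if pair_i:
            # PAIR-I: the duplicated rule, this time in inference mode.
            return TPair(infer(expr.fst, pair_i), infer(expr.snd, pair_i))
        raise TypeError("a pair needs an annotation without PAIR-I")
    raise TypeError("unknown expression")
```

With pair_i set to False, inferring the type of Pair(Lit(1), Lit(2)) fails unless the pair is wrapped in Ann; with pair_i set to True it succeeds, at the cost of a second rule for the same construct.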


2.2 Bi-directional Type Checking with the Application Mode 


We propose a variant of bi-directional type checking with a new application mode. 
The application mode preserves the advantage of bi-directional type checking, 
namely many redundant annotations are removed, while certain programs can 
type check with even fewer annotations. Also, with our proposal, the inference 
mode is a special case of the application mode, so it does not produce dupli- 
cations of rules in the type system. Additionally, the checked mode can still be 
easily combined into the system (see Sect. 5.1 for details). The essential idea of 
the application mode is to enable the type information flow in applications to 
propagate from arguments to functions (instead of from functions to arguments 
as in traditional bi-directional type checking). 

To motivate the design of bi-directional type checking with an application 
mode, consider the simple expression 


(λx. x) 1


This expression cannot type check in traditional bi-directional type checking 
because unannotated abstractions only have a checked mode, so annotations are 
required. For example, ((λx. x) : Int → Int) 1.

In this example we can observe that if the type of the argument is accounted 
for in inferring the type of λx. x, then it is actually possible to deduce that the
lambda expression has type Int → Int from the argument 1.


The Application Mode. If types flow from the arguments to the function, an 
alternative idea is to push the type of the arguments into the typing of the 
function, as the rule that is briefly introduced in Sect. 1: 


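$$\frac{\Gamma \vdash e_2 \Rightarrow A \qquad \Gamma \mid \Psi, A \vdash e_1 \Rightarrow A \to B}{\Gamma \mid \Psi \vdash e_1\ e_2 \Rightarrow B}\ \textsc{App}$$
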

Here the argument e2 synthesizes its type A, which then is pushed into the
application context Ψ. Lambda expressions can now make use of the application
context, leading to the following rule:


$$\frac{\Gamma, x : A \mid \Psi \vdash e \Rightarrow B}{\Gamma \mid \Psi, A \vdash \lambda x.\ e \Rightarrow A \to B}\ \textsc{Lam}$$


The type A that appears last in the application context serves as the type for x, 
and type checking continues with a smaller application context and x:A in the 
typing context. Therefore, using the rules APP and LAM, the expression (λx. x) 1
can type-check without annotations, since the type Int of the argument 1 is used 
as the type of the binding x. 
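
The two rules can be sketched as a small checker. The Python code below is a minimal illustration for a simply typed calculus with integer literals only; the function name infer, the tuple representation of the application context (last entry = innermost pending argument), and the treatment of variables (the pending argument types must equal the parameter types) are assumptions of this sketch rather than the formal system of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TInt:
    """The base type Int."""


@dataclass(frozen=True)
class TFun:
    """A function type arg -> res."""
    arg: object
    res: object


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    """An unannotated abstraction."""
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


def infer(env, stack, expr):
    """Infer the type of expr under typing context env (a dict) and
    application context stack (a tuple of argument types)."""
    if isinstance(expr, Lit):
        if stack:
            raise TypeError("a literal cannot be applied")
        return TInt()
    if isinstance(expr, Var):
        if expr.name not in env:
            raise TypeError(f"unbound variable {expr.name}")
        ty = env[expr.name]
        rest = ty
        # The variable's type must accept every pending argument.
        for arg_ty in reversed(stack):
            if not isinstance(rest, TFun) or rest.arg != arg_ty:
                raise TypeError("argument type mismatch")
            rest = rest.res
        return ty
    if isinstance(expr, Lam):
        if not stack:
            raise TypeError("an unapplied lambda needs an annotation")
        # LAM: pop the last argument type and use it for the binder.
        arg_ty = stack[-1]
        body_ty = infer({**env, expr.param: arg_ty}, stack[:-1], expr.body)
        return TFun(arg_ty, body_ty)
    if isinstance(expr, App):
        # APP: infer the argument first, then push its type.
        arg_ty = infer(env, (), expr.arg)
        fn_ty = infer(env, stack + (arg_ty,), expr.fn)
        return fn_ty.res
    raise TypeError("unknown expression")
```

For instance, infer({}, (), App(Lam("x", Var("x")), Lit(1))) yields TInt() without any annotation, whereas the same lambda on its own, with an empty application context, is rejected.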

Note that, since the examples so far are based on simple types, obviously 
they can be solved by integrating type inference and relying on techniques like 
unification or constraint solving. However, here the point is that the application 
mode helps to reduce the number of annotations without requiring such sophis- 
ticated techniques. Also, the application mode helps with situations where those 
techniques cannot be easily applied, such as type systems with subtyping. 


Interpretation of the Application Mode. As we have seen, the guideline for design- 
ing bi-directional type checking [15], based on introduction and elimination rules, 
is often not enough in practice. This leads to extra introduction rules in the 
inference mode. The application mode does not distinguish between introduc- 
tion rules and elimination rules. Instead, to decide whether a rule should be in 
inference or application mode, we need to consider whether the expression can be
applied or not. Variables, lambda expressions and applications are all examples 
of expressions that can be applied, and they should have application mode rules. 
However, pairs or literals cannot be applied and should have inference rules. For
example, type checking pairs would simply lead to the rule PAIR-I. Nevertheless,
elimination rules of pairs could have non-empty application contexts (see
Sect. 5.2 for details). In the application mode, arguments are always inferred
first in applications and propagated through application contexts. An empty
application context means that an expression is not being applied to anything,
which allows us to model the inference mode as a particular case².


Partial Type Checking. The inference mode synthesizes the type of an expression,
and the checked mode checks an expression against some type. A natural question
is how these modes compare to the application mode. An answer is that, in some
sense, the application mode is stronger than the inference mode, but weaker than
the checked mode. Specifically, the inference mode means that we know nothing
about the type of an expression beforehand. The checked mode means that the
whole type of the expression is already known beforehand. With the application
mode we know some partial type information about the type of an expression:


² Although the application mode generalizes the inference mode, we refer to them as
two different modes. Thus the variant of bi-directional type checking in this paper 
is interpreted as a type system with both inference and application modes. 

we know some of its argument types (since it must be a function type when the
application context is non-empty), but not the return type. 

Instead of nothing or all, this partialness gives us a finer-grained notion of
how much we know about the type of an expression. For example, assume
e : A → B → C. In the inference mode, we only have e. In the checked mode, we
have both e and A → B → C. In the application mode, we have e, and maybe
an empty context (which degenerates into inference mode), or an application
context A (we know the type of the first argument), or an application context B, A
(we know the types of both arguments).


Trade-offs. Note that the application mode is not conservative over traditional 
bidirectional type checking due to the different information flow. However, it 
provides a new design choice for type inference/checking algorithms, especially 
for those where the information about arguments is useful. Therefore we next 
discuss some benefits of the application mode for two interesting cases where
functions are either variables, or lambda (or type) abstractions.


2.3 Benefits of Information Flowing from Arguments to Functions 


Local Constraint Solver for Function Variables. Many type systems, including 
type systems with implicit polymorphism and/or static overloading, need infor- 
mation about the types of the arguments when type checking function variables. 
For example, in conventional functional languages with implicit polymorphism, 
function calls such as (id 3), where id : ∀a. (a → a), are pervasive. In such a
function call the type system must instantiate a to Int. Dealing with such implicit 
instantiation gets trickier in systems with higher-ranked types. For example, 
Peyton Jones et al. [27] require additional syntactic forms and relations, whereas 
Dunfield and Krishnaswami [14] add a special purpose application judgment. 

With the application mode, all the type information about the arguments
being applied is available in application contexts and can be used to solve instantiation
constraints. To exploit such information, the type system employs a special
subtyping judgment called application subtyping, with the form Ψ ⊢ A ≤ B.
Unlike conventional subtyping, computationally Ψ and A are interpreted as
inputs and B as output. In the above example, we have that Int ⊢ ∀a. a → a ≤ B
and we can determine that a = Int and B = Int → Int. In this way, the type system
is able to solve the constraints locally according to the application contexts,
since we no longer need to propagate the instantiation constraints to the typing
process.
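
The input/output reading of application subtyping can be illustrated with a deliberately simplified sketch. In the Python code below, one-way matching of the parameter type against the argument type stands in for choosing the instantiation; this is an assumption made for illustration and not the formal relation, and it ignores genuine polymorphic subtyping between argument and parameter. All names (app_subtype, match, substitute) are hypothetical, and bound variables are assumed to have distinct names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TBase:
    name: str


@dataclass(frozen=True)
class TVar:
    name: str


@dataclass(frozen=True)
class TFun:
    arg: object
    res: object


@dataclass(frozen=True)
class TForall:
    var: str
    body: object


def substitute(subst, ty):
    """Apply a substitution (assumes bound names are distinct)."""
    if isinstance(ty, TVar):
        return subst.get(ty.name, ty)
    if isinstance(ty, TFun):
        return TFun(substitute(subst, ty.arg), substitute(subst, ty.res))
    if isinstance(ty, TForall):
        return TForall(ty.var, substitute(subst, ty.body))
    return ty


def match(pattern, target, bound, subst):
    """One-way matching: extend subst so that pattern becomes target."""
    if isinstance(pattern, TVar) and pattern.name in bound:
        if subst.setdefault(pattern.name, target) != target:
            raise TypeError("conflicting instantiation")
        return
    if isinstance(pattern, TFun) and isinstance(target, TFun):
        match(pattern.arg, target.arg, bound, subst)
        match(pattern.res, target.res, bound, subst)
        return
    if pattern != target:
        raise TypeError("argument type mismatch")


def app_subtype(stack, ty):
    """Inputs: application context stack and type ty. Output: the
    instantiated type B. The last entry of stack is the first argument."""
    if not stack:
        return ty
    bound = []
    while isinstance(ty, TForall):
        bound.append(ty.var)
        ty = ty.body
    if not isinstance(ty, TFun):
        raise TypeError("not a function type")
    subst = {}
    match(ty.arg, stack[-1], set(bound), subst)
    res = substitute(subst, ty.res)
    # Variables not fixed by this argument stay quantified in the result.
    for var in reversed(bound):
        if var not in subst:
            res = TForall(var, res)
    return TFun(stack[-1], app_subtype(stack[:-1], res))
```

With the context containing only Int and the type ∀a. a → a, this sketch returns Int → Int, which agrees with the instantiation a = Int described above.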


Declaration Desugaring for Lambda Abstractions. An interesting consequence of 
the usage of an application mode is that it enables the following let sugar: 

let x = e1 in e2 ⇝ (λx. e2) e1


Such syntactic sugar for let is, of course, standard. However, in the context of 
implementations of typed languages it normally requires extra type annotations 
or a more sophisticated type-directed translation. Type checking (λx. e2) e1
would normally require annotations (for example an annotation for x), or otherwise
such annotations would have to be inferred first. Nevertheless, with the application
mode no extra annotations/inference is required, since from the type of the
argument e1 it is possible to deduce the type of x. Generally speaking, with the
application mode annotations are never needed for applied lambdas. Thus let
can be the usual sugar from the untyped lambda calculus, including HM-style
let expressions and even type declarations.
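
The desugaring itself is purely syntactic, as the following Python sketch shows. The AST classes and the function name desugar are illustrative assumptions; the sketch only rewrites terms and does no type checking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


@dataclass(frozen=True)
class Let:
    """let name = bound in body"""
    name: str
    bound: object
    body: object


def desugar(expr):
    """Rewrite every let x = e1 in e2 into (lambda x. e2) e1."""
    if isinstance(expr, Let):
        return App(Lam(expr.name, desugar(expr.body)), desugar(expr.bound))
    if isinstance(expr, Lam):
        return Lam(expr.param, desugar(expr.body))
    if isinstance(expr, App):
        return App(desugar(expr.fn), desugar(expr.arg))
    return expr
```

Because the result is an applied lambda, a checker with an application mode can type it by inferring the bound expression first and pushing its type into the application context.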


2.4 Application 1: Type Inference of Higher-Ranked Types 


As a first illustration of the utility of the application mode, we present a calculus 
with implicit predicative higher-ranked polymorphism. 


Higher-Ranked Types. Type systems with higher-ranked types generalize the tra- 
ditional HM type system, and are useful in practice in languages like Haskell or 
other ML-like languages. Essentially higher-ranked types enable much of the 
expressive power of System F, with the advantage of implicit polymorphism. 
Complete type inference for System F is known to be undecidable [36]. There- 
fore, several partial type inference algorithms, exploiting additional type anno- 
tations, have been proposed in the past instead [15,25,27,31]. 


Higher-Ranked Types and Bi-directional Type Checking. Bi-directional type 
checking is also used to help with the inference of higher-ranked types [14,27]. 
Consider the following program: 


(λf. (f 1, f 'c')) (λx. x)


which is not typeable under those type systems because they fail to infer the type 
of f, since it is supposed to be polymorphic. Using bi-directional type checking, 
we can rewrite this program as 


((λf. (f 1, f 'c')) : (∀a. a → a) → (Int, Char)) (λx. x)


Here the type of f can be easily derived from the type signature using checked 
mode in bi-directional type checking. However, although some redundant annota- 
tions are removed by bi-directional type checking, the burden of inferring higher- 
ranked types is still carried by programmers: they are forced to add polymor- 
phic annotations to help with the type derivation of higher-ranked types. For 
the above example, the type annotation is still provided by programmers, even 
though the necessary type information can be derived intuitively without any 
annotations: f is applied to λx. x, which is of type ∀a. a → a.


Generalization. Generalization is famous for its application in let polymorphism 
in the HM system, where generalization is adopted at let bindings. Let polymor- 
phism is a useful component to introduce top-level quantifiers (rank 1 types) 
into a polymorphic type system. The previous example becomes typeable in the 
HM system if we rewrite it to: let f = λx. x in (f 1, f 'c').


Type Inference for Higher-Ranked Types with the Application Mode. Using our
bi-directional type system with an application mode, the original expression can
type check without annotations or rewrites: (λf. (f 1, f 'c')) (λx. x).

This result comes naturally if we allow type information to flow from arguments
to functions. For inferring polymorphic types for arguments, we use generalization.
In the above example, we first infer the type ∀a. a → a for the argument,
then pass the type to the function. A nice consequence of such an approach
is that HM-style polymorphic let expressions are simply regarded as syntactic
sugar for a combination of lambda/application:

let x = e1 in e2 ⇝ (λx. e2) e1


With this approach, nested lets can lead to types which are more general
than HM. For example, consider let s = λx. x in let t = λy. s in e. The type of s is
∀a. a → a after generalization. Because t returns s as a result, we might expect
t : ∀b. b → (∀a. a → a), which is what our system will return. However, HM
will return type t : ∀b. ∀a. b → (a → a), as it can only return rank 1 types,
which is less general than the previous one according to Odersky and Läufer’s
subtyping relation for polymorphic types [24].


Conservativity over the Hindley-Milner Type System. Our type system is a con- 
servative extension over the Hindley-Milner type system, in the sense that every 
program that can type-check in HM is accepted in our type system, which is 
explained in detail in Sect. 3.2. This result is not surprising: after desugaring let
into a lambda and an application, programs remain typeable. 


Comparing Predicative Higher-Ranked Type Inference Systems. We will give a 
full discussion and comparison of related work in Sect. 6. Among those works, we 
believe the work by Dunfield and Krishnaswami [14], and the work by Peyton 
Jones et al. [27] are the most closely related to our system. Both their
systems and ours are based on a predicative type system: universal quantifiers 
can only be instantiated by monotypes. So we would like to emphasize our sys- 
tem’s properties in relation to those works. In particular, here we discuss two 
interesting differences, and also briefly (and informally) discuss how the works 
compare in terms of expressiveness. 


(1) Inference of higher-ranked types. In both works, every polymorphic type 
inferred by the system must correspond to one annotation provided by 
the programmer. However, in our system, some higher-ranked types can be 
inferred from the expression itself without any annotation. The motivating 
expression above provides an example of this. 

(2) Where are annotations needed? Since type annotations are useful for 
inferring higher rank types, a clear answer to the question where annotations 
are needed is necessary so that programmers know when they are required to 
write annotations. To this question, previous systems give a concrete answer: 
only on the binding of polymorphic types. Our answer is slightly different: only 
on the bindings of polymorphic types in abstractions that are not applied to 
arguments. Roughly speaking this means that our system ends up with fewer 
or smaller annotations. 

(3) Expressiveness. Based on these two answers, it may seem that our
system should accept all expressions that are typeable in their system.
However, this is not true because the application mode is not conservative
over traditional bi-directional type checking. Consider the expression
(λf : (∀a. a → a) → (Int, Char). f) (λg. (g 1, g 'a')), which is
typeable in their system. In this case, even if g is a polymorphic binding without
a type annotation, the expression can still type-check. This is because the
original application rule propagates the information from the outer binding
into the inner expressions. Note that the fact that such an expression type-checks
does not contradict their guideline of providing type annotations for every
polymorphic binder. Programmers that strictly follow their guideline can still
add a polymorphic type annotation for g. However, it does mean that it is a
little harder to understand where annotations for polymorphic binders can
be omitted in their system. This requires understanding how the applications
in checked mode operate.

In our system the above expression is not typeable, as a consequence of
the information flow in the application mode. However, following our guideline
for annotations leads to a program that can be type-checked with a smaller
annotation: (λf. f) (λg : (∀a. a → a). (g 1, g 'a')). This means that
our work is not conservative over their work, which is due to the design choice
of the application typing rule. Nevertheless, we can always rewrite programs
using our guideline, which often leads to fewer/smaller annotations.


2.5 Application 2: More Expressive Type Applications 


The design choice of propagating arguments to functions was subject to consid- 
eration in the original work on local type inference [29], but was rejected due to 
possible non-determinism introduced by explicit type applications: 


“It is possible, of course, to come up with examples where it would be 
beneficial to synthesize the argument types first and then use the resulting 
information to avoid type annotations in the function part of an application 
expression.... Unfortunately this refinement does not help infer the type of 
polymorphic functions. For example, we cannot uniquely determine the 
type of x in the expression (fun[X](x) e) [Int] 3.” [29] 


Therefore, as a response to this challenge, our second application is a variant 
of System F. Our development of the calculus shows that the application mode 
can actually work well with calculi with explicit type applications. To explain 
the new design, consider the expression: 


(Λa. λx : a. x + 1) Int


which is not typeable in the traditional type system for System F. In System F 
the lambda abstractions do not account for the context of possible function appli- 
cations. Therefore when type checking the inner body of the lambda abstrac- 
tion, the expression x + 1 is ill-typed, because all that is known is that x has the 
(abstract) type a. 

If we are allowed to propagate type information from arguments to functions,
then we can verify that a = Int and x + 1 is well-typed. The key insight
in the new type system is to use application contexts to track type equalities
induced by type applications. This enables us to type check expressions such
as the body of the lambda above (x + 1). Therefore, back to the problematic
expression (fun[X](x) e) [Int] 3, the type of x can be inferred as either X or Int
since they are actually equivalent.


Sugar for Type Synonyms. In the same way that we can regard let expressions as 
syntactic sugar, in the new type system we further gain built-in type synonyms 
for free. A type synonym is a new name for an existing type. Type synonyms 
are common in languages such as Haskell. In our calculus a simple form of type 
synonyms can be desugared as follows: 


type a = A in e ⇝ (Λa. e) A


One practical benefit of such syntactic sugar is that it enables a direct encod- 
ing of a System F-like language with declarations (including type-synonyms). 
Although declarations are often viewed as a routine extension to a calculus, 
and are not formally studied, they are highly relevant in practice. Therefore, a 
more realistic formalization of a programming language should directly account 
for declarations. By providing a way to encode declarations, our new calculus 
enables a simple way to formalize declarations. 
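
The desugaring of type synonyms described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the class names `TLam`, `TApp` and `TypeSyn` and the function `desugar` are hypothetical, and other terms are stood in for by plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TLam:
    """Type abstraction  Λa. e  (hypothetical AST node)."""
    var: str
    body: object


@dataclass(frozen=True)
class TApp:
    """Type application  e A  (hypothetical AST node)."""
    fn: object
    ty: object


@dataclass(frozen=True)
class TypeSyn:
    """Surface syntax  type a = A in e  (hypothetical AST node)."""
    var: str
    ty: object
    body: object


def desugar(e):
    """Rewrite every  type a = A in e  into  (Λa. e) A."""
    if isinstance(e, TypeSyn):
        return TApp(TLam(e.var, desugar(e.body)), e.ty)
    if isinstance(e, TLam):
        return TLam(e.var, desugar(e.body))
    if isinstance(e, TApp):
        return TApp(desugar(e.fn), e.ty)
    return e  # leaves: plain strings standing in for all other terms


example = desugar(TypeSyn("a", "Int", "λx : a. x + 1"))
```

Here `example` is the tree for (Λa. λx : a. x + 1) Int, the applied type abstraction that the new type system accepts.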


Type Abstraction. The type equalities introduced by type applications may seem 
like we are breaking System F type abstraction. However, we argue that type 
abstraction is still supported by our System F variant. For example: 


let inc = Λa. λx : a. x + 1 in inc Int e


(after desugaring) does not type-check, as in a System F-like language. In
our type system, lambda abstractions that are immediately applied to an
argument and unapplied lambda abstractions behave differently. Unapplied
lambda abstractions are just like System F abstractions and retain type
abstraction. The example above illustrates this. In contrast, the typeable
example (Λa. λx : a. x + 1) Int, which uses a lambda abstraction directly
applied to an argument, can be regarded as the desugared expression for
type a = Int in λx : a. x + 1.


3 A Polymorphic Language with Higher-Ranked Types 


This section first presents a declarative, syntax-directed type system for a lambda 
calculus with implicit higher-ranked polymorphism. The interesting aspects 
about the new type system are: (1) the typing rules, which employ a combina- 
tion of inference and application modes; (2) the novel subtyping relation under 
an application context. Later, we prove our type system is type-safe by a type 
directed translation to System F [16,27] in Sect. 3.4. Finally an algorithmic type 
system is discussed in Sect. 3.5. 
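

3.1 Syntax


The syntax of the language is:


\[
\begin{array}{llcl}
\text{Expr} & e & ::= & x \mid n \mid \lambda x : A.\ e \mid \lambda x.\ e \mid e_1\ e_2 \\
\text{Type} & A, B & ::= & a \mid A \to B \mid \forall a. A \mid \mathsf{Int} \\
\text{Monotype} & \tau & ::= & a \mid \tau_1 \to \tau_2 \mid \mathsf{Int} \\
\text{Typing Context} & \Gamma & ::= & \varnothing \mid \Gamma, x : A \\
\text{Application Context} & \Psi & ::= & \varnothing \mid \Psi, A
\end{array}
\]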


Expressions. Expressions e include variables (x), integers (n), annotated lambda
abstractions (λx : A. e), lambda abstractions (λx. e), and applications (e1 e2).
Letters x, y, z are used to denote term variables. Notably, the syntax does not
include a let expression (let x = e1 in e2). Let expressions can be regarded as
the standard syntax sugar (λx. e2) e1, as illustrated in more detail later.


Types. Types include type variables (a), functions (A → B), polymorphic types
(∀a.A) and integers (Int). We use capital letters (A, B) for types, and small
letters (a, b) for type variables. Monotypes are types without universal quantifiers.


Contexts. Typing contexts Γ are standard: they map a term variable x to its
type A. We implicitly assume that all the variables in Γ are distinct. The main
novelty lies in the application contexts Ψ, which are the main data structure
needed to allow types to flow from arguments to functions. Application contexts
are modeled as a stack. The stack collects the types of arguments in applications.
The context is a stack because if a type is pushed last then it will be popped first.
For example, inferring expression e under application context (a, Int) means e
is now being applied to two arguments e1, e2, with e1 : Int, e2 : a, so e should be
of type Int → a → A for some A.


3.2 Type System 


The top part of Fig. 1 gives the typing rules for our language. The judgment
Γ | Ψ ⊢ e ⇒ B is read as: under typing context Γ, and application context Ψ,
e has type B. The standard inference mode Γ ⊢ e ⇒ B can be regarded as a
special case when the application context is empty. Note that the variable names
are assumed to be fresh enough when new variables are added into the typing
context, or when generating new type variables.

Rule T-VAR says that if x : A is in the typing context, and A is a subtype of
B under application context Ψ, then x has type B. It depends on the subtyping
rules that are explained in Sect. 3.3. Rule T-INT shows that integer literals are
only inferred to have type Int under an empty application context. This is obvious
since an integer cannot accept any arguments.

T-LAM shows the strength of application contexts. It states that, without 
annotations, if the application context is non-empty, a type can be popped from 
the application context to serve as the type for x. Inference of the body then 
continues with the rest of the application context. This is possible, because the 
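
\[ \boxed{\Gamma \mid \Psi \vdash e \Rightarrow B} \]

\[
\frac{x : A \in \Gamma \qquad \Psi \vdash A <: B}{\Gamma \mid \Psi \vdash x \Rightarrow B}\ \text{T-VAR}
\qquad
\frac{}{\Gamma \vdash n \Rightarrow \mathsf{Int}}\ \text{T-INT}
\]

\[
\frac{\Gamma, x : A \mid \Psi \vdash e \Rightarrow B}{\Gamma \mid \Psi, A \vdash \lambda x.\ e \Rightarrow A \to B}\ \text{T-LAM}
\qquad
\frac{\Gamma, x : \tau \vdash e \Rightarrow B}{\Gamma \vdash \lambda x.\ e \Rightarrow \tau \to B}\ \text{T-LAM2}
\]

\[
\frac{\Gamma, x : A \vdash e \Rightarrow B}{\Gamma \vdash \lambda x : A.\ e \Rightarrow A \to B}\ \text{T-LAMANN1}
\]

\[
\frac{C <: A \qquad \Gamma, x : A \mid \Psi \vdash e \Rightarrow B}{\Gamma \mid \Psi, C \vdash \lambda x : A.\ e \Rightarrow C \to B}\ \text{T-LAMANN2}
\qquad
\frac{\overline{a} = \mathit{ftv}(A) - \mathit{ftv}(\Gamma)}{\Gamma_{gen}(A) = \forall \overline{a}. A}\ \text{T-GEN}
\]

\[
\frac{\Gamma \vdash e_2 \Rightarrow A \qquad \Gamma_{gen}(A) = B \qquad \Gamma \mid \Psi, B \vdash e_1 \Rightarrow B \to C}{\Gamma \mid \Psi \vdash e_1\ e_2 \Rightarrow C}\ \text{T-APP}
\]

\[ \boxed{A <: B} \]

\[
\frac{}{\mathsf{Int} <: \mathsf{Int}}\ \text{S-INT}
\qquad
\frac{}{a <: a}\ \text{S-VAR}
\qquad
\frac{A <: B}{A <: \forall a. B}\ \text{S-FORALLR}
\]

\[
\frac{A[a \mapsto \tau] <: B}{\forall a. A <: B}\ \text{S-FORALLL}
\qquad
\frac{C <: A \qquad B <: D}{A \to B <: C \to D}\ \text{S-FUN}
\]

\[ \boxed{\Psi \vdash A <: B} \]

\[
\frac{}{\varnothing \vdash A <: A}\ \text{S-EMPTY}
\qquad
\frac{\Psi, C \vdash A[a \mapsto \tau] <: B}{\Psi, C \vdash \forall a. A <: B}\ \text{S-FORALLL2}
\]

\[
\frac{C <: A \qquad \Psi \vdash B <: D}{\Psi, C \vdash A \to B <: C \to D}\ \text{S-FUN2}
\]
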

Fig. 1. Syntax-directed typing and subtyping. 


expression λx. e is being applied to an argument of type A, which is the type at
the top of the application context stack. Rule T-LAM2 deals with the case when
the application context is empty. In this situation, a monotype τ is guessed for
the argument, just like the Hindley-Milner system.

Rule T-LAMANN1 works as expected with an empty application context: a
new variable x is put with its type A into the typing context, and inference
continues on the abstraction body. If the application context is non-empty, then
the rule T-LAMANN2 applies. It checks that C is a subtype of A before putting
x : A in the typing context. However, note that it is always possible to remove
annotations in an abstraction if it has been applied to some arguments.

Rule T-APP pushes types into the application context. The application rule
first infers the type A of the argument e2. Then the type A is generalized
in the same way that types in let expressions are generalized in the HM
type system. The resulting generalized type is B. The generalization is shown
in rule T-GEN, where all free type variables of A that are not free in Γ are
turned into quantifiers. Thus the type of e1 is now inferred under an application
context extended with type B. The generalization step is important to infer
higher-ranked types: since B is a possibly polymorphic type, which is the
argument type of e1, e1 possibly has a higher-rank type.
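
As an illustration of the generalization step in rule T-GEN, the following Python sketch computes ftv(A) − ftv(Γ) and wraps A in the corresponding quantifiers. The encoding is an assumption made for this example only: a type is either a string (a type variable, or the constant "Int"), a tuple ("->", A, B), or a tuple ("forall", a, A); the typing context is a dictionary from term variables to types; the names `ftv` and `gen` are hypothetical.

```python
def ftv(ty):
    """Free type variables of a type, in order of first occurrence."""
    if ty == "Int":
        return []
    if isinstance(ty, str):
        return [ty]
    if ty[0] == "->":
        left = ftv(ty[1])
        return left + [v for v in ftv(ty[2]) if v not in left]
    if ty[0] == "forall":
        return [v for v in ftv(ty[2]) if v != ty[1]]
    raise ValueError(f"not a type: {ty!r}")


def gen(ctx, ty):
    """Gamma_gen(A): quantify the variables in ftv(A) - ftv(Gamma)."""
    in_ctx = {v for t in ctx.values() for v in ftv(t)}
    to_bind = [v for v in ftv(ty) if v not in in_ctx]
    result = ty
    for v in reversed(to_bind):  # the first variable becomes the outermost binder
        result = ("forall", v, result)
    return result
```

For instance, the type a → a inferred for an argument such as λx. x in an empty context generalizes to ∀a. a → a, which is what lets the function position receive a polymorphic argument type.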


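Let Expressions. The language does not have built-in let expressions, but instead
supports let as syntactic sugar. The typing rule for let expressions in the HM
system is the rule below without the gray-shaded part (in the original figure,
the shaded part is the application context | Ψ in the third premise and in the
conclusion):

\[
\frac{\Gamma \vdash e_1 \Rightarrow A_1 \qquad \Gamma_{gen}(A_1) = A_2 \qquad \Gamma, x : A_2 \mid \Psi \vdash e_2 \Rightarrow B}{\Gamma \mid \Psi \vdash \mathbf{let}\ x = e_1\ \mathbf{in}\ e_2 \Rightarrow B}\ \text{T-LET}
\]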


where we do generalization on the type of e1, which is then assigned as the
type of x while inferring e2. Adapting this rule to our system with application
contexts would result in the gray-shaded part, where the application context is
only used for e2, because e2 is the expression being applied. If we desugar the let
expression (let x = e1 in e2) to ((λx. e2) e1), we have the following derivation:


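\[
\frac{\Gamma \vdash e_1 \Rightarrow A_1 \qquad \Gamma_{gen}(A_1) = A_2 \qquad \dfrac{\Gamma, x : A_2 \mid \Psi \vdash e_2 \Rightarrow B}{\Gamma \mid \Psi, A_2 \vdash \lambda x.\ e_2 \Rightarrow A_2 \to B}\ \text{T-LAM}}{\Gamma \mid \Psi \vdash (\lambda x.\ e_2)\ e_1 \Rightarrow B}\ \text{T-APP}
\]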


The type A2 is now pushed into the application context in rule T-APP, and then
assigned to x in T-LAM. Comparing this with the typing derivation with rule
T-LET, we now have the same preconditions. Thus we can see that the rules in Fig. 1
are sufficient to express an HM-style polymorphic let construct.
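
The desugaring used in this argument is mechanical. A minimal Python sketch, assuming a hypothetical tuple encoding of terms (("var", x), ("int", n), ("lam", x, e), ("app", e1, e2), ("let", x, e1, e2)) that is not part of the paper:

```python
def desugar_let(e):
    """Rewrite every  let x = e1 in e2  into  (λx. e2) e1."""
    tag = e[0]
    if tag == "let":
        _, x, e1, e2 = e
        return ("app", ("lam", x, desugar_let(e2)), desugar_let(e1))
    if tag == "lam":
        return ("lam", e[1], desugar_let(e[2]))
    if tag == "app":
        return ("app", desugar_let(e[1]), desugar_let(e[2]))
    return e  # ("var", x) and ("int", n) contain no let


# let id = λx. x in id 1
program = ("let", "id", ("lam", "x", ("var", "x")),
           ("app", ("var", "id"), ("int", 1)))
desugared = desugar_let(program)
```

After desugaring, the bound expression sits in argument position, so rule T-APP infers and generalizes its type before the body is visited, mirroring the premises of T-LET.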


Meta-Theory. The type system enjoys several interesting properties, especially 
lemmas about application contexts. Before we present those lemmas, we need a 
helper definition of what it means to use arrows on application contexts. 


Definition 1 (Ψ → B). If Ψ = A1, A2, ..., An, then Ψ → B means the function
type An → ... → A2 → A1 → B.
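
To make the ordering in Definition 1 concrete, here is a small Python sketch. It assumes, for illustration only, that an application context is a list whose last element is the top of the stack and that a function type is a tuple ("->", A, B); the names `arrow` and `show` are hypothetical.

```python
def arrow(psi, result):
    """Build Psi -> B, where psi = [A1, ..., An] and An is the top of the stack."""
    ty = result
    for arg in psi:  # A1 is wrapped first, so it ends up innermost
        ty = ("->", arg, ty)
    return ty


def show(ty):
    """Pretty-print a type, parenthesizing function types on the left of an arrow."""
    if isinstance(ty, tuple):
        _, dom, cod = ty
        left = f"({show(dom)})" if isinstance(dom, tuple) else show(dom)
        return f"{left} -> {show(cod)}"
    return ty


psi = []
psi.append("a")    # pushed first: type of the second argument
psi.append("Int")  # pushed last: type of the first argument
```

With this context, `show(arrow(psi, "A"))` gives `Int -> a -> A`, matching the example of the application context (a, Int) in Sect. 3.1.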


Such a definition is useful to reason about the typing result with application
contexts. One specific property is that the application context determines the
form of the typing result.


Lemma 1 (Ψ Coincides with Typing Results). If Γ | Ψ ⊢ e ⇒ A, then
for some A', we have A = Ψ → A'.


Having this lemma, we can always use the judgment Γ | Ψ ⊢ e ⇒ Ψ → A'
instead of Γ | Ψ ⊢ e ⇒ A.

In traditional bi-directional type checking, we often have one subsumption
rule that transfers between inference and checked mode, which states that if an
expression can be inferred to some type, then it can be checked with this type. In
our system, we regard the normal inference mode Γ ⊢ e ⇒ A as a special case,
when the application context is empty. We can also turn from normal inference
mode into application mode with an application context.


Lemma 2 (Subsumption). If Γ ⊢ e ⇒ Ψ → A, then Γ | Ψ ⊢ e ⇒ Ψ → A.


The relationship between our system and the standard Hindley-Milner type
system can be established through the desugaring of let expressions. Namely, if e is
typeable in the Hindley-Milner system, then the desugared expression |e| is typeable
in our system, with a more general typing result.


Lemma 3 (Conservative over HM). If Γ ⊢^HM e ⇒ A, then for some B,
we have Γ ⊢ |e| ⇒ B, and B <: A.


3.3 Subtyping 


We present our subtyping rules at the bottom of Fig. 1. Interestingly, our sub- 
typing has two different forms. 


Subtyping. The first judgment follows Odersky and Läufer [24]. A <: B means
that A is more polymorphic than B and, equivalently, A is a subtype of B. Rules
S-INT and S-VAR are trivial. Rule S-FORALLR states that A is a subtype of ∀a.B only
if A is a subtype of B, with the assumption that a is a fresh variable. Rule S-FORALLL
says ∀a.A is a subtype of B if we can instantiate it with some τ and show the
result is a subtype of B. In rule S-FUN, we see that subtyping is contra-variant
on the argument type, and covariant on the return type.


Application Subtyping. The typing rule T-VAR uses the second subtyping
judgment Ψ ⊢ A <: B. To motivate this new kind of judgment, consider the
expression id 1 for example, whose derivation is stuck at T-VAR (here we assume
id : ∀a. a → a ∈ Γ):


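\[
\frac{\Gamma \vdash 1 \Rightarrow \mathsf{Int} \qquad \Gamma_{gen}(\mathsf{Int}) = \mathsf{Int} \qquad \dfrac{id : \forall a. a \to a \in \Gamma \qquad \text{???}}{\Gamma \mid \mathsf{Int} \vdash id \Rightarrow {}}\ \text{T-VAR}}{\Gamma \vdash id\ 1 \Rightarrow {}}\ \text{T-APP}
\]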


Here we know that id : ∀a. a → a and also, from the application context, that
id is applied to an argument of type Int. Thus we need a mechanism for solving
the instantiation a = Int and returning a supertype Int → Int as the type of id.
This is precisely what the application subtyping achieves: resolve instantiation
constraints according to the application context. Notice that unlike existing
works [14,27], application subtyping provides a way to solve instantiation more
locally, since it does not mutually depend on typing.

Back to the rules in Fig. 1, one way to understand the judgment Ψ ⊢ A <: B
from a computational point of view is that the type B is a computed output,
rather than an input. In other words, B is determined from Ψ and A. This is
unlike the judgment A <: B, where both A and B would be computationally
interpreted as inputs. Therefore it is not possible to view A <: B as a special
case of Ψ ⊢ A <: B where Ψ is empty.

There are three rules dealing with application contexts. Rule S-EMPTY is
for the case when the application context is empty. Because it is empty, we have no
constraints on the type, so we return it back unchanged. Note that this is where
HM systems (also Peyton Jones et al. [27]) would normally use a rule INST to
remove top-level quantifiers:


\[
\frac{}{\forall a. A <: A[a \mapsto \tau]}\ \text{INST}
\]

Our system does not need INST, because in applications, type information flows 
from arguments to the function, instead of function to arguments. In the latter 
case, INST is needed because a function type is wanted instead of a polymorphic 
type. In our approach, instantiation of type variables is avoided unless necessary. 

The two remaining rules apply when the application context is non-empty,
for polymorphic and function types respectively. Note that we only need to
deal with these two cases because Int or type variables a cannot have a
non-empty application context. In rule S-FORALLL2, we instantiate the polymorphic
type with some τ, and continue. This instantiation is forced by the application
context. In rule S-FUN2, one function of type A → B is now being applied to an
argument of type C. So we check C <: A. Then we continue with B and the
rest of the application context, and return C → D as the result type of the function.
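
The computational reading of Ψ ⊢ A <: B (B is an output determined by Ψ and A) can be illustrated with a deliberately simplified Python sketch. It is not the algorithm of Sect. 3.5. The simplifying assumptions, all made for this example only, are: types are strings or tuples ("->", A, B) and ("forall", a, A); the top of the stack is the last list element; bound variable names are distinct; the guess of τ in S-FORALLL2 is resolved only when the quantified variable later appears as a bare parameter type; and the check C <: A in S-FUN2 is approximated by syntactic equality. The names `is_mono`, `subst` and `app_sub` are hypothetical.

```python
def is_mono(ty):
    """Monotypes contain no universal quantifier."""
    if isinstance(ty, str):
        return True
    return ty[0] == "->" and is_mono(ty[1]) and is_mono(ty[2])


def subst(a, tau, ty):
    """ty[a ↦ tau]; bound names are assumed distinct, so no capture occurs."""
    if isinstance(ty, str):
        return tau if ty == a else ty
    if ty[0] == "->":
        return ("->", subst(a, tau, ty[1]), subst(a, tau, ty[2]))
    return ("forall", ty[1], subst(a, tau, ty[2]))


def app_sub(psi, ty, pending=frozenset()):
    """Return B with Psi ⊢ A <: B; `pending` holds quantifiers awaiting a monotype."""
    if not psi:                                        # S-EMPTY: return A unchanged
        return ty
    top, rest = psi[-1], psi[:-1]
    if isinstance(ty, tuple) and ty[0] == "forall":    # S-FORALLL2: instantiate
        return app_sub(psi, ty[2], pending | {ty[1]})
    if isinstance(ty, tuple) and ty[0] == "->":        # S-FUN2
        param, result = ty[1], ty[2]
        if isinstance(param, str) and param in pending:
            if not is_mono(top):
                raise TypeError("predicativity: only monotypes instantiate quantifiers")
            result = subst(param, top, result)         # solve a = C
            pending = pending - {param}
        elif param != top:                             # stand-in for C <: A
            raise TypeError(f"argument type {top} does not match {param}")
        return ("->", top, app_sub(rest, result, pending))
    raise TypeError(f"{ty} cannot be applied to arguments")


id_type = ("forall", "a", ("->", "a", "a"))
```

With `id_type`, the call `app_sub(["Int"], id_type)` yields Int → Int, the supertype needed for id 1, while `app_sub([], id_type)` returns the polymorphic type unchanged because no instantiation is forced.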


Meta-Theory. Application subtyping is novel in our system, and it enjoys some 
interesting properties. For example, similarly to typing, the application context 
decides the form of the supertype. 


Lemma 4 (Ψ Coincides with Subtyping Results). If Ψ ⊢ A <: B, then
for some B', B = Ψ → B'.


Therefore we can always use the judgment Ψ ⊢ A <: Ψ → B', instead of
Ψ ⊢ A <: B. Application subtyping is also reflexive and transitive. Interestingly,
in those lemmas, if we remove all application contexts, they are exactly the
reflexivity and transitivity of traditional subtyping.


Lemma 5 (Reflexivity). Ψ ⊢ Ψ → A <: Ψ → A.


Lemma 6 (Transitivity). If Ψ1 ⊢ A <: Ψ1 → B, and Ψ2 ⊢ B <: Ψ2 → C,
then Ψ2, Ψ1 ⊢ A <: Ψ1 → Ψ2 → C.


Finally, we can convert between subtyping and application subtyping. We 
can remove the application context and still get a subtyping relation: 


Lemma 7 (Ψ ⊢ <: to <:). If Ψ ⊢ A <: B, then A <: B.


Transferring from subtyping to application subtyping will result in a more 
general type. 


Lemma 8 (<: to Ψ ⊢ <:). If A <: Ψ → B1, then for some B2, we have
Ψ ⊢ A <: Ψ → B2, and B2 <: B1.


This lemma may not seem intuitive at first glance. Consider a concrete example:
Int → ∀a.a <: Int → Int, and Int ⊢ Int → ∀a.a <: Int → ∀a.a. The former
holds because we have ∀a.a <: Int in the return type. But in the latter,
after Int is consumed from the application context, we eventually reach S-EMPTY,
which always returns the original type back.


3.4 Translation to System F, Coherence and Type-Safety 


We translate the source language into a variant of System F that is also used in 
Peyton Jones et al. [27]. The translation is shown to be coherent and type safe. 
Due to space limitations, we only summarize the key aspects of the translation. 
Full details can be found in the supplementary materials of the paper. 

The syntax of our target language is as follows: 


Expressions s, f ::= x | n | λx : A. s | Λa.s | s1 s2 | s1 A


In the translation, we use f to refer to the coercion function produced by
the subtyping translation, and s to refer to the translated term in System F. We
write Γ ⊢^F s : A to mean the term s has type A in System F.

The type-directed translation follows the rules in Fig. 1, with a translation 
output in the forms of judgments. We summarize all judgments as: 


Judgment                 Translation Output     Soundness
A <: B ⇝ f               coercion function f    ∅ ⊢^F f : A → B
Ψ ⊢ A <: B ⇝ f           coercion function f    ∅ ⊢^F f : A → B
Γ | Ψ ⊢ e ⇒ A ⇝ s        target expression s    Γ ⊢^F s : A


For example, A <: B ⇝ f means that if A <: B holds in the source language,
we can translate it into a System F term f, which is a coercion function and
has type A → B. We prove that our system is type safe by proving that the
translation produces well-typed terms.


Lemma 9 (Typing Soundness). If Γ | Ψ ⊢ e ⇒ A ⇝ s, then Γ ⊢^F s : A.


However, there could be multiple targets corresponding to one expression due
to the multiple choices for τ. To prove that the translation is coherent, we prove
that all the translations for one expression have the same operational semantics.
We write |e| for the expressions after type erasure since types are useless after
type checking. Because multiple targets could have different numbers of coercion
functions, we use η-id equality [5] instead of syntactic equality, where two
expressions are regarded as equivalent if they can turn into the same expression
through η-reduction or removal of redundant identity functions. We then prove
that our translation actually generates a unique target:


Lemma 10 (Coherence). If Γ1 | Ψ1 ⊢ e ⇒ A ⇝ s1, and Γ2 | Ψ2 ⊢
e ⇒ B ⇝ s2, then |s1| ≃ηid |s2|.


3.5 Algorithmic System


Even though our specification is syntax-directed, it does not directly lead to an
algorithm, because there are still many guesses in the system, such as in rule
T-LAM2. This subsection presents a brief introduction of the algorithm, which
essentially follows the approach by Peyton Jones et al. [27]. Full details can be
found in the supplementary materials.

Instead of guessing, the algorithm creates meta type variables α̂, β̂, which are
waiting to be solved. The judgment for the algorithmic type system is
(S0, N0) | Γ | Ψ ⊢ e ⇒ A ↪ (S1, N1). Here we use N as a name supply, from which
we can always extract new names. We use S as a notation for the substitution that
maps meta type variables to their solutions. For example, rule T-LAM2 becomes


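\[
\frac{(S_0, N_0) \mid \Gamma, x : \hat{\beta} \vdash e \Rightarrow A \hookrightarrow (S_1, N_1)}{(S_0, N_0\hat{\beta}) \mid \Gamma \vdash \lambda x.\ e \Rightarrow \hat{\beta} \to A \hookrightarrow (S_1, N_1)}\ \text{AT-LAM1}
\]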


Comparing it to rule T-LAM2, τ is replaced by a new meta type variable β̂
from the name supply N0β̂. Apart from the name supply and substitution, the
rule retains the structure of T-LAM2.

Having the name supply and substitutions, the algorithmic system is a direct
extension of the specification in Fig. 1, with a process to do unifications that solve
meta type variables. Such a unification process is quite standard and similar to
the one used in the Hindley-Milner system. We proved our algorithm is sound
and complete with respect to the specification.


Theorem 1 (Soundness). If ([], N0) | Γ ⊢ e ⇒ A ↪ (S1, N1), then for any
substitution V with dom(V) = fmv(S1Γ, S1A), we have VS1Γ ⊢ e ⇒ VS1A.


Theorem 2 (Completeness). If Γ ⊢ e ⇒ A, then for a fresh N0, we
have ([], N0) | Γ ⊢ e ⇒ B ↪ (S1, N1), and for some S2, we have
Γgen(S2S1B) <: Γgen(A).


4 More Expressive Type Applications 


This section presents a System-F-like calculus, which shows that the application
mode not only works well for calculi with explicit type applications, but
also adds interesting expressive power, while at the same time retaining
uniqueness of types for explicitly polymorphic functions. One additional novelty in this
section is to present another possible variant of typing and subtyping rules for
the application mode, by exploiting the lemmas presented in Sects. 3.2 and 3.3.
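
\[ \boxed{(\Gamma)A} \]

\[
\begin{array}{lcl}
(\varnothing)A & = & A \\
(\Gamma, x : B)A & = & (\Gamma)A \\
(\Gamma, a)A & = & (\Gamma)A \\
(\Gamma, a = B)A & = & (\Gamma)(A[a \mapsto B])
\end{array}
\]

Fig. 2. Apply contexts as substitutions on types.


\[
\frac{}{\Gamma \vdash \mathsf{Int}}\ \text{WF-INT}
\qquad
\frac{a \in \Gamma}{\Gamma \vdash a}
\qquad
\frac{\Gamma \vdash A \qquad \Gamma \vdash B}{\Gamma \vdash A \to B}\ \text{WF-ARROW}
\qquad
\frac{\Gamma, a \vdash A}{\Gamma \vdash \forall a. A}\ \text{WF-ALL}
\]

Fig. 3. Well-formedness.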


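4.1 Syntax

We focus on a new variant of the standard System F. The syntax is as follows:


\[
\begin{array}{llcl}
\text{Expr} & e & ::= & x \mid n \mid \lambda x : A.\ e \mid \lambda x.\ e \mid e_1\ e_2 \mid \Lambda a. e \mid e\ [A] \\
\text{Type} & A & ::= & a \mid \mathsf{Int} \mid A \to B \mid \forall a. A \\
\text{Typing Context} & \Gamma & ::= & \varnothing \mid \Gamma, x : A \mid \Gamma, a \mid \Gamma, a = A \\
\text{Application Context} & \Psi & ::= & \varnothing \mid \Psi, A \mid \Psi, [A]
\end{array}
\]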


The syntax is mostly standard. Expressions include variables x, integers n,
annotated abstractions λx : A. e, unannotated abstractions λx. e, applications
e1 e2, type abstractions Λa.e, and type applications e [A]. Types include type
variables a, integers Int, function types A → B, and polymorphic types ∀a.A.

The main novelties are in the typing and application contexts. Typing con- 
texts contain the usual term variable typing x : A, type variables a, and type 
equations a = A, which track equalities and are not available in System F. Appli- 
cation contexts use A for the argument type for term-level applications, and use 
[A] for the type argument itself for type applications. 

Applying Contexts. The typing contexts contain type equations, which can be
used as substitutions. For example, a = Int, x : Int, b = Bool can be applied to
a → b to get the function type Int → Bool. We write (Γ)A for Γ applied as a
substitution to type A. The formal definition is given in Fig. 2.
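
The definition in Fig. 2 can be read as a right-to-left walk over the typing context. The Python sketch below is an illustration under assumed encodings that are not part of the paper: a context is a list of entries ("var", x, A), ("tvar", a) or ("eq", a, A) with the most recent entry last, types are strings or tuples ("->", A, B) and ("forall", a, A), and bound names are assumed distinct from the names in the equations so that substitution needs no renaming.

```python
def subst(a, b, ty):
    """ty[a ↦ b], without renaming (bound names are assumed distinct)."""
    if isinstance(ty, str):
        return b if ty == a else ty
    if ty[0] == "->":
        return ("->", subst(a, b, ty[1]), subst(a, b, ty[2]))
    # ("forall", c, body): stop if the binder shadows a
    return ty if ty[1] == a else ("forall", ty[1], subst(a, b, ty[2]))


def apply_ctx(ctx, ty):
    """(Γ)A: process Γ from its rightmost entry to its leftmost one."""
    for entry in reversed(ctx):
        if entry[0] == "eq":  # (Γ, a = B)A = (Γ)(A[a ↦ B])
            _, a, b = entry
            ty = subst(a, b, ty)
        # entries x : B and plain type variables a leave the type unchanged
    return ty


ctx = [("eq", "a", "Int"), ("var", "x", "Int"), ("eq", "b", "Bool")]
```

Applying `ctx` to a → b, that is `apply_ctx(ctx, ("->", "a", "b"))`, produces the encoding of Int → Bool, as in the example above.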

Well-Formedness. The type well-formedness under typing contexts is given in
Fig. 3, which is quite straightforward. Notice that there is no rule corresponding
to type variables in type equations. For example, a is not a well-formed type
under typing context a = Int; instead, (a = Int)a is. In other words, we keep the
invariant: types are always fully substituted under the typing context.

The well-formedness of typing contexts Γ ctx, and the well-formedness of
application contexts Γ ⊢ Ψ can be defined naturally based on the well-formedness
of types. The specific definitions can be found in the supplementary materials.
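
\[ \boxed{\Gamma \mid \Psi \vdash e \Rightarrow B} \]

\[
\frac{\Gamma\ \mathit{ctx} \qquad \Gamma \vdash \Psi \qquad x : A \in \Gamma \qquad \Psi \vdash A <: B}{\Gamma \mid \Psi \vdash x \Rightarrow B}\ \text{SF-VAR}
\qquad
\frac{\Gamma\ \mathit{ctx}}{\Gamma \vdash n \Rightarrow \mathsf{Int}}\ \text{SF-INT}
\]

\[
\frac{\Gamma, x : (\Gamma)A \vdash e \Rightarrow B}{\Gamma \vdash \lambda x : A.\ e \Rightarrow (\Gamma)A \to B}\ \text{SF-LAMANN1}
\]

\[
\frac{\Gamma, x : (\Gamma)A \mid \Psi \vdash e \Rightarrow B}{\Gamma \mid \Psi, (\Gamma)A \vdash \lambda x : A.\ e \Rightarrow B}\ \text{SF-LAMANN2}
\qquad
\frac{\Gamma, x : A \mid \Psi \vdash e \Rightarrow B}{\Gamma \mid \Psi, A \vdash \lambda x.\ e \Rightarrow B}\ \text{SF-LAM}
\]

\[
\frac{\Gamma \vdash e_2 \Rightarrow A \qquad \Gamma \mid \Psi, A \vdash e_1 \Rightarrow B}{\Gamma \mid \Psi \vdash e_1\ e_2 \Rightarrow B}\ \text{SF-APP}
\qquad
\frac{\Gamma, a \vdash e \Rightarrow B}{\Gamma \vdash \Lambda a. e \Rightarrow \forall a. B}\ \text{SF-TLAM1}
\]

\[
\frac{\Gamma, a = A \mid \Psi \vdash e \Rightarrow B}{\Gamma \mid \Psi, [A] \vdash \Lambda a. e \Rightarrow B}\ \text{SF-TLAM2}
\qquad
\frac{\Gamma \mid \Psi, [(\Gamma)A] \vdash e \Rightarrow B}{\Gamma \mid \Psi \vdash e\ [A] \Rightarrow B}\ \text{SF-TAPP}
\]

\[ \boxed{\Psi \vdash A <: B} \]

\[
\frac{}{\varnothing \vdash A <: A}\ \text{SF-SEMPTY}
\]

\[
\frac{\Psi \vdash B[a \mapsto A] <: C}{\Psi, [A] \vdash \forall a. B <: C}\ \text{SF-STAPP}
\qquad
\frac{\Psi \vdash B <: C}{\Psi, A \vdash A \to B <: C}\ \text{SF-SAPP}
\]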

Fig. 4. Type system for the new System F variant. 


4.2 Type System 


Typing Judgments. From Lemmas 1 and 4, we know that the application context 
always coincides with typing/subtyping results. This means that the types of the 
arguments can be recovered from the application context. So instead of the whole 
type, we can use only the return type as the output type. For example, we review 
the rule T-LAM in Fig. 1: 


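\[
\frac{\Gamma, x : A \mid \Psi \vdash e \Rightarrow B}{\Gamma \mid \Psi, A \vdash \lambda x.\ e \Rightarrow A \to B}\ \text{T-LAM}
\qquad
\frac{\Gamma, x : A \mid \Psi \vdash e \Rightarrow C}{\Gamma \mid \Psi, A \vdash \lambda x.\ e \Rightarrow C}\ \text{T-LAM-ALT}
\]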


We have B = Ψ → C for some C by Lemma 1. Instead of B, we can directly
return C as the output type, since we can derive from the application context
that e is of type Ψ → C, and λx. e is of type (Ψ, A) → C. Thus we obtain the
T-LAM-ALT rule.

Note that the choice of the style of the rules is only a matter of taste in the
language in Sect. 3. However, it turns out to be very useful for our variant of
System F, since it helps avoid introducing types like ∀a = Int. a. Therefore,
we adopt the new form of judgment. Now the judgment Γ | Ψ ⊢ e ⇒ A is
interpreted as: under the typing context Γ, and the application context Ψ, the
return type of e applied to the arguments whose types are in Ψ is A.

Typing Rules. Using the new interpretation of the typing judgment, we give the
typing rules in the top of Fig. 4. SF-VAR depends on the subtyping rules. Rule
SF-INT always infers integer types. Rule SF-LAMANN1 first applies the current
context on A, then puts x : (Γ)A into the typing context to infer e. The return type
is a function type because the application context is empty. Rule SF-LAMANN2
has a non-empty application context, so it requests that the type at the top of
the application context is equivalent to (Γ)A. The output type is B instead of
a function type. Notice how the invariant that types are fully substituted under
the typing context is preserved in these two rules.

Rule SF-LAM pops the type A from the application context, puts x : A into
the typing context, and returns only the return type B. In rule SF-APP, the
argument type A is pushed into the application context for inferring e1, so the
output type B is the type of e1 under application context (Ψ, A), which is exactly
the return type of e1 e2 under Ψ.

Rule SF-TLAM1 is for type abstractions. The type variable a is pushed
into the typing context, and the return type is a polymorphic type. In rule
SF-TLAM2, the application context has the type argument A at its top, which means
the type abstraction is applied to A. We then put the type equation a = A into
the typing context to infer e. Like term-level applications, here we only return
the type B instead of a polymorphic type. In rule SF-TAPP, we first apply the
typing context on the type argument A, then we put the applied type argument
(Γ)A into the application context to infer e, and return B as the output type.


Subtyping. The definition of subtyping is given at the bottom of Fig. 4. As with
the typing rules, the part of argument types corresponding to the application
context is omitted in the output. We interpret the rule form Ψ ⊢ A <: B as,
under the application context Ψ, A is a subtype of the type whose type arguments
are Ψ and the return type is B.

Rule SF-SEMPTY returns the input type under the empty application context.
Rule SF-STAPP instantiates a with the type argument A, and returns C.
Note how application subtyping can be extended naturally to deal with type
applications. Rule SF-SAPP requests that the argument type is the same as the
top type in the application context, and returns C.
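
Because the rules in Fig. 4 are syntax-directed and involve no guessing, they can be transcribed almost line by line. The following Python sketch is such a transcription, offered only as an illustration under several assumptions that are not part of the paper: terms and types are encoded as tuples with hypothetical tags ("int", "var", "lam", "lamann", "app", "tlam", "tapp", "->", "forall"), application-context entries are ("ty", A) for term arguments and ("tyarg", A) for type arguments with the top of the stack last, type equivalence is syntactic equality, names are assumed fresh, and the well-formedness premises (Γ ctx and Γ ⊢ Ψ) are not checked. The calculus here has no x + 1, so the example uses the identity instead.

```python
def subst(a, b, ty):
    """ty[a ↦ b]; bound names are assumed distinct, so no capture occurs."""
    if isinstance(ty, str):
        return b if ty == a else ty
    if ty[0] == "->":
        return ("->", subst(a, b, ty[1]), subst(a, b, ty[2]))
    return ty if ty[1] == a else ("forall", ty[1], subst(a, b, ty[2]))


def apply_ctx(ctx, ty):
    """(Γ)A: use the type equations in Γ, rightmost first, as substitutions."""
    for entry in reversed(ctx):
        if entry[0] == "eq":
            ty = subst(entry[1], entry[2], ty)
    return ty


def app_sub(psi, ty):
    """Ψ ⊢ A <: B, returning the return type B."""
    if not psi:                                              # SF-SEMPTY
        return ty
    kind, arg = psi[-1]
    if isinstance(ty, tuple):
        if kind == "tyarg" and ty[0] == "forall":            # SF-STAPP
            return app_sub(psi[:-1], subst(ty[1], arg, ty[2]))
        if kind == "ty" and ty[0] == "->" and ty[1] == arg:  # SF-SAPP
            return app_sub(psi[:-1], ty[2])
    raise TypeError(f"{ty} does not fit the application context")


def infer(ctx, psi, e):
    """Γ | Ψ ⊢ e ⇒ A, returning the return type A."""
    tag = e[0]
    if tag == "int":                                         # SF-INT
        if psi:
            raise TypeError("an integer cannot be applied")
        return "Int"
    if tag == "var":                                         # SF-VAR
        for entry in reversed(ctx):
            if entry[0] == "var" and entry[1] == e[1]:
                return app_sub(psi, entry[2])
        raise TypeError(f"unbound variable {e[1]}")
    if tag == "lamann":
        _, x, ann, body = e
        a = apply_ctx(ctx, ann)
        if not psi:                                          # SF-LAMANN1
            return ("->", a, infer(ctx + [("var", x, a)], [], body))
        if psi[-1] != ("ty", a):                             # SF-LAMANN2
            raise TypeError(f"argument does not have type {a}")
        return infer(ctx + [("var", x, a)], psi[:-1], body)
    if tag == "lam":                                         # SF-LAM
        _, x, body = e
        if not psi or psi[-1][0] != "ty":
            raise TypeError("an unannotated lambda must be applied to a term")
        return infer(ctx + [("var", x, psi[-1][1])], psi[:-1], body)
    if tag == "app":                                         # SF-APP
        a = infer(ctx, [], e[2])
        return infer(ctx, psi + [("ty", a)], e[1])
    if tag == "tlam":
        _, a, body = e
        if not psi:                                          # SF-TLAM1
            return ("forall", a, infer(ctx + [("tvar", a)], [], body))
        if psi[-1][0] != "tyarg":
            raise TypeError("a type abstraction must be applied to a type")
        return infer(ctx + [("eq", a, psi[-1][1])], psi[:-1], body)  # SF-TLAM2
    if tag == "tapp":                                        # SF-TAPP
        return infer(ctx, psi + [("tyarg", apply_ctx(ctx, e[2]))], e[1])
    raise ValueError(f"unknown term {e!r}")


# Λa. λx : a. x, and the same abstraction applied to [Int] and then to 3
poly_id = ("tlam", "a", ("lamann", "x", "a", ("var", "x")))
applied = ("app", ("tapp", poly_id, "Int"), ("int", 3))
```

Inferring `applied` in the empty contexts returns Int: the type argument is recorded as the equation a = Int, so the annotation a is read as Int when the argument 3 is checked against it. Inferring the unapplied `poly_id` returns the encoding of ∀a. a → a, so type abstraction is retained there.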


4.3 Meta Theory 


Applying the idea of the application mode to System F results in a well-behaved 
type system. For example, subtyping transitivity becomes more concise: 


Lemma 11 (Subtyping transitivity). If Ψ1 ⊢ A <: B, and Ψ2 ⊢ B <: C,
then Ψ2, Ψ1 ⊢ A <: C.


Also, we still have the interesting subsumption lemma that transfers from 
the inference mode to the application mode: 


Lemma 12 (Subsumption). If Γ ⊢ e ⇒ A, and Γ ⊢ Ψ, and Ψ ⊢ A <: B,
then Γ | Ψ ⊢ e ⇒ B.

Furthermore, we prove type safety by proving the progress lemma and
the preservation lemma. The detailed definitions of operational semantics and
values can be found in the supplementary materials.


Lemma 13 (Progress). If ∅ ⊢ e ⇒ T, then either e is a value, or there exists
e', such that e ⟶ e'.


Lemma 14 (Preservation). If Γ | Ψ ⊢ e ⇒ A, and e ⟶ e', then Γ | Ψ ⊢
e' ⇒ A.


Moreover, introducing type equality preserves unique types: 


Lemma 15 (Uniqueness of typing). If Γ | Ψ ⊢ e ⇒ A, and Γ | Ψ ⊢
e ⇒ B, then A = B.


5 Discussion 


This section discusses possible design choices regarding bi-directional type check- 
ing with the application mode, and talks about possible future work. 


5.1 Combining Application and Checked Modes 


Although the application mode provides us with alternative design choices in 
a bi-directional type system, a checked mode can still be easily added. One 
motivation for the checked mode would be annotated expressions e : A, where 
the type of expressions is known and is therefore used to check expressions. 

Consider adding e : A to introduce a third, checked mode for the language
in Sect. 3. Notice that, since the checked mode is stronger than the application mode,
when entering checked mode the application context is no longer useful. Instead
we use application subtyping to satisfy the application context requirements.
A possible typing rule for annotation expressions is:


\[
\frac{\Psi \vdash A <: B \qquad \Gamma \vdash e \Leftarrow A}{\Gamma \mid \Psi \vdash (e : A) \Rightarrow B}
\]

Here, e is checked using its annotation A, and then we instantiate A to B using
subtyping with application context Ψ.

Now we can have a rule set of the checked mode for all expressions. For
example, one useful rule for abstractions in checked mode could be ABS-CHK,
where the parameter type A serves as the type of x, and typing checks the
body with B. Also, combined with the information flow, the checked rule for
application checks the function with the full type.


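\[
\frac{\Gamma, x : A \vdash e \Leftarrow B}{\Gamma \vdash \lambda x.\ e \Leftarrow A \to B}\ \text{ABS-CHK}
\qquad
\frac{\Gamma \vdash e_2 \Rightarrow A \qquad \Gamma \vdash e_1 \Leftarrow A \to B}{\Gamma \vdash e_1\ e_2 \Leftarrow B}\ \text{APP-CHK}
\]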
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Note that adding expression annotations might bring convenience for pro- 
grammers, since annotations can be more freely placed in a program. For exam- 
ple, (Af. f 1) : (Int — Int) —>Int becomes valid. However this does not add 
expressive power, since programs that are typeable under expression annotations, 
would remain typeable after moving the annotations to bindings. For example 
the previous program is equivalent to (Af : (Int — Int). f 1). 

This discussion is a sketch. We have not defined the corresponding declarative 
system nor algorithm. However we believe that the addition of a checked mode 
will not bring surprises to the meta-theory. 
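
To make the interplay of the modes concrete, the following is a minimal sketch in Python, not the system studied in this paper. It assumes a simply typed language without polymorphism, so application subtyping degenerates to matching the argument types on the application context against the type at hand. The tagged-tuple encoding and the names `infer`, `check` and `match_stack` are illustrative choices only.

```python
# Types are "Int" or ("->", argument, result); expressions are tagged tuples.
def arrow(a, b):
    return ("->", a, b)

def match_stack(ty, stack):
    """Simply typed stand-in for application subtyping (Psi |- A <: B):
    the argument types on the stack must match the argument types of ty."""
    t = ty
    for arg in reversed(stack):          # the top of the stack is the last element
        if not (isinstance(t, tuple) and t[1] == arg):
            raise TypeError(f"{ty} cannot be applied to arguments {stack}")
        t = t[2]
    return ty

def infer(ctx, stack, e):
    """Gamma | Psi |- e => A, with Psi represented as a list of argument types."""
    tag = e[0]
    if tag == "lit":
        return match_stack("Int", stack)
    if tag == "var":
        return match_stack(ctx[e[1]], stack)
    if tag == "app":                     # arguments go first
        _, fun, arg = e
        arg_ty = infer(ctx, [], arg)
        fun_ty = infer(ctx, stack + [arg_ty], fun)
        return fun_ty[2]
    if tag == "lam":                     # the parameter type comes from the stack
        _, x, body = e
        if not stack:
            raise TypeError("unannotated lambda needs an argument or an annotation")
        body_ty = infer({**ctx, x: stack[-1]}, stack[:-1], body)
        return arrow(stack[-1], body_ty)
    if tag == "ann":                     # annotation rule: check, then match Psi
        _, inner, ty = e
        check(ctx, inner, ty)
        return match_stack(ty, stack)
    raise ValueError(f"unknown expression {e!r}")

def check(ctx, e, ty):
    """Gamma |- e <= A (checked mode)."""
    if e[0] == "lam" and isinstance(ty, tuple):       # ABS-CHK
        _, x, body = e
        check({**ctx, x: ty[1]}, body, ty[2])
    elif e[0] == "app":                                # APP-CHK
        _, fun, arg = e
        check(ctx, fun, arrow(infer(ctx, [], arg), ty))
    elif infer(ctx, [], e) != ty:                      # otherwise infer and compare
        raise TypeError(f"expected {ty}")

INT = "Int"
annotated = ("ann", ("lam", "f", ("app", ("var", "f"), ("lit", 1))),
             arrow(arrow(INT, INT), INT))
print(infer({}, [], annotated))
print(infer({}, [], ("app", ("lam", "x", ("var", "x")), ("lit", 3))))
```

In this sketch the annotated program is accepted through the checked mode, while the applied identity function needs no annotation because the argument type reaches the lambda through the application context. A bare unannotated lambda is rejected here, which is a simplification of the sketch and not a property of the system in Sect. 3.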


5.2 Additional Constructs 


In this section, we show that the application mode is compatible with other 
constructs, by discussing how to add support for pairs in the language given in 
Sect. 3. A similar methodology would apply to other constructs like sum types, 
data types, if-then-else expressions and so on. 

The introduction rule for pairs must be in the inference mode with an empty 
application context. Also, the subtyping rule for pairs is as expected. 


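\[
\frac{\Gamma \vdash e_1 \Rightarrow A \qquad \Gamma \vdash e_2 \Rightarrow B}
     {\Gamma \vdash (e_1, e_2) \Rightarrow (A, B)}\ \textsc{T-Pair}
\qquad
\frac{A_1 <: B_1 \qquad A_2 <: B_2}
     {(A_1, A_2) <: (B_1, B_2)}\ \textsc{S-Pair}
\]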


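The application mode can apply to the elimination constructs of pairs. If one
component of the pair is a function, for example, $(\mathsf{fst}\ (\lambda x.\ x, 3)\ 4)$, then it is
possible to have a judgment with a non-empty application context. Therefore,
we can use the application subtyping to account for the application contexts:


\[
\frac{\Gamma \vdash e \Rightarrow (A, B) \qquad \Psi \vdash A <: C}
     {\Gamma \mid \Psi \vdash \mathsf{fst}\ e \Rightarrow C}\ \textsc{T-Fst1}
\qquad
\frac{\Gamma \vdash e \Rightarrow (A, B) \qquad \Psi \vdash B <: C}
     {\Gamma \mid \Psi \vdash \mathsf{snd}\ e \Rightarrow C}\ \textsc{T-Snd1}
\]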


However, in polymorphic type systems, we need to take the subsumption rule
into consideration. For example, in the expression $(\lambda x : (\forall a b.\ (a, b)).\ \mathsf{fst}\ x)$, fst
is applied to a polymorphic type. Interestingly, instead of a non-deterministic
subsumption rule, having polymorphic types actually leads to a simpler solution.
According to the philosophy of the application mode, the types of the arguments
always flow into the functions. Therefore, instead of regarding (fst e) as an
expression form, where e is itself an argument, we could regard fst as a function
on its own, whose type is $(\forall a b.\ (a, b) \to a)$. Then as in the variable case, we use
the subtyping rule to deal with application contexts. Thus the typing rules for
fst and snd can be modeled as:

\[
\frac{\Psi \vdash (\forall a b.\ (a, b) \to a) <: A}
     {\Gamma \mid \Psi \vdash \mathsf{fst} \Rightarrow A}\ \textsc{T-Fst2}
\qquad
\frac{\Psi \vdash (\forall a b.\ (a, b) \to b) <: A}
     {\Gamma \mid \Psi \vdash \mathsf{snd} \Rightarrow A}\ \textsc{T-Snd2}
\]


Note that another way to model those two rules would be to simply have an
initial typing environment $\Gamma_{\mathit{initial}} = \mathsf{fst} : (\forall a b.\ (a, b) \to a),\ \mathsf{snd} : (\forall a b.\ (a, b) \to b)$.
In this case the elimination of pairs can be dealt with directly by the rule for variables.

An extended version of the calculus presented in Sect. 3, which includes the
rules for pairs (T-PAIR, S-PAIR, T-FST2 and T-SND2), has been formally studied.
All the theorems presented in Sect. 3 hold with the extension of pairs.


5.3 Dependent Type Systems 


One remark about the application mode is that the same idea is possibly appli- 
cable to systems with advanced features, where type inference is sophisticated 
or even undecidable. One promising application is, for instance, dependent type 
systems [2,3,10,21,37]. Type systems with dependent types usually unify the 
syntax for terms and types, with a single lambda abstraction generalizing both 
type and lambda abstractions. Unfortunately, this means that the let desugaring
is not valid in those systems. As a concrete example, consider desugaring the
expression $\mathbf{let}\ a = \mathsf{Int}\ \mathbf{in}\ \lambda x : a.\ x + 1$ into $(\lambda a.\ \lambda x : a.\ x + 1)\ \mathsf{Int}$, which is
ill-typed because the type of $x$ in the abstraction body is $a$ and not $\mathsf{Int}$.

Because let cannot be encoded, declarations cannot be encoded either. Mod- 
eling declarations in dependently typed languages is a subtle matter, and nor- 
mally requires some additional complexity [34]. 

We believe that the same technique presented in Sect. 4 can be adapted into 
a dependently typed language to enable a let encoding. In a dependent type 
system with unified syntax for terms and types, we can combine the two forms 
in the typing context (x : A and a = A) into a unified form x = e : A. Then 
we can combine two application rules SF-APP and SF-TAPP into DE-APP, and 
also two abstraction rules SF-LAM and SF-TLAM1 into DE-LAM. 


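\[
\frac{\Gamma \vdash e_2 \Rightarrow A \qquad \Gamma \mid \Psi, e_2 : A \vdash e_1 \Rightarrow B}
     {\Gamma \mid \Psi \vdash e_1\ e_2 \Rightarrow B}\ \textsc{DE-App}
\qquad
\frac{\Gamma, x = e_1 : A \mid \Psi \vdash e \Rightarrow B}
     {\Gamma \mid \Psi, e_1 : A \vdash \lambda x.\ e \Rightarrow B}\ \textsc{DE-Lam}
\]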


With such rules it would be possible to handle declarations easily in depen- 
dent type systems. Note this is still a rough idea and we have not fully worked 
out the typing rules for this type system yet. 


6 Related Work 


6.1 Bi-directional Type Checking 


Bi-directional type checking was popularized by the work of Pierce and Turner 
[29]. It has since been applied to many type systems with advanced features. The 
alternative application mode introduced by us enables a variant of bi-directional 
type checking. There are many other efforts to refine bi-directional type checking. 

Colored local type inference [25] refines local type inference for explicit 
polymorphism by propagating partial type information. Their work is built on 
distinguishing inherited types (known from the context) and synthesized types 
(inferred from terms). A similar distinction is achieved in our algorithm by 
manipulating type variables [14]. Also, their information flow is from functions 
to arguments, which is fundamentally different from the application mode.

The system of tridirectional type checking [15] is based on bi-directional type
checking and has a rich set of property types including intersections, unions and 
quantified dependent types, but without parametric polymorphism. Tridirec- 
tional type checking has a new direction for supporting type checking unions 
and existential quantification. Their third mode is basically unrelated to our 
application mode, which propagates information from outer applications. 

Greedy bi-directional polymorphism [13] adopts a greedy idea from Cardelli 
[4] on bi-directional type checking with higher ranked types, where the type 
variables in instantiations are determined by the first constraint. In this way, 
they support some uses of impredicative polymorphism. However, the greediness
also causes many obvious programs to be rejected.


6.2 Type Inference for Higher-Ranked Types 


As a reference, Fig. 5 [14,20] gives a high-level comparison between related works 
and our system. 


Predicative Systems. Peyton Jones et al. [27] developed an approach for type 
inference for higher rank types using traditional bi-directional type checking 
based on Odersky and Laufer [24]. However in their system, in order to do 
instantiation on higher rank types, they are forced to have an additional type
category ($\rho$ types) as a special kind of higher rank type without top-level
quantifiers. This complicates their system since they need to have additional rule sets
for such types. They also combine a variant of the containment relation from 
Mitchell [23] for deep skolemisation in subsumption rules, which we believe is 
compatible with our subtyping definition. 

Dunfield and Krishnaswami [14] build a simple and concise algorithm for 
higher ranked polymorphism based on traditional bidirectional type checking. 
They deal with the same language of Peyton Jones et al. [27], except they do 
not have let expressions nor generalization (though it is discussed in design 
variations). They have a special application judgment which delays instantiation 
until the expression is applied to some argument. As with application mode, this 
avoids the additional category of types. Unlike their work, our work supports 
generalization and HM-style let expressions. Moreover the use of an application 
mode in our work introduces several differences as to when and where annota- 
tions are needed (see Sect. 2.4 for related discussion). 


Impredicative Systems. ML$^F$ [18,19,32] generalizes ML with first-class
polymorphism. ML$^F$ introduces a new type of bounded quantification (either rigid or
flexible) for polymorphic types so that instantiation of polymorphic bindings is delayed
until a principal type is found. The HML system [20] is proposed as a simplification
and restriction of ML$^F$. HML only uses flexible types, which simplifies the type
inference algorithm, but retains many interesting properties and features.

The FPH system [35] introduces boxy monotypes into System F types. One 
critique of boxy type inference is that the impredicativity is deeply hidden in the 
algorithmic type inference rules, which makes it hard to understand the interaction
between its predicative constraints and impredicative instantiations [31].


System                      Types               Impred  Let    Annotations
ML$^F$                      flexible and rigid  yes     yes    on polymorphically used parameters
HML                         flexible F-types    yes     yes    on polymorphic parameters
FPH                         boxy F-types        yes     yes    on polymorphic parameters and some
                                                               let bindings with higher-ranked types
Peyton Jones et al. (2007)  F-types             no      yes    on polymorphic parameters
Dunfield et al. (2013)      F-types             no      no     on polymorphic parameters
this paper                  F-types             no      sugar  on polymorphic parameters that are
                                                               not applied


Fig. 5. Comparison of higher-ranked type inference systems.


6.3 Tracking Type Equalities 


Tracking type equalities is useful in various situations. Here we discuss specifi- 
cally two related cases where tracking equalities plays an important role. 


Type Equalities in Type Checking. Tracking type equalities is one essential
part of type checking algorithms involving Generalized Algebraic Data Types
(GADTs) [6,26,33]. For example, Peyton Jones et al. [26] propose a type inference
algorithm based on unification for GADTs, where type equalities only apply
to user-specified types. However, reasoning about type equalities in GADTs is
essentially different from the approach in Sect. 4: type equalities are introduced
by pattern matches in GADTs, while they are introduced through type applications
in our system. Also, type equalities in GADTs are local, in the sense that
different branches in pattern matches have different type equalities for the same
type variable. In our system, a type equality is introduced globally and is never
changed. However, we believe that they can be made compatible by distinguishing
different kinds of equalities.


Equalities in Declarations. In systems supporting dependent types, type equalities
can be introduced by declarations. In the variant of pure type systems
proposed by Severi and Poll [34], expressions $x = a : A\ \mathbf{in}\ b$ generate an equality
$x = a : A$ in the typing context, which can be fetched later through $\delta$-reduction.
However, $\delta$-reduction rules require careful design, and the conversion rule of
$\delta$-reduction makes the type system non-deterministic. One potential usage of
the application mode is to help reduce the complexity of introducing declarations
in those type systems, as briefly discussed in Sect. 5.3.


7 Conclusion 


We proposed a variant of bi-directional type checking with a new application 
mode, where type information flows from arguments to functions in applications. 
The application mode is essentially a generalization of the inference mode; it can
therefore work naturally with the inference mode, and avoids the rule duplication
that is often needed in traditional bi-directional type checking. The application
mode can also be combined with the checked mode, but this often does not 
add expressiveness. Compared to traditional bi-directional type checking, the 
application mode opens a new path to the design of type inference/checking. 

We have adopted the application mode in two type systems. Those two 
systems enjoy many interesting properties and features. Moreover, since
bi-directional type checking can be applied to many type systems, we believe the
application mode is applicable to various type systems as well. One obvious direction
for future work is to investigate more systems where the application mode brings benefits.
This includes systems with subtyping, intersection types [8,30], static overloading,
or dependent types.


Abstract. The paper investigates behavioural equivalence between programs
in a call-by-value functional language extended with a signature
of (algebraic) effect-triggering operations. Two programs are considered 
as being behaviourally equivalent if they enjoy the same behavioural 
properties. To formulate this, we define a logic whose formulas specify 
behavioural properties. A crucial ingredient is a collection of modalities 
expressing effect-specific aspects of behaviour. We give a general theory 
of such modalities. If two conditions, openness and decomposability, are 
satisfied by the modalities then the logically specified behavioural equiva- 
lence coincides with a modality-defined notion of applicative bisimilarity, 
which can be proven to be a congruence by a generalisation of Howe’s 
method. We show that the openness and decomposability conditions hold 
for several examples of algebraic effects: nondeterminism, probabilistic 
choice, global store and input/output. 


1 Introduction 


The notion of behavioural equivalence between programs is a fundamental con- 
cept in the theory of programming languages. A conceptually natural approach 
to defining behavioural equivalence is to consider two programs as being equiv- 
alent if they enjoy the same ‘behavioural properties’. This can be made precise 
by specifying a behavioural logic whose formulas express behavioural properties. 
Two programs $M, N$ are then defined to be equivalent if, for all formulas $\Phi$, it
holds that $M \models \Phi$ iff $N \models \Phi$ (where $M \models \Phi$ expresses the satisfaction
relation: program $M$ enjoys property $\Phi$).

This logical approach to defining behavioural equivalence has been particu- 
larly prominent in concurrency theory, where the classic result is that the equiv- 
alence defined by Hennessy-Milner logic [4] coincides with bisimilarity [14,17]. 
The aim of the present paper is to adapt the logical approach to the very different 
computational paradigm of applicative programming with effects.

More precisely, we consider a call-by-value functional programming language
with algebraic effects in the sense of Plotkin and Power [21]. Broadly speaking, 
effects are those aspects of computation that involve a program interacting with 
its ‘environment’; for example: nondeterminism, probabilistic choice (in both 
cases, the choice is deferred to the environment); input/output; mutable store 
(the machine state is modified); control operations such as exceptions, jumps and 
handlers (which interact with the continuation in the evaluation process); etc. 
Such general effects collectively enjoy common properties identified in the work 
of Moggi on monads [15]. Among them, algebraic effects play a special role. 
They can be included in a programming language by adding effect-triggering 
operations, whose ‘algebraic’ nature means that effects act independently of 
the continuation. From the aforementioned examples of effects, only jumps and 
handlers are non-algebraic. Thus the notion of algebraic effect covers a broad 
range of effectful computational behaviour. Call-by-value functional languages 
provide a natural context for exploring effectful programming. From a theoretical 
viewpoint, other programming paradigms are subsumed; for example, imperative 
programs can be recast as effectful functional ones. From a practical viewpoint, 
the combination of effects with call-by-value leads to the natural programming 
style supported by impure functional languages such as OCaml. 

In order to focus on the main contributions of the paper (the behavioural logic 
and its induced behavioural equivalence), we instantiate “call-by-value functional 
language with algebraic effects” using a very simple language. Our language is a 
simply-typed $\lambda$-calculus with a base type of natural numbers, general recursion,
call-by-value function evaluation, and algebraic effects, similar to [21]; although, 
for technical convenience, we adopt the (equivalent) formulation of fine-grained 
call-by-value [13]. The language is defined precisely in Sect. 2. Following [8,21], 
an operational semantics is given that evaluates programs to effect trees. 

Section 3 introduces the behavioural logic. In our impure functional setting,
the evaluation of a program of type $\tau$ results in a computational process that
may or may not invoke effects, and which may or may not terminate with a
return value of type $\tau$. The key ingredient in our logic is an effect-specific family
$O$ of modalities, where each modality $o \in O$ converts a property $\phi$ of values of
type $\tau$ to a property $o\,\phi$ of general programs (called computations) of type $\tau$.
The idea is that such modalities capture all relevant effect-specific behavioural
properties of the effects under consideration.

A main contribution of the paper is to give a general framework for defin- 
ing such effect modalities, applicable across a wide range of algebraic effects. 
The general setting is that we have a signature $\Sigma$ of effect operations, which
determines the programming language, and a collection O of modalities, which 
determines the behavioural logic. In order to specify the semantics of the logic, we 
require each modality to be assigned a set of unit-type effect trees, which deter- 
mines the meaning of the modality. Several concrete examples and a detailed 
general explanation are given in Sect. 3. 

In Sect. 4, we consider the relation of behavioural equivalence between programs
determined by the logic. A fundamental well-behavedness property is that
any reasonable program equivalence should be a congruence with respect to the
syntactic constructs of the programming language. Our main theorem (Theorem 1)
is that, under two conditions on the collection $O$ of modalities, which
hold for all the examples of effects we consider, the logically induced behavioural
equivalence is indeed a congruence.

In order to prove Theorem 1, we develop an alternative perspective on
behavioural equivalence, which is of interest in its own right. In Sect. 5 we
show how the modalities O determine a relation of applicative O-bisimilarity, 
which is an effect-sensitive version of Abramsky’s notion of applicative bisim- 
ilarity [1]. Theorem 2 shows that applicative O-bisimilarity coincides with the 
logically defined relation of behavioural equivalence. 

The proof of Theorem 1 is then concluded in Sect. 6, where we use Howe’s
method [5,6] to show that applicative O-bisimilarity is a congruence. Although 
the proof is technically involved, we give only a brief outline, as the details closely 
follow the recent paper [9], in which Howe’s method is applied to an untyped 
language with general algebraic effects. 

In Sect. 7, we present a variation on our behavioural logic, in which we make 
the syntax of logical formulas independent of the syntax of the programming 
language. 

Finally, in Sect. 8 we discuss related and further work. 


2 A Simple Programming Language 


As motivated in the introduction, our chosen base language is a simply-typed 
call-by-value functional language with general recursion and a ground type of 
natural numbers, to which we add (algebraic) effect-triggering operations. This 
means that our language is a call-by-value variant of PCF [20], extended with 
algebraic effects, resulting in a language similar to the one considered in [21]. In 
order to simplify the technical treatment of the language, we present it in the 
style of fine-grained call-by-value [13]. This means that we make a syntactic dis- 
tinction between values and computations, representing the static and dynamic 
aspects of the language respectively. Furthermore, all sequencing of computa- 
tions is performed using a single language construct, the let construct. The 
resulting language is straightforwardly intertranslatable with the more tradi- 
tional call-by-value formulation. But the encapsulation of all sequencing within 
a single construct has the benefit of avoiding redundancy in proofs. 

Our types are just the simple types obtained by iterating the function type 
construction over two base types: N of natural numbers, and also a unit type 1. 


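\[
\begin{array}{lrcl}
\text{Types:}    & \tau, \rho & ::= & 1 \mid \mathbf{N} \mid \rho \to \tau \\
\text{Contexts:} & \Gamma     & ::= & \emptyset \mid \Gamma, x : \tau
\end{array}
\]


As usual, term variables $x$ are taken from a countably-infinite stock of such
variables, and the context $\Gamma, x : \tau$ can only be formed if the variable $x$ does not
already appear in $\Gamma$.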

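As discussed above, program terms are separated into two mutually defined
but disjoint categories: values and computations.

\[
\begin{array}{lrcl}
\text{Values:}       & V, W & ::= & {*} \mid Z \mid S(V) \mid \lambda x.\,M \mid x \\
\text{Computations:} & M, N & ::= & V\,W \mid \mathbf{return}\ V \mid \mathbf{let}\ M \Rightarrow x\ \mathbf{in}\ N \mid \mathbf{fix}(V) \mid \\
                     &      &     & \mathbf{case}\ V\ \mathbf{in}\ \{Z \Rightarrow M,\ S(x) \Rightarrow N\}
\end{array}
\]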


Here, * is the unique value of the unit type. The values of the type of natural 
numbers are the numerals represented using zero Z and successor S. The values 
of function type are the $\lambda$-abstractions. And a variable $x$ can be considered a
value, because, under the call-by-value evaluation strategy of the language, it 
can only be instantiated with a value. 

The computations are: function application VW; the computation that does 
nothing but return a value V; a let construct for sequencing; a fix construct for 
recursive definition; and a case construct that branches according to whether its 
natural-number argument is zero or positive. The computation $\mathbf{let}\ M \Rightarrow x\ \mathbf{in}\ N$
implements sequencing in the following sense. First the computation $M$ is evaluated.
Only in the case that the evaluation of $M$ terminates, with return value
$V$, does the thread of execution continue to $N$. In this case, the computation
$N[V/x]$ is evaluated, and its return value (if any) is the one returned by the let
construct.

To the pure functional language described above, we add effect operations.
The collection of effect operations is specified by a set $\Sigma$ (the signature) of such
operations, together with, for each $\sigma \in \Sigma$, an associated arity which takes one of
the four forms below:


\[
\alpha^n \to \alpha \qquad \mathbf{N} \times \alpha^n \to \alpha \qquad \alpha^{\mathbf{N}} \to \alpha \qquad \mathbf{N} \times \alpha^{\mathbf{N}} \to \alpha .
\]

The notation here is chosen to be suggestive of the way in which such arities are
used in the typing rules below, viewing $\alpha$ as a type variable. Each of the forms
of arity has an associated term constructor, for building additional computation
terms, with which we extend the above grammar for computation terms.


\[
\text{Effects:} \quad \sigma(M_0, M_1, \dots, M_{n-1}) \mid \sigma(V; M_0, M_1, \dots, M_{n-1}) \mid \sigma(V) \mid \sigma(W; V)
\]


Motivating examples of effect operations and their computation terms can be
found in Examples 0-5 below.

The typing rules for the language are given in Fig. 1 below. Note that the
choice of typing rule for an effect operation $\sigma \in \Sigma$ depends on its declared arity.

The terms of type $\tau$ are the values and computations generated by the
constructors above. Every term has a unique aspect as either a value or computation.
We write $Val(\tau)$ and $Com(\tau)$ respectively for closed values and computations.
So the closed terms of $\tau$ are $Term(\tau) = Val(\tau) \cup Com(\tau)$. For $n \in \mathbb{N}$ a natural
number, we write $\overline{n}$ for the numeral $S^n(Z)$, hence $Val(\mathbf{N}) := \{\overline{n} \mid n \in \mathbb{N}\}$.

We now consider some standard signatures of computationally interesting 
effect operations, which will be used as running examples throughout the paper. 
(We use the same examples as in [8].) 


Example 0 (Pure functional computation). This is the trivial case (from an effect 
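point of view) in which the signature of effect operations is empty. The resulting
language is a call-by-value variant of PCF [20].

\[
\frac{}{\Gamma, x : \tau \vdash x : \tau}
\qquad
\frac{}{\Gamma \vdash {*} : 1}
\qquad
\frac{}{\Gamma \vdash Z : \mathbf{N}}
\qquad
\frac{\Gamma \vdash V : \mathbf{N}}{\Gamma \vdash S(V) : \mathbf{N}}
\]
\[
\frac{\Gamma \vdash V : \tau}{\Gamma \vdash \mathbf{return}(V) : \tau}
\qquad
\frac{\Gamma, x : \tau \vdash M : \rho}{\Gamma \vdash (\lambda x : \tau.\,M) : \tau \to \rho}
\]
\[
\frac{\Gamma \vdash V : \tau \to \rho \qquad \Gamma \vdash W : \tau}{\Gamma \vdash (V\,W) : \rho}
\qquad
\frac{\Gamma \vdash V : (\tau \to \rho) \to (\tau \to \rho)}{\Gamma \vdash \mathbf{fix}(V) : \tau \to \rho}
\]
\[
\frac{\Gamma \vdash V : \mathbf{N} \qquad \Gamma \vdash M : \tau \qquad \Gamma, x : \mathbf{N} \vdash N : \tau}
     {\Gamma \vdash \mathbf{case}\ V\ \mathbf{of}\ \{Z \Rightarrow M;\ S(x) \Rightarrow N\} : \tau}
\qquad
\frac{\Gamma \vdash M : \tau \qquad \Gamma, x : \tau \vdash N : \rho}
     {\Gamma \vdash \mathbf{let}\ M \Rightarrow x\ \mathbf{in}\ N : \rho}
\]
\[
\frac{\sigma : \alpha^n \to \alpha \qquad \Gamma \vdash M_i : \tau}
     {\Gamma \vdash \sigma(M_0, M_1, \dots, M_{n-1}) : \tau}
\qquad
\frac{\sigma : \alpha^{\mathbf{N}} \to \alpha \qquad \Gamma \vdash V : \mathbf{N} \to \tau}
     {\Gamma \vdash \sigma(V) : \tau}
\]
\[
\frac{\sigma : \mathbf{N} \times \alpha^n \to \alpha \qquad \Gamma \vdash V : \mathbf{N} \qquad \Gamma \vdash M_i : \tau}
     {\Gamma \vdash \sigma(V; M_0, M_1, \dots, M_{n-1}) : \tau}
\]
\[
\frac{\sigma : \mathbf{N} \times \alpha^{\mathbf{N}} \to \alpha \qquad \Gamma \vdash V : \mathbf{N} \qquad \Gamma \vdash W : \mathbf{N} \to \tau}
     {\Gamma \vdash \sigma(V; W) : \tau}
\]


Fig. 1. Typing rules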


Example 1 (Error). We take a set of error labels $E$. For each $e \in E$ there is
an effect operator $raise_e : \alpha^0 \to \alpha$ which, when invoked by the computation
$raise_e()$, aborts evaluation and outputs $e$ as an error message.


Example 2 (Nondeterminism). There is a binary choice operator $or : \alpha^2 \to \alpha$
which gives two options for continuing the computation. The choice of continuation
is under the control of some external agent, which one may wish to model
as being cooperative (angelic), antagonistic (demonic), or neutral.


Example 3 (Probabilistic choice). Again there is a single binary choice operator
$p\text{-}or : \alpha^2 \to \alpha$ which gives two options for continuing the computation. In this
case, the choice of continuation is probabilistic, with a $\frac{1}{2}$ probability of either
option being chosen. Other weighted probabilistic choices can be programmed
in terms of this fair choice operation.
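
As a small illustration of that last remark, the following Python sketch nests two fair choices to obtain a 3/4 versus 1/4 choice and computes the resulting leaf probabilities exactly. The tuple encoding of choice nodes and the helper names `por`, `distribution` and `three_quarters` are assumptions made for this illustration; they are not part of the language defined here. Finite nestings of fair choices only yield dyadic weights, so other weights would need recursion.

```python
from fractions import Fraction

def por(left, right):
    """A fair binary choice node, standing in for p-or(left, right)."""
    return ("p-or", left, right)

def distribution(tree, weight=Fraction(1)):
    """Probability of reaching each leaf of a finite tree of fair choices."""
    if isinstance(tree, tuple) and tree[0] == "p-or":
        dist = {}
        for child in tree[1:]:
            # each branch of a fair choice receives half of the incoming weight
            for leaf, p in distribution(child, weight / 2).items():
                dist[leaf] = dist.get(leaf, Fraction(0)) + p
        return dist
    return {tree: weight}

def three_quarters(a, b):
    """Continue as a with probability 3/4 and as b with probability 1/4."""
    return por(a, por(a, b))

print(distribution(three_quarters("a", "b")))
```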


Example 4 (Global store). We take a set of locations $L$ for storing natural numbers.
For each $l \in L$ we have $lookup_l : \alpha^{\mathbf{N}} \to \alpha$ and $update_l : \mathbf{N} \times \alpha \to \alpha$. The
computation $lookup_l(V)$ looks up the number at location $l$ and passes it as an
argument to the function $V$, and $update_l(\overline{n}; M)$ stores $n$ at $l$ and then continues
with the computation $M$.


Example 5 (Input/output). Here we have two operators, $read : \alpha^{\mathbf{N}} \to \alpha$ which
reads a number from an input channel and passes it as the argument to a function,
and $write : \mathbf{N} \times \alpha \to \alpha$ which outputs a number (the first argument) and
then continues as the computation given as the second argument.


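We next present an operational semantics for our language, under which a
computation term evaluates to an effect tree: essentially, a coinductively generated
term using operations from $\Sigma$, and with values and $\bot$ (nontermination) as
the generators. This idea appears in [8,21], and our technical treatment follows
the approach of the latter, adapted to call-by-value.

We define a single-step reduction relation $\longrightarrow$ between configurations $(S, M)$
consisting of a stack $S$ and a computation $M$. The computation $M$ is the term
under current evaluation. The stack $S$ represents a continuation computation
awaiting the termination of $M$. First, we define a stack-independent reduction
relation $\rightsquigarrow$ on computation terms that do not involve let at the top level.


\[
\begin{array}{rcl}
(\lambda x : \tau.\,M)\,V & \rightsquigarrow & M[V/x] \\
\mathbf{case}\ Z\ \mathbf{of}\ \{Z \Rightarrow M_1;\ S(x) \Rightarrow M_2\} & \rightsquigarrow & M_1 \\
\mathbf{case}\ S(V)\ \mathbf{of}\ \{Z \Rightarrow M_1;\ S(x) \Rightarrow M_2\} & \rightsquigarrow & M_2[V/x] \\
\mathbf{fix}(F) & \rightsquigarrow & \mathbf{return}\ \lambda x : \tau.\ \mathbf{let}\ F(\lambda y : \tau.\ \mathbf{let}\ \mathbf{fix}(F) \Rightarrow z\ \mathbf{in}\ z\,y) \Rightarrow w\ \mathbf{in}\ w\,x
\end{array}
\]


The behaviour of let is implemented using a system of stacks where:

\[
\text{Stacks} \quad S ::= id \mid S \circ (\mathbf{let}\ (-) \Rightarrow x\ \mathbf{in}\ M)
\]


We write $S\{N\}$ for the computation term obtained by ‘applying’ the stack $S$ to
$N$, defined by:


\[
\begin{array}{rcl}
id\{N\} & = & N \\
(S \circ (\mathbf{let}\ (-) \Rightarrow x\ \mathbf{in}\ M))\{N\} & = & S\{\mathbf{let}\ N \Rightarrow x\ \mathbf{in}\ M\}
\end{array}
\]

We write $Stack(\tau, \rho)$ for the set of stacks $S$ such that for any $N \in Com(\tau)$, it
holds that $S\{N\}$ is a well-typed expression of type $\rho$. We define a reduction
relation on pairs $Stack(\tau, \rho) \times Com(\tau)$ (denoted $(S_1, M_1) \longrightarrow (S_2, M_2)$) by:


\[
\begin{array}{rcll}
(S, \mathbf{let}\ N \Rightarrow x\ \mathbf{in}\ M) & \longrightarrow & (S \circ (\mathbf{let}\ (-) \Rightarrow x\ \mathbf{in}\ M), N) & \\
(S, R) & \longrightarrow & (S, R') & \text{if } R \rightsquigarrow R' \\
(S \circ (\mathbf{let}\ (-) \Rightarrow x\ \mathbf{in}\ M), \mathbf{return}\ V) & \longrightarrow & (S, M[V/x]) &
\end{array}
\]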


We define the notion of effect tree for an arbitrary set X, where X is thought 
of as a set of abstract ‘values’. 


Definition 1. An effect tree (henceforth tree), over a set $X$, determined by a
signature $\Sigma$ of effect operations, is a labelled and possibly infinite tree whose
nodes have one of the following forms.


1. A leaf node labelled with $\bot$ (the symbol for nontermination).

2. A leaf node labelled with $x$ where $x \in X$.

3. A node labelled $\sigma$ with children $t_0, \dots, t_{n-1}$, when $\sigma \in \Sigma$ has arity $\alpha^n \to \alpha$.

4. A node labelled $\sigma$ with children $t_0, t_1, \dots$, when $\sigma \in \Sigma$ has arity $\alpha^{\mathbf{N}} \to \alpha$.

5. A node labelled $\sigma_m$ where $m \in \mathbb{N}$ with children $t_0, \dots, t_{n-1}$, when $\sigma \in \Sigma$ has
   arity $\mathbf{N} \times \alpha^n \to \alpha$.

6. A node labelled $\sigma_m$ where $m \in \mathbb{N}$ with children $t_0, t_1, \dots$, when $\sigma \in \Sigma$ has
   arity $\mathbf{N} \times \alpha^{\mathbf{N}} \to \alpha$.


We write $TX$ for the set of trees over $X$. We define a partial ordering on
$TX$ where $t_1 \le t_2$ if $t_1$ can be obtained by replacing subtrees of $t_2$ by $\bot$.
This forms an $\omega$-complete partial order, meaning that every ascending sequence
$t_1 \le t_2 \le \dots$ has a least upper bound $\bigsqcup_n t_n$. Let $Tree(\tau) := T\,Val(\tau)$; we will
define a reduction relation from computations to trees of values.

Given $f : X \to Y$ and a tree $t \in TX$, we write $t[x \mapsto f(x)] \in TY$ for the tree
whose leaves $x \in X$ are renamed to $f(x)$. We have a function $\mu : TTX \to TX$,
which takes a tree $r$ of trees and flattens it to a tree $\mu r \in TX$, by taking the
labelling tree at each non-$\bot$ leaf of $r$ as the subtree at the corresponding node
in $\mu r$. The function $\mu$ is the multiplication associated with the monad structure
of the $T$ operation. The unit of the monad is the map $\eta : X \to TX$ which takes
an element $x \in X$ and returns a leaf labelled $x$.
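
The ordering, renaming, unit and multiplication on trees can be sketched in Python as follows. This is only an illustration under simplifying assumptions: trees are finite, only operations of finite arity are represented, and the tuple encoding and function names are choices made for this sketch.

```python
BOT = ("bot",)                      # the leaf for nontermination

def leaf(x):
    return ("leaf", x)

def node(label, *children):
    return ("op", label, list(children))

def rename(t, f):
    """t[x |-> f(x)]: rename every value leaf x to f(x)."""
    if t[0] == "bot":
        return BOT
    if t[0] == "leaf":
        return leaf(f(t[1]))
    return ("op", t[1], [rename(c, f) for c in t[2]])

def unit(x):
    """The unit of the monad: a single leaf labelled x."""
    return leaf(x)

def mu(r):
    """The multiplication: flatten a tree whose leaves are themselves trees."""
    if r[0] == "bot":
        return BOT
    if r[0] == "leaf":
        return r[1]                 # the label of this leaf is a tree; splice it in
    return ("op", r[1], [mu(c) for c in r[2]])

def leq(t1, t2):
    """t1 <= t2: t1 is obtained from t2 by replacing subtrees with BOT."""
    if t1[0] == "bot":
        return True
    if t1[0] != t2[0]:
        return False
    if t1[0] == "leaf":
        return t1[1] == t2[1]
    return (t1[1] == t2[1] and len(t1[2]) == len(t2[2])
            and all(leq(a, b) for a, b in zip(t1[2], t2[2])))

t = node("or", leaf(0), node("or", BOT, leaf(1)))
print(mu(unit(t)) == t, mu(rename(t, unit)) == t)
```

The two comparisons printed at the end are instances of the unit laws of the monad for the sample tree `t`.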

The operational mapping from a computation $M \in Com(\tau)$ to an effect tree
is defined intuitively as follows. Start evaluating $M$ in the empty stack $id$,
until the evaluation process (which is deterministic) terminates (if this never
happens the tree is $\bot$). If the evaluation process terminates at a configuration
of the form $(id, \mathbf{return}\ V)$ then the tree is the leaf $V$. Otherwise the evaluation
process can only terminate at a configuration of the form $(S, \sigma(\dots))$ for some
effect operation $\sigma \in \Sigma$. In this case, create an internal node in the tree of the
appropriate kind (depending on $\sigma$) and continue generating each child tree of this
node by repeating the above process by evaluating an appropriate continuation
computation, starting from a configuration with the current stack $S$.

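The following (somewhat technical) definition formalises the idea outlined
above in a mathematically concise way. We define a family of maps
$|{-},{-}|_{(-)} : Stack(\tau, \rho) \times Com(\tau) \times \mathbb{N} \to Tree(\rho)$ indexed over $\tau$ and $\rho$ by:


\[
|S, M|_0 = \bot
\]
\[
|S, M|_{n+1} =
\begin{cases}
V & \text{if } S = id \wedge M = \mathbf{return}\ V \\
|S', M'|_n & \text{if } (S, M) \longrightarrow (S', M') \\
\sigma(|S, M_0|_n, \dots, |S, M_{m-1}|_n) & \text{if } \sigma : \alpha^m \to \alpha,\ M = \sigma(M_0, \dots, M_{m-1}) \\
\sigma(|S, V\,\overline{0}|_n, |S, V\,\overline{1}|_n, \dots) & \text{if } \sigma : \alpha^{\mathbf{N}} \to \alpha,\ M = \sigma(V) \\
\sigma_k(|S, M_0|_n, \dots, |S, M_{m-1}|_n) & \text{if } \sigma : \mathbf{N} \times \alpha^m \to \alpha,\ M = \sigma(\overline{k}; M_0, \dots, M_{m-1}) \\
\sigma_k(|S, V\,\overline{0}|_n, |S, V\,\overline{1}|_n, \dots) & \text{if } \sigma : \mathbf{N} \times \alpha^{\mathbf{N}} \to \alpha,\ M = \sigma(\overline{k}; V) \\
\bot & \text{otherwise}
\end{cases}
\]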


It follows that $|S, M|_n \le |S, M|_{n+1}$ in the given ordering on trees. We write
$|{-}|_{(-)} : Com(\tau) \times \mathbb{N} \to Tree(\tau)$ for the function defined by $|M|_n = |id, M|_n$.
Using this we can give the operational interpretation of computation terms as
effect trees by defining $|{-}| : Com(\tau) \to Tree(\tau)$ by $|M| := \bigsqcup_n |M|_n$.
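
The finite approximations $|S, M|_n$ can be computed directly. The Python sketch below is an illustration only: it handles just operations of arity $\alpha^m \to \alpha$, encodes terms as tagged tuples, and relies on the fact that only closed values are ever substituted, so substitution needs to respect shadowing but not rename binders. All names, and the sample term `NAT` (a nondeterministic number generator written with explicit returns), are assumptions of this sketch.

```python
BOT = ("bot",)

def subst(t, x, v):
    """t[v/x] for a closed value v."""
    tag = t[0]
    if tag == "var":
        return v if t[1] == x else t
    if tag in ("star", "Z"):
        return t
    if tag in ("S", "ret", "fix"):
        return (tag, subst(t[1], x, v))
    if tag == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, v))
    if tag == "app":
        return ("app", subst(t[1], x, v), subst(t[2], x, v))
    if tag == "let":
        _, m, y, body = t
        return ("let", subst(m, x, v), y, body if y == x else subst(body, x, v))
    if tag == "case":
        _, scrut, zero, y, succ = t
        return ("case", subst(scrut, x, v), subst(zero, x, v), y,
                succ if y == x else subst(succ, x, v))
    if tag == "op":
        return ("op", t[1], [subst(c, x, v) for c in t[2]])
    raise ValueError(tag)

def step(m):
    """The stack-independent reduction; returns None if no rule applies."""
    tag = m[0]
    if tag == "app" and m[1][0] == "lam":
        _, (_, x, body), arg = m
        return subst(body, x, arg)
    if tag == "case":
        _, scrut, zero, y, succ = m
        if scrut[0] == "Z":
            return zero
        if scrut[0] == "S":
            return subst(succ, y, scrut[1])
    if tag == "fix":
        f = m[1]
        inner = ("lam", "y", ("let", ("fix", f), "z", ("app", ("var", "z"), ("var", "y"))))
        return ("ret", ("lam", "x",
                        ("let", ("app", f, inner), "w", ("app", ("var", "w"), ("var", "x")))))
    return None

def approx(stack, m, n):
    """|S, M|_n, where the stack is a list of frames (x, N) for let (-) => x in N."""
    if n == 0:
        return BOT
    tag = m[0]
    if tag == "ret":
        if not stack:
            return ("leaf", m[1])
        x, body = stack[-1]
        return approx(stack[:-1], subst(body, x, m[1]), n - 1)
    if tag == "let":
        _, m1, x, body = m
        return approx(stack + [(x, body)], m1, n - 1)
    if tag == "op":
        return ("op", m[1], [approx(stack, c, n - 1) for c in m[2]])
    r = step(m)
    return BOT if r is None else approx(stack, r, n - 1)

# A nondeterministic natural number generator, with returns made explicit.
STAR, Z = ("star",), ("Z",)
BODY = ("op", "or", [
    ("ret", ("lam", "y", ("ret", Z))),
    ("ret", ("lam", "y", ("let", ("app", ("var", "x"), ("var", "y")), "z",
                          ("ret", ("S", ("var", "z")))))),
])
NAT = ("let", ("fix", ("lam", "x", BODY)), "w", ("app", ("var", "w"), STAR))
print(approx([], NAT, 40))
```

Increasing `n` only replaces `BOT` leaves by larger subtrees, which matches the monotonicity property stated above.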


Example 2 (Nondeterminism). Nondeterministically generate a natural number:


\[
?N := \mathbf{let}\ \mathbf{fix}(\lambda x : 1 \to \mathbf{N}.\ or(\lambda y : 1.\ Z,\ \lambda y : 1.\ \mathbf{let}\ x\,y \Rightarrow z\ \mathbf{in}\ S(z))) \Rightarrow w\ \mathbf{in}\ w\,{*}
\]


3 Behavioural Logic and Modalities


The goal of this section is to motivate and formulate a logic for expressing 
behavioural properties of programs. In our language, program means (well-typed) 
term, and we shall be interested both in properties of computations and in
properties of values. Accordingly, we define a logic that contains both value formulas
and computation formulas. We shall use lower case Greek letters $\phi, \psi, \dots$ for the
former, and upper case Greek letters $\Phi, \Psi, \dots$ for the latter. Our logic will thus
have two satisfaction relations


\[
V \models \phi \qquad M \models \Phi
\]


which respectively assert that “value $V$ enjoys the value property expressed by
$\phi$” and “computation $M$ enjoys the computation property expressed by $\Phi$”.

In order to motivate the detailed formulation of the logic, it is useful to 
identify criteria that will guide the design. 


(C1) The logic should express only ‘behaviourally meaningful’ properties of 
programs. This guides us to build the logic upon primitive notions that have 
a direct behavioural interpretation according to a natural understanding of 
program behaviour. 

(C2) The logic should be as expressive as possible within the constraints 
imposed by criterion (C1). 


For every type $\tau$, we define a collection $\mathrm{VF}(\tau)$ of value formulas, and a
collection $\mathrm{CF}(\tau)$ of computation formulas, as motivated above.

Since boolean logical connectives say nothing themselves about computational
behaviour, it is a reasonable general principle that ‘behavioural properties’
should be closed under such connectives. Thus, in keeping with criterion
(C2), which asks for maximal expressivity, we close each set $\mathrm{CF}(\tau)$ and $\mathrm{VF}(\tau)$,
of computation and value formulas, under infinitary propositional logic.

In addition to closure under infinitary propositional logic, each set $\mathrm{VF}(\tau)$
contains a collection of basic value formulas, from which compound formulas
are constructed using (infinitary) propositional connectives.¹ The choice of basic
formulas depends on the type $\tau$.


¹ We call such formulas basic rather than atomic because they include formulas such
as $(V \mapsto \Phi)$, discussed below, which are built from other formulas.
In the case of the natural numbers type, we include a basic value formula
$\{n\} \in \mathrm{VF}(\mathbf{N})$, for every $n \in \mathbb{N}$. The semantics of this formula are given by:

\[ V \models \{n\} \iff V = n. \]

By the closure of $\mathrm{VF}(\mathbf{N})$ under infinitary disjunctions, every subset of $\mathbb{N}$ can be
represented by some value formula. Moreover, since a general value formula in
$\mathrm{VF}(\mathbf{N})$ is an infinitary boolean combination of basic formulas of the form $\{n\}$,
the value formulas represent exactly the subsets of $\mathbb{N}$.

For the unit type, we do not require any basic value formulas. The unit type
has only one value, $*$. The two subsets of this singleton set of values are defined
by the formulas $\bot$ (‘falsum’, given as an empty disjunction), and $\top$ (the truth
constant, given as an empty conjunction).

For a function type $\tau \to \rho$, we want each basic formula to express a fundamental
behavioural constraint on values (i.e., $\lambda$-abstractions) $W$ of type $\tau \to \rho$.
In keeping with the applicative nature of functional programming, the only way
in which a $\lambda$-abstraction can be used to generate behaviour is to apply it to an
argument of type $\tau$, which, because we are in a call-by-value setting, must be
a value $V$. The application of $W$ to $V$ results in a computation $W\,V$ of type $\rho$,
whose properties can be probed using computation formulas in $\mathrm{CF}(\rho)$. Based on
this, for every value $V \in \mathrm{Val}(\tau)$ and computation formula $\Phi \in \mathrm{CF}(\rho)$, we include
a basic value formula $(V \mapsto \Phi) \in \mathrm{VF}(\tau \to \rho)$ with the semantics:

\[ W \models (V \mapsto \Phi) \iff W\,V \models \Phi. \]


Using this simple construct, based on application to a single argument $V$, other
natural mechanisms for expressing properties of $\lambda$-abstractions are definable,
using infinitary propositional logic. For example, given $\phi \in \mathrm{VF}(\tau)$ and $\Psi \in \mathrm{CF}(\rho)$,
the definition

\[ (\phi \mapsto \Psi) := \bigwedge \{ (V \mapsto \Psi) \mid V \in \mathrm{Val}(\tau),\ V \models \phi \} \qquad (1) \]

defines a formula whose derived semantics is

\[ W \models (\phi \mapsto \Psi) \iff \forall V \in \mathrm{Val}(\tau).\ V \models \phi \text{ implies } W\,V \models \Psi. \qquad (2) \]

In Sect. 7, we shall consider the possibility of changing the basic value formulas
in $\mathrm{VF}(\tau \to \rho)$ to formulas $(\phi \mapsto \Psi)$.

It remains to explain how the basic computation formulas in $\mathrm{CF}(\tau)$ are
formed. For this we require a given set $\mathcal{O}$ of modalities, which depends on the
algebraic effects contained in the language. The basic computation formulas in
$\mathrm{CF}(\tau)$ then have the form $o\,\phi$, where $o \in \mathcal{O}$ is one of the available modalities,
and $\phi$ is a value formula in $\mathrm{VF}(\tau)$. Thus a modality ‘lifts’ properties of values of
type $\tau$ to properties of computations of type $\tau$.

In order to give semantics to computation formulas $o\,\phi$, we need a general
theory of the kind of modality under consideration. This is one of the main
contributions of the paper. Before presenting the general theory, we first consider
motivating examples, using our running examples of algebraic effects.
Example 0 (Pure functional computation). Define $\mathcal{O} = \{\downarrow\}$. Here the single
modality $\downarrow$ is the termination modality: $\downarrow\phi$ asserts that a computation
terminates with a return value $V$ satisfying $\phi$. This is formalised using effect trees:

\[ M \models {\downarrow}\phi \iff |M| \text{ is a leaf } V \text{ and } V \models \phi. \]

Note that, in the case of pure functional computation, all trees are leaves: either
value leaves $V$, or nontermination leaves $\bot$.


Example 1 (Error). Define $\mathcal{O} = \{\downarrow\} \cup \{\mathsf{E}_e \mid e \in E\}$. The semantics of the
termination modality $\downarrow$ is defined as above. The error modality $\mathsf{E}_e$ flags error $e$:

\[ M \models \mathsf{E}_e\,\phi \iff |M| \text{ is a node labelled with } \mathit{raise}_e. \]

(Because $\mathit{raise}_e$ is an operation of arity 0, a $\mathit{raise}_e$ node in a tree has 0 children.)
Note that the semantics of $\mathsf{E}_e\,\phi$ makes no reference to $\phi$. Indeed it would be
natural to consider $\mathsf{E}_e$ as a basic computation formula in its own right, which
could be done by introducing a notion of 0-argument modality, and considering
$\mathsf{E}_e$ as such. In this paper, however, we keep the treatment uniform by always
considering modalities as unary operations, with natural 0-argument modalities
subsumed as unary modalities with redundant argument.


Example 2 (Nondeterminism). Define $\mathcal{O} = \{\Diamond, \Box\}$ with:

\[ M \models \Diamond\phi \iff |M| \text{ has some leaf } V \text{ such that } V \models \phi \]
\[ M \models \Box\phi \iff |M| \text{ has finite height and every leaf is a value } V \text{ s.t. } V \models \phi. \]

Including both modalities amounts to a neutral view of nondeterminism. In the
case of angelic nondeterminism, one would include just the $\Diamond$ modality; in that of
demonic nondeterminism, just the $\Box$ modality. Because of the way the semantic
definitions interact with termination, the modalities $\Box$ and $\Diamond$ are not De Morgan
duals. Indeed, each of the three possibilities $\{\Diamond, \Box\}$, $\{\Diamond\}$, $\{\Box\}$ for $\mathcal{O}$ leads to a
logic with a different expressivity.
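
As an illustration only, the two modalities can be evaluated on finite effect trees. The sketch below is in Python and uses a hypothetical tuple encoding of trees (`("leaf", v)`, `("bot",)`, `("or", l, r)`) that is not part of the paper; real effect trees may be infinite, in which case these recursive checks need not terminate.

```python
# Hypothetical encoding of finite effect trees for binary nondeterminism:
#   ("leaf", v)    a value leaf
#   ("bot",)       the nontermination leaf
#   ("or", l, r)   a binary choice node
BOT = ("bot",)

def leaf(v):
    return ("leaf", v)

def choice(l, r):
    return ("or", l, r)

def diamond(tree, phi):
    """May modality: some leaf is a value satisfying phi."""
    tag = tree[0]
    if tag == "leaf":
        return phi(tree[1])
    if tag == "bot":
        return False
    return diamond(tree[1], phi) or diamond(tree[2], phi)

def box(tree, phi):
    """Must modality: every leaf is a value satisfying phi.
    A finite tree has finite height, so only a bottom leaf or a bad value can fail."""
    tag = tree[0]
    if tag == "leaf":
        return phi(tree[1])
    if tag == "bot":
        return False
    return box(tree[1], phi) and box(tree[2], phi)

def is_even(n):
    return n % 2 == 0

def is_odd(n):
    return n % 2 == 1

# A tree that may return 0 or may diverge.
t = choice(leaf(0), BOT)
```

On `t`, no leaf is odd, so `not diamond(t, is_odd)` holds, yet `box(t, is_even)` fails because of the nontermination leaf. This is one concrete way to see why the two modalities are not De Morgan duals.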


Example 3 (Probabilistic choice). Define $\mathcal{O} = \{\mathsf{P}_{>q} \mid q \in \mathbb{Q},\ 0 < q < 1\}$ with:

\[ M \models \mathsf{P}_{>q}\,\phi \iff \mathbf{P}(|M| \text{ terminates with a value in } \{V \mid V \models \phi\}) > q, \]

where the probability on the right is the probability that a run through the
tree $|M|$, starting at the root, and making an independent fair probabilistic
choice at each branching node, terminates at a value node with a value $V$ in the
set $\{V \mid V \models \phi\}$. We observe that the restriction to rational thresholds $q$ is
immaterial, as, for any real $r$ with $0 < r < 1$, we can define:

\[ \mathsf{P}_{>r}\,\phi := \bigvee \{ \mathsf{P}_{>q}\,\phi \mid q \in \mathbb{Q},\ r < q < 1 \}. \]

Similarly, we can define non-strict threshold modalities, for $0 < r < 1$, by:

\[ \mathsf{P}_{\ge r}\,\phi := \bigwedge \{ \mathsf{P}_{>q}\,\phi \mid q \in \mathbb{Q},\ 0 < q < r \}. \]
Also, we can exploit negation to define modalities expressing strict and non-strict
upper bounds on probabilities. Notwithstanding the definability of non-strict and 
upper-bound thresholds, we shall see later that it is important that we include 
only strict lower-bound modalities in our set O of primitive modalities. 
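
For finite trees the probability in the definition is a finite sum, which makes the strict threshold easy to check mechanically. The following Python sketch assumes a hypothetical tuple encoding with fair binary choice nodes `("por", l, r)`; the names `prob` and `p_greater` are illustrative and do not come from the paper.

```python
from fractions import Fraction

# Hypothetical encoding: ("leaf", v), ("bot",), ("por", l, r) for a fair binary choice.
BOT = ("bot",)

def leaf(v):
    return ("leaf", v)

def pchoice(l, r):
    return ("por", l, r)

def prob(tree, phi):
    """Probability that a run through a finite tree ends in a value satisfying phi."""
    tag = tree[0]
    if tag == "leaf":
        return Fraction(1) if phi(tree[1]) else Fraction(0)
    if tag == "bot":
        return Fraction(0)
    # Each branch of a fair choice is taken with probability 1/2.
    return (prob(tree[1], phi) + prob(tree[2], phi)) / 2

def p_greater(q, tree, phi):
    """Strict lower-bound modality: the probability exceeds q."""
    return prob(tree, phi) > q

def is_even(n):
    return n % 2 == 0

# Returns 0 with probability 1/2, 1 with probability 1/4, diverges with probability 1/4.
t = pchoice(leaf(0), pchoice(leaf(1), BOT))
```

Here `prob(t, is_even)` is exactly 1/2, so the strict modality with threshold 1/2 fails while any smaller threshold succeeds.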


Example 4 (Global store). For a set of locations $L$, define the set of states by
$\mathrm{State} = \mathbb{N}^{L}$. The modalities are $\mathcal{O} = \{(s \rightarrowtail r) \mid s, r \in \mathrm{State}\}$, where informally:

\[ M \models (s \rightarrowtail r)\,\phi \iff \text{the execution of } M \text{, starting in state } s \text{, terminates in final state } r \text{ with return value } V \text{ such that } V \models \phi. \]

We make the above definition precise using the effect tree of $M$. Define
\[ \mathit{exec} : TX \times \mathrm{State} \rightharpoonup X \times \mathrm{State}, \]
for any set $X$, to be the least partial function satisfying:

\[
\mathit{exec}(t, s) =
\begin{cases}
(x, s) & \text{if } t \text{ is a leaf labelled with } x \in X \\
\mathit{exec}(t_{s(l)}, s) & \text{if } t = \mathit{lookup}_l(t_0, t_1, \ldots) \text{ and } \mathit{exec}(t_{s(l)}, s) \text{ is defined} \\
\mathit{exec}(t', s[l := n]) & \text{if } t = \mathit{update}_{l,n}(t') \text{ and } \mathit{exec}(t', s[l := n]) \text{ is defined,}
\end{cases}
\]

where $s[l := n]$ is the evident modification of state $s$. Intuitively, $\mathit{exec}(t, s)$ defines
the result of “executing” the tree of commands in effect tree $t$ starting in state
$s$, whenever this execution terminates. In terms of operational semantics, it can
be viewed as defining a ‘big-step’ semantics for effect trees (in the signature of
global store). We can now define the semantics of the $(s \rightarrowtail r)$ modality formally:

\[ M \models (s \rightarrowtail r)\,\phi \iff \mathit{exec}(|M|, s) = (V, r) \text{ where } V \models \phi. \]
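
The partial function exec can be sketched as a simple loop over a tree. The Python below is a minimal illustration under assumptions of our own: states are dictionaries from location names to numbers, the countably many children of a lookup node are given by a Python callable, and an undefined result is represented by `None`. On an infinite branch the loop would not terminate, which matches exec being undefined there.

```python
# Hypothetical encoding of effect trees for global store:
#   ("leaf", x)                a leaf labelled x
#   ("bot",)                   the nontermination leaf
#   ("lookup", loc, children)  children maps the stored number n to the subtree t_n
#   ("update", loc, n, sub)    write n to loc, then continue with sub
BOT = ("bot",)

def exec_tree(tree, state):
    """Run the tree from the given state; return (x, final_state) or None if undefined."""
    state = dict(state)  # work on a copy so the caller's state is not mutated
    while True:
        tag = tree[0]
        if tag == "leaf":
            return tree[1], state
        if tag == "bot":
            return None
        if tag == "lookup":
            _, loc, children = tree
            tree = children(state[loc])
        elif tag == "update":
            _, loc, n, sub = tree
            state[loc] = n
            tree = sub
        else:
            raise ValueError("unknown node: %r" % (tag,))

def store_modality(s, r, tree, phi):
    """(s >-> r) phi: execution from s ends in state r with a value satisfying phi."""
    result = exec_tree(tree, s)
    return result is not None and result[1] == r and phi(result[0])

# Read location "l", increment it, and return the old contents.
incr = ("lookup", "l", lambda n: ("update", "l", n + 1, ("leaf", n)))
```

For example, `exec_tree(incr, {"l": 4})` yields the old value 4 together with the state in which `"l"` holds 5.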


Example 5 (Input/output). Define an i/o-trace to be a word $w$ over the alphabet
\[ \{?n \mid n \in \mathbb{N}\} \cup \{!n \mid n \in \mathbb{N}\}. \]

The idea is that such a word represents an input/output sequence, where $?n$
means the number $n$ is given in response to an input prompt, and $!n$ means that
the program outputs $n$. Define the set of modalities

\[ \mathcal{O} = \{ \langle w \rangle{\downarrow},\ \langle w \rangle_{\ldots} \mid w \text{ an i/o-trace} \}. \]

The intuitive semantics of these modalities is as follows.

\[ M \models \langle w \rangle{\downarrow}\,\phi \iff w \text{ is a complete i/o-trace for the execution of } M \text{ resulting in termination with } V \text{ s.t. } V \models \phi \]
\[ M \models \langle w \rangle_{\ldots}\,\phi \iff w \text{ is an initial i/o-trace for the execution of } M. \]

In order to define the semantics of formulas precisely, we first define relations
$t \models \langle w \rangle{\downarrow} P$ and $t \models \langle w \rangle_{\ldots}$, between $t \in TX$ and $P \subseteq X$, by induction on words
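\[
\frac{n \in \mathbb{N}}{\{n\} \in \mathrm{VF}(\mathbf{N})}
\qquad
\frac{V \in \mathrm{Val}(\tau) \quad \Phi \in \mathrm{CF}(\rho)}{(V \mapsto \Phi) \in \mathrm{VF}(\tau \to \rho)}
\qquad
\frac{\phi \in \mathrm{VF}(\tau) \quad o \in \mathcal{O}}{o\,\phi \in \mathrm{CF}(\tau)}
\]
\[
\frac{\phi : I \to \mathrm{VF}(\tau)}{\bigvee_I \phi \in \mathrm{VF}(\tau)}
\qquad
\frac{\phi : I \to \mathrm{VF}(\tau)}{\bigwedge_I \phi \in \mathrm{VF}(\tau)}
\qquad
\frac{\phi \in \mathrm{VF}(\tau)}{\neg\phi \in \mathrm{VF}(\tau)}
\]
\[
\frac{\Phi : I \to \mathrm{CF}(\tau)}{\bigvee_I \Phi \in \mathrm{CF}(\tau)}
\qquad
\frac{\Phi : I \to \mathrm{CF}(\tau)}{\bigwedge_I \Phi \in \mathrm{CF}(\tau)}
\qquad
\frac{\Phi \in \mathrm{CF}(\tau)}{\neg\Phi \in \mathrm{CF}(\tau)}
\]

Fig. 2. The logic $\mathcal{V}$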


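(Note that we are overloading the $\models$ symbol.) In the following, we write $\varepsilon$ for
the empty word, and we use textual juxtaposition for concatenation of words.

\[ t \models \langle \varepsilon \rangle{\downarrow} P \iff t \text{ is a leaf } x \text{ and } x \in P \]
\[ t \models \langle (?n)w \rangle{\downarrow} P \iff t = \mathit{read}(t_0, t_1, \ldots) \text{ and } t_n \models \langle w \rangle{\downarrow} P \]
\[ t \models \langle (!n)w \rangle{\downarrow} P \iff t = \mathit{write}_n(t') \text{ and } t' \models \langle w \rangle{\downarrow} P \]
\[ t \models \langle \varepsilon \rangle_{\ldots} \iff \text{true} \]
\[ t \models \langle (?n)w \rangle_{\ldots} \iff t = \mathit{read}(t_0, t_1, \ldots) \text{ and } t_n \models \langle w \rangle_{\ldots} \]
\[ t \models \langle (!n)w \rangle_{\ldots} \iff t = \mathit{write}_n(t') \text{ and } t' \models \langle w \rangle_{\ldots} \]

The formal semantics of modalities is now easily defined by:

\[ M \models \langle w \rangle{\downarrow}\,\phi \iff |M| \models \langle w \rangle{\downarrow} \{V \mid V \models \phi\} \]
\[ M \models \langle w \rangle_{\ldots}\,\phi \iff |M| \models \langle w \rangle_{\ldots} \]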


Note that, as in Example 1, the formula argument of the $\langle w \rangle_{\ldots}$ modality is redundant.
Also, note that our modalities for input/output could naturally be formed
by combining the termination modality $\downarrow$, which lifts value formulas to computation
formulas, with sequences of atomic modalities $\langle ?n \rangle$ and $\langle !n \rangle$ acting directly
on computation formulas. In this paper, we do not include such modalities, acting
on computation formulas, in our general theory. But this is a natural avenue
for future consideration.
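
The inductive definition of the two trace relations translates directly into a walk along the tree. The Python sketch below is illustrative only: it assumes a hypothetical encoding in which a trace is a list of pairs `("?", n)` or `("!", n)`, a read node stores its children as a callable, and a write node stores the output number and its single child.

```python
# Hypothetical encoding of i/o effect trees:
#   ("leaf", x), ("bot",), ("read", children), ("write", n, sub)
# where children maps the input number n to the subtree t_n.

def _follow(tree, trace):
    """Walk the tree along the trace; return the subtree reached, or None on a mismatch."""
    for kind, n in trace:
        if kind == "?" and tree[0] == "read":
            tree = tree[1](n)
        elif kind == "!" and tree[0] == "write" and tree[1] == n:
            tree = tree[2]
        else:
            return None
    return tree

def complete(tree, trace, pred):
    """Complete-trace relation: the trace leads to a leaf x with pred(x)."""
    end = _follow(tree, trace)
    return end is not None and end[0] == "leaf" and pred(end[1])

def initial(tree, trace):
    """Initial-trace relation: the trace can be followed from the root."""
    return _follow(tree, trace) is not None

# A program that reads a number, echoes it, and returns the unit value.
echo = ("read", lambda n: ("write", n, ("leaf", "*")))
```

With this encoding, `[("?", 3), ("!", 3)]` is a complete trace of `echo`, while `[("?", 3)]` is only an initial trace.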


We now give a formal treatment of the logic and its semantics, in full generality.
We assume given a signature $\Sigma$ of effect operations, as in Sect. 2. And we
assume given a set $\mathcal{O}$, whose elements we call modalities.

We call our main behavioural logic $\mathcal{V}$, where the letter V is chosen as a
reference to the fact that the basic formula at function type specifies function
behaviour on individual value arguments $V$.


Definition 2 (The logic $\mathcal{V}$). The classes $\mathrm{VF}(\tau)$ and $\mathrm{CF}(\tau)$ of value and computation
formulas, for each type $\tau$, are mutually inductively defined by the rules
in Fig. 2. In this, $I$ can be instantiated to any set, allowing for arbitrary conjunctions
and disjunctions. When $I$ is $\emptyset$, we get the special formulas $\top = \bigwedge_{\emptyset}$ and
$\bot = \bigvee_{\emptyset}$. The use of arbitrary index sets means that formulas, as defined, form
a proper class. However, we shall see below that countable index sets suffice.
In order to specify the semantics of modal formulas, we require a connection
between modalities and effect trees, which is given by an interpretation function

\[ [\![-]\!] : \mathcal{O} \to \mathcal{P}(T\mathbf{1}). \]

That is, every modality $o \in \mathcal{O}$ is mapped to a subset $[\![o]\!] \subseteq T\mathbf{1}$ of unit-type effect
trees. Given a subset $P \subseteq X$ (e.g. given by a formula) and a tree $t \in TX$ we can
define a unit-type tree $t[\in P] \in T\mathbf{1}$ as the tree created by replacing the leaves of
$t$ that belong to $P$ by $*$ and the others by $\bot$. In the case that $P$ is the subset
$\{V \mid V \models \phi\}$ specified by a formula $\phi \in \mathrm{VF}(\tau)$, we also write $t[\models \phi]$ for $t[\in P]$.

We can now formally define the two satisfaction relations $\models\ \subseteq \mathrm{Val}(\tau) \times \mathrm{VF}(\tau)$
and $\models\ \subseteq \mathrm{Com}(\tau) \times \mathrm{CF}(\tau)$, mutually inductively, by:

\[ m \models \{n\} \iff m = n \]
\[ W \models (V \mapsto \Phi) \iff W\,V \models \Phi \]
\[ M \models o\,\phi \iff |M|[\models \phi] \in [\![o]\!] \]
\[ W \models \neg\phi \iff \neg(W \models \phi). \]

We omit the evident clauses for the other propositional connectives. We remark
that all conjunctions and disjunctions are semantically equivalent to countable
ones, because value and computation formulas are interpreted over sets of terms,
$\mathrm{Val}(\tau)$ and $\mathrm{Com}(\tau)$, which are countable.
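
The clause for modal formulas factors through the relabelling operation on trees. As a rough sketch, assuming finite trees in a hypothetical tuple encoding `(op, child, ...)` and modelling a set of values by a Python predicate, the relabelling and the modal clause can be written as follows; `interp_diamond` stands in for the interpretation of one particular modality and is only an example.

```python
# Hypothetical finite-arity encoding: ("leaf", x), ("bot",), or (op, child, ...).
STAR = ("leaf", "*")
BOT = ("bot",)

def relabel(tree, pred):
    """t[in P]: leaves in P become *, every other leaf becomes bottom."""
    tag = tree[0]
    if tag == "leaf":
        return STAR if pred(tree[1]) else BOT
    if tag == "bot":
        return BOT
    return (tag,) + tuple(relabel(child, pred) for child in tree[1:])

def interp_diamond(unit_tree):
    """Example interpretation: the unit-type trees that have some * leaf."""
    if unit_tree == STAR:
        return True
    if unit_tree == BOT:
        return False
    return any(interp_diamond(child) for child in unit_tree[1:])

def satisfies(modality, tree, pred):
    """Modal clause: relabel the tree of M by the formula, then test membership."""
    return modality(relabel(tree, pred))

def is_even(n):
    return n % 2 == 0

tree = ("or", ("leaf", 1), ("or", ("leaf", 4), BOT))
```

The design point this illustrates is that a modality never inspects values directly: it only sees the unit-type tree produced by relabelling.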

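We end this section by revisiting our running examples, and showing, in each
case, that the example modalities presented above are all specified by suitable
interpretation functions $[\![-]\!] : \mathcal{O} \to \mathcal{P}(T\mathbf{1})$.


Example 0 (Pure functional computation). We have $\mathcal{O} = \{\downarrow\}$. Define:
\[ [\![\downarrow]\!] = \{*\} \quad \text{(where $*$ is the tree with single node $*$)} \]

Example 1 (Error). We have $\mathcal{O} = \{\downarrow\} \cup \{\mathsf{E}_e \mid e \in E\}$. Define:
\[ [\![\mathsf{E}_e]\!] = \{ \mathit{raise}_e \}. \]

Example 2 (Nondeterminism). We have $\mathcal{O} = \{\Diamond, \Box\}$. Define:
\[ [\![\Diamond]\!] = \{ t \mid t \text{ has some } * \text{ leaf} \} \]
\[ [\![\Box]\!] = \{ t \mid t \text{ has finite height and every leaf is a } * \}. \]

Example 3 (Probabilistic choice). $\mathcal{O} = \{\mathsf{P}_{>q} \mid q \in \mathbb{Q},\ 0 < q < 1\}$. Define:
\[ [\![\mathsf{P}_{>q}]\!] = \{ t \mid \mathbf{P}(t \text{ terminates with a } * \text{ leaf}) > q \}. \]

Example 4 (Global store). $\mathcal{O} = \{(s \rightarrowtail r) \mid s, r \in \mathrm{State}\}$. Define:
\[ [\![(s \rightarrowtail r)]\!] = \{ t \mid \mathit{exec}(t, s) = (*, r) \} \]

Example 5 (Input/output). $\mathcal{O} = \{ \langle w \rangle{\downarrow},\ \langle w \rangle_{\ldots} \mid w \text{ an i/o-trace} \}$. Define:
\[ [\![\langle w \rangle{\downarrow}]\!] = \{ t \mid t \models \langle w \rangle{\downarrow} \{*\} \} \]
\[ [\![\langle w \rangle_{\ldots}]\!] = \{ t \mid t \models \langle w \rangle_{\ldots} \} \]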
4 Behavioural Equivalence


The goal of this section is to precisely formulate our main theorem: under suitable
conditions, the behavioural equivalence determined by the logic $\mathcal{V}$ of Sect. 3 is
a congruence. In order to achieve this, it will be useful to consider the positive
fragment $\mathcal{V}^+$ of $\mathcal{V}$.


Definition 3 (The logic $\mathcal{V}^+$). The logic $\mathcal{V}^+$ is the fragment of $\mathcal{V}$ consisting of
those formulas in $\mathrm{VF}(\tau)$ and $\mathrm{CF}(\tau)$ that do not contain negation.


Whenever we have a logic $\mathcal{L}$ whose value and computation formulas are given
as subcollections $\mathrm{VF}_{\mathcal{L}}(\tau) \subseteq \mathrm{VF}(\tau)$ and $\mathrm{CF}_{\mathcal{L}}(\tau) \subseteq \mathrm{CF}(\tau)$, then $\mathcal{L}$ determines
a preorder (and hence also an equivalence relation) between terms of the same
type and aspect.


Definition 4 (Logical preorder and equivalence). Given a fragment $\mathcal{L}$ of
$\mathcal{V}$, we define the logical preorder $\sqsubseteq_{\mathcal{L}}$, between well-typed terms of the same type
and aspect, by:

\[ V \sqsubseteq_{\mathcal{L}} W \iff \forall \phi \in \mathrm{VF}_{\mathcal{L}}(\tau),\ V \models \phi \Rightarrow W \models \phi \]
\[ M \sqsubseteq_{\mathcal{L}} N \iff \forall \Phi \in \mathrm{CF}_{\mathcal{L}}(\tau),\ M \models \Phi \Rightarrow N \models \Phi \]

The logical equivalence $\equiv_{\mathcal{L}}$ on terms is the equivalence relation induced by the
preorder (the intersection of $\sqsubseteq_{\mathcal{L}}$ and its converse).


In the case that formulas in $\mathcal{L}$ are closed under negation, it is trivial that the
preorder $\sqsubseteq_{\mathcal{L}}$ is already an equivalence relation, and hence coincides with $\equiv_{\mathcal{L}}$.
Thus we shall only refer specifically to the preorder $\sqsubseteq_{\mathcal{L}}$ for fragments, such as
$\mathcal{V}^+$, that are not closed under negation.

The two main relations of interest to us in this paper are the primary relations
determined by $\mathcal{V}$ and $\mathcal{V}^+$: full behavioural equivalence $\equiv_{\mathcal{V}}$; and the positive
behavioural preorder $\sqsubseteq_{\mathcal{V}^+}$ (which induces positive behavioural equivalence $\equiv_{\mathcal{V}^+}$).

We next formulate the appropriate notion of (pre)congruence to apply to the
relations $\equiv_{\mathcal{V}}$ and $\sqsubseteq_{\mathcal{V}^+}$. These two preorders are examples of well-typed relations
on closed terms. Any such relation can be extended to a relation on open terms
in the following way. Given a well-typed relation $R$ on closed terms, we define the
open extension $R^{\circ}$ where $\Gamma \vdash M\,R^{\circ}\,N : \tau$ precisely when, for every well-typed
vector of closed values $\vec{V} : \Gamma$, it holds that $M[\vec{V}]\ R\ N[\vec{V}]$. The correct notion
of precongruence for a well-typed preorder on closed terms is to ask for its open
extension to be compatible in the sense of the definition below; see, e.g., [10,19]
for further explanation.


Definition 5 (Compatibility). A well-typed open relation R is said to be 
compatible if it is closed under the rules in Fig. 3. 


We now state our main congruence result, although we have not yet defined 
the conditions it depends upon. 
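\[
\frac{}{\Gamma, x : \tau \vdash x\,R\,x : \tau}
\qquad
\frac{}{\Gamma \vdash Z\,R\,Z : \mathbf{N}}
\qquad
\frac{\Gamma \vdash V\,R\,V' : \mathbf{N}}{\Gamma \vdash S(V)\,R\,S(V') : \mathbf{N}}
\]
\[
\frac{\Gamma \vdash V\,R\,V' : \tau}{\Gamma \vdash \mathbf{return}(V)\,R\,\mathbf{return}(V') : \tau}
\qquad
\frac{\Gamma, x : \tau \vdash M\,R\,M' : \rho}{\Gamma \vdash (\lambda x : \tau. M)\,R\,(\lambda x : \tau. M') : \tau \to \rho}
\]
\[
\frac{\Gamma \vdash V\,R\,V' : \tau \to \rho \quad \Gamma \vdash W\,R\,W' : \tau}{\Gamma \vdash (V\,W)\,R\,(V'\,W') : \rho}
\qquad
\frac{\Gamma \vdash V\,R\,V' : (\tau \to \rho) \to (\tau \to \rho)}{\Gamma \vdash \mathbf{fix}(V)\,R\,\mathbf{fix}(V') : \tau \to \rho}
\]
\[
\frac{\Gamma \vdash V\,R\,V' : \mathbf{N} \quad \Gamma \vdash M\,R\,M' : \tau \quad \Gamma, x : \mathbf{N} \vdash N\,R\,N' : \tau}
{\Gamma \vdash \mathbf{case}\ V\ \mathbf{of}\ \{Z \Rightarrow M;\ S(x) \Rightarrow N\}\ R\ \mathbf{case}\ V'\ \mathbf{of}\ \{Z \Rightarrow M';\ S(x) \Rightarrow N'\} : \tau}
\]
\[
\frac{\Gamma \vdash M\,R\,M' : \tau \quad \Gamma, x : \tau \vdash N\,R\,N' : \rho}
{\Gamma \vdash \mathbf{let}\ M \Rightarrow x\ \mathbf{in}\ N\ \ R\ \ \mathbf{let}\ M' \Rightarrow x\ \mathbf{in}\ N' : \rho}
\]
\[
\frac{\Gamma \vdash M_i\,R\,M_i' : \tau}{\Gamma \vdash \sigma(M_0, M_1, \ldots)\,R\,\sigma(M_0', M_1', \ldots) : \tau}
\qquad
\frac{\Gamma \vdash V\,R\,V' : \mathbf{N} \quad \Gamma \vdash M_i\,R\,M_i' : \tau}{\Gamma \vdash \sigma(V; M_0, M_1, \ldots)\,R\,\sigma(V'; M_0', M_1', \ldots) : \tau}
\]
\[
\frac{\Gamma \vdash V\,R\,V' : \mathbf{N} \to \tau}{\Gamma \vdash \sigma(V)\,R\,\sigma(V') : \tau}
\qquad
\frac{\Gamma \vdash V\,R\,V' : \mathbf{N} \quad \Gamma \vdash W\,R\,W' : \mathbf{N} \to \tau}{\Gamma \vdash \sigma(V; W)\,R\,\sigma(V'; W') : \tau}
\]

Fig. 3. Rules for compatibility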


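Theorem 1. If $\mathcal{O}$ is a decomposable set of Scott-open modalities then the open
extensions of $\equiv_{\mathcal{V}}$ and $\sqsubseteq_{\mathcal{V}^+}$ are both compatible. (It is an immediate consequence
that the open extension of $\equiv_{\mathcal{V}^+}$ is also compatible.)


The Scott-openness condition refers to the Scott topology on $T\mathbf{1}$.

Definition 6. We say that $o \in \mathcal{O}$ is upwards closed if $[\![o]\!]$ is an upper-closed
subset of $T\mathbf{1}$; i.e., if $t \in [\![o]\!]$ implies $t' \in [\![o]\!]$ whenever $t \le t'$.


Definition 7. We say that $o \in \mathcal{O}$ is Scott-open if $[\![o]\!]$ is an open subset in the
Scott topology on $T\mathbf{1}$; i.e., $[\![o]\!]$ is upper closed and, whenever $t_1 \le t_2 \le \ldots$ is an
ascending chain in $T\mathbf{1}$ with supremum $\bigsqcup_i t_i \in [\![o]\!]$, we have $t_n \in [\![o]\!]$ for some $n$.


Before formulating the property of decomposability, we make some simple
observations about the positive preorder $\sqsubseteq_{\mathcal{V}^+}$.


Lemma 8. For any $V_0, V_1 \in \mathrm{Val}(\rho \to \tau)$, we have $V_0 \sqsubseteq_{\mathcal{V}^+} V_1$ if and only if:
\[ \forall W \in \mathrm{Val}(\rho),\ \forall \Psi \in \mathrm{CF}_{\mathcal{V}^+}(\tau),\ V_0 \models (W \mapsto \Psi) \text{ implies } V_1 \models (W \mapsto \Psi). \]


Lemma 9. For any $M_0, M_1 \in \mathrm{Com}(\tau)$, we have $M_0 \sqsubseteq_{\mathcal{V}^+} M_1$ if and only if:
\[ \forall o \in \mathcal{O},\ \forall \phi \in \mathrm{VF}_{\mathcal{V}^+}(\tau),\ M_0 \models o\,\phi \text{ implies } M_1 \models o\,\phi. \]


Similar characterisations, with appropriate adjustments, hold for behavioural
equivalence $\equiv_{\mathcal{V}}$.

The decomposability property is formulated using an extension of the positive
preorder $\sqsubseteq_{\mathcal{V}^+}$, at unit type, from a relation on computations to a relation on
arbitrary effect trees. Accordingly, we define a preorder $\preccurlyeq$ on $T\mathbf{1}$ by:

\[ t \preccurlyeq t' \iff \forall o \in \mathcal{O},\ (t \in [\![o]\!] \Rightarrow t' \in [\![o]\!]) \wedge (t[\in \emptyset] \in [\![o]\!] \Rightarrow t'[\in \emptyset] \in [\![o]\!]). \]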
Proposition 10. For computations $M, N \in \mathrm{Com}(\mathbf{1})$, it holds that $|M| \preccurlyeq |N|$ if
and only if $M \sqsubseteq_{\mathcal{V}^+} N$.


Proof. The defining condition for $|M| \preccurlyeq |N|$ unwinds to:

\[ \forall o \in \mathcal{O},\ (M \models o\,\top \text{ implies } N \models o\,\top) \wedge (M \models o\,\bot \text{ implies } N \models o\,\bot). \]

This coincides with $M \sqsubseteq_{\mathcal{V}^+} N$ by Lemma 9.
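
To make the preorder on unit-type trees concrete, here is a small Python sketch for finite trees. It assumes a hypothetical tuple encoding and represents each modality by a predicate deciding membership in its interpretation; the two predicates shown follow the nondeterminism example and are illustrative only.

```python
# Hypothetical encoding of finite unit-type trees: ("leaf", "*"), ("bot",), (op, child, ...).
STAR = ("leaf", "*")
BOT = ("bot",)

def erase(t):
    """t[in empty set]: every leaf becomes bottom."""
    if t[0] in ("leaf", "bot"):
        return BOT
    return (t[0],) + tuple(erase(c) for c in t[1:])

def diamond(t):
    """Membership in the interpretation of the may modality: some * leaf."""
    if t[0] == "leaf":
        return True
    if t[0] == "bot":
        return False
    return any(diamond(c) for c in t[1:])

def box(t):
    """Membership in the interpretation of the must modality: every leaf is *."""
    if t[0] == "leaf":
        return True
    if t[0] == "bot":
        return False
    return all(box(c) for c in t[1:])

def below(t, u, modalities):
    """t is below u: each modality holding of t (or of erase(t)) also holds of u (or erase(u))."""
    return all(
        (not o(t) or o(u)) and (not o(erase(t)) or o(erase(u)))
        for o in modalities
    )

may_diverge = ("or", STAR, BOT)
total = ("or", STAR, STAR)
```

With both modalities, `may_diverge` is below `total` but not conversely; with only the may modality the two trees are equivalent, which reflects the angelic reading.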


We now formulate the required notion of decomposability. We first give the
general definition, and then follow it with a related notion of strong decomposability,
which can be more convenient to establish in examples. Both definitions
are unavoidably technical in nature.

For any relation $R \subseteq X \times Y$ and subset $A \subseteq X$, we write $R^{\uparrow}A$ for the right
set $\{y \in Y \mid \exists x \in A,\ x\,R\,y\}$. This allows us to easily define our required notion.


Definition 11 (Decomposability). We say that $\mathcal{O}$ is decomposable if, for all
$r, r' \in TT\mathbf{1}$, we have:

\[ (\forall A \subseteq T\mathbf{1},\ r[\in A] \preccurlyeq r'[\in {\preccurlyeq}^{\uparrow}A]) \Rightarrow \mu r \preccurlyeq \mu r'. \]

Corollary 22 in Sect. 5 may help to motivate the formulation of the above property,
which might otherwise appear purely technical. The following stronger version
of decomposability, which suffices for all examples considered in the paper,
is perhaps easier to understand in its own right.


Definition 12 (Strong decomposability). We say that $\mathcal{O}$ is strongly decomposable
if, for every $r \in TT\mathbf{1}$ and $o \in \mathcal{O}$ for which $\mu r \in [\![o]\!]$, there exists a
collection $\{(o_i, o_i')\}_{i \in I}$ of pairs of modalities such that:

1. $\forall i \in I,\ r[\in [\![o_i']\!]] \in [\![o_i]\!]$; and
2. for every $r' \in TT\mathbf{1}$, $(\forall i \in I,\ r'[\in [\![o_i']\!]] \in [\![o_i]\!])$ implies $\mu r' \in [\![o]\!]$.


Proposition 13. If $\mathcal{O}$ is strongly decomposable then it is decomposable.


Proof. Suppose that $r[\in A] \preccurlyeq r'[\in ({\preccurlyeq}^{\uparrow}A)]$ holds for every $A \subseteq T\mathbf{1}$. Assume that
$\mu r \in [\![o]\!]$ for some $o \in \mathcal{O}$. Then strong decomposability gives a collection $\{(o_i, o_i')\}_I$. By the
definition of $\preccurlyeq$, for each $o_i'$ we have ${\preccurlyeq}^{\uparrow}[\![o_i']\!] = [\![o_i']\!]$. By the initial assumption,
$r[\in [\![o_i']\!]] \in [\![o_i]\!]$ implies $r'[\in ({\preccurlyeq}^{\uparrow}[\![o_i']\!])] \in [\![o_i]\!]$, and hence $r'[\in [\![o_i']\!]] \in [\![o_i]\!]$. This
holds for every $i$, so by strong decomposability $\mu r' \in [\![o]\!]$. We have shown that
$\mu r \in [\![o]\!]$ implies $\mu r' \in [\![o]\!]$. One can prove similarly that $\mu r[\in \emptyset] \in [\![o]\!]$ implies
that $\mu r'[\in \emptyset] \in [\![o]\!]$ by observing that ${\preccurlyeq}^{\uparrow}\{x \mid x[\in \emptyset] \in [\![o]\!]\} = \{x \mid x[\in \emptyset] \in [\![o]\!]\}$.
Thus it holds that $\mu r \preccurlyeq \mu r'$ and hence $\mathcal{O}$ is decomposable.


We end this section by again looking at our running examples, and showing,
in each case, that the identified collection $\mathcal{O}$ of modalities is Scott-open (hence
upwards closed) and strongly decomposable (hence decomposable). For any of
the examples, upwards closure is easily established, so we will not show it here.
Example 0 (Pure functional computation). We have $\mathcal{O} = \{\downarrow\}$ and $[\![\downarrow]\!] = \{*\}$.
Scott-openness holds since if $\bigsqcup_i t_i = *$ then for some $i$ we must already have
$t_i = *$. It is strongly decomposable since: $\mu r \in [\![\downarrow]\!] \iff r[\in [\![\downarrow]\!]] \in [\![\downarrow]\!]$, which
means $r$ returns a tree $t$ which is a leaf $*$.


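Example 1 (Error). We have $\mathcal{O} = \{\downarrow\} \cup \{\mathsf{E}_e \mid e \in E\}$ and $[\![\mathsf{E}_e]\!] = \{ \mathit{raise}_e \}$.
Scott-openness holds for both modalities for the same reason as in the previous
example, and it is strongly decomposable since:

\[ \mu r \in [\![\downarrow]\!] \iff r[\in [\![\downarrow]\!]] \in [\![\downarrow]\!], \]

which means $r$ returns a tree $t$ which returns $*$, and

\[ \mu r \in [\![\mathsf{E}_e]\!] \iff r[\in [\![\mathsf{E}_e]\!]] \in [\![\mathsf{E}_e]\!] \vee r[\in [\![\mathsf{E}_e]\!]] \in [\![\downarrow]\!], \]

which means $r$ raises an error, or returns a tree that raises an error.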


Example 2 (Nondeterminism). We have $\mathcal{O} = \{\Diamond, \Box\}$. The Scott-openness of
$[\![\Diamond]\!] = \{t \mid t \text{ has some } * \text{ leaf}\}$ is because if $\bigsqcup_i t_i$ has a $*$ leaf, then that leaf
must already be contained in $t_i$ for some $i$. Similarly, if $\bigsqcup_i t_i \in [\![\Box]\!]$ then, because
$[\![\Box]\!] = \{t \mid t \text{ has finite height and every leaf is a } *\}$, the tree $\bigsqcup_i t_i$ has finitely many
leaves and all must be contained in $t_i$ for some $i$. Hence $t_i \in [\![\Box]\!]$. Strong decomposability
holds because:

\[ \mu r \in [\![\Diamond]\!] \iff r[\in [\![\Diamond]\!]] \in [\![\Diamond]\!] \quad \text{and} \quad \mu r \in [\![\Box]\!] \iff r[\in [\![\Box]\!]] \in [\![\Box]\!]. \]

The right-hand side of the former states that $r$ has as a leaf a tree $t$, which itself
has a leaf $*$. That of the latter states that $r$ is finite and all leaves are finite trees
$t$ that have only $*$ leaves. The same arguments show that $\{\Diamond\}$ and $\{\Box\}$ are also
decomposable sets of Scott-open modalities.
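
The two equivalences for nondeterminism can be spot-checked on small examples. The Python sketch below is only a sanity check under assumptions of our own: finite trees in a hypothetical tuple encoding, with a tree of trees represented by leaves whose labels are themselves unit-type trees, and `mu` as the flattening (monad multiplication) on that encoding.

```python
# Hypothetical encoding: ("leaf", x), ("bot",), (op, child, ...).
STAR = ("leaf", "*")
BOT = ("bot",)

def mu(r):
    """Flatten a tree of trees: each leaf labelled by a tree is replaced by that tree."""
    if r[0] == "leaf":
        return r[1]
    if r[0] == "bot":
        return BOT
    return (r[0],) + tuple(mu(c) for c in r[1:])

def relabel(r, pred):
    """r[in P]: leaves whose label satisfies pred become *, all others bottom."""
    if r[0] == "leaf":
        return STAR if pred(r[1]) else BOT
    if r[0] == "bot":
        return BOT
    return (r[0],) + tuple(relabel(c, pred) for c in r[1:])

def diamond(t):
    if t[0] == "leaf":
        return True
    if t[0] == "bot":
        return False
    return any(diamond(c) for c in t[1:])

def box(t):
    if t[0] == "leaf":
        return True
    if t[0] == "bot":
        return False
    return all(box(c) for c in t[1:])

samples = [
    ("or", ("leaf", ("or", STAR, BOT)), ("leaf", BOT)),
    ("or", ("leaf", STAR), ("leaf", ("or", STAR, STAR))),
    ("or", ("leaf", BOT), BOT),
]

def decomposition_holds(r):
    """Check both equivalences on one sample tree of trees."""
    return (diamond(mu(r)) == diamond(relabel(r, diamond))
            and box(mu(r)) == box(relabel(r, box)))
```

Running `decomposition_holds` over `samples` confirms the equivalences on these instances; it is of course not a proof.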


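Example 3 (Probabilistic choice). $\mathcal{O} = \{\mathsf{P}_{>q} \mid q \in \mathbb{Q},\ 0 < q < 1\}$. For the
Scott-openness of $[\![\mathsf{P}_{>q}]\!] = \{t \mid \mathbf{P}(t \text{ terminates with a } * \text{ leaf}) > q\}$, note that
$\mathbf{P}(\bigsqcup_i t_i \text{ terminates with a } * \text{ leaf})$ is determined by some countable sum over the
leaves of $\bigsqcup_i t_i$. If this sum is greater than a rational $q$, then some finite approximation
of the sum must already be above $q$. The finite sum is over finitely many
leaves from $\bigsqcup_i t_i$, all of which will be present in $t_i$ for some $i$. Hence $t_i \in [\![\mathsf{P}_{>q}]\!]$.

We have strong decomposability, since $\mathbf{P}(\mu r \text{ terminates with a } * \text{ leaf})$ equals
the integral of the function $f_r(x) = \sup\{y \in [0,1] \mid r[\in [\![\mathsf{P}_{>x}]\!]] \in [\![\mathsf{P}_{>y}]\!]\}$ from $[0,1]$
to $[0,1]$. Indeed, $f_r(x)$ gives the probability that $r$ returns a tree $t \in [\![\mathsf{P}_{>x}]\!]$. So we
know that if $\forall x, y,\ r[\in [\![\mathsf{P}_{>x}]\!]] \in [\![\mathsf{P}_{>y}]\!] \Rightarrow r'[\in [\![\mathsf{P}_{>x}]\!]] \in [\![\mathsf{P}_{>y}]\!]$, then $f_{r'}(x) \ge f_r(x)$
for any $x$. Hence if $\mu r \in [\![\mathsf{P}_{>q}]\!]$ then $\int f_r > q$, whence also $\int f_{r'} > q$, which
means $\mu r' \in [\![\mathsf{P}_{>q}]\!]$.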


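Example 4 (Global store). We have $\mathcal{O} = \{(s \rightarrowtail s') \mid s, s' \in \mathrm{State}\}$. For the
Scott-openness of $[\![(s \rightarrowtail s')]\!] = \{t \mid \mathit{exec}(t, s) = (*, s')\}$, note that if $\mathit{exec}(\bigsqcup_i t_i, s) =
(*, s')$, there is a single finite branch of $\bigsqcup_i t_i$ that follows the path the recursive
function $\mathit{exec}$ took. This branch must already be contained in $t_i$ for some $i$. We
also have strong decomposability since:

\[ \mu r \in [\![(s \rightarrowtail s')]\!] \iff \exists s'' \in \mathrm{State},\ r[\in [\![(s'' \rightarrowtail s')]\!]] \in [\![(s \rightarrowtail s'')]\!], \]

which just means that $\mathit{exec}(r, s) = (t, s'')$ and $\mathit{exec}(t, s'') = (*, s')$ for some $s''$.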
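Example 5 (Input/output). We have $\mathcal{O} = \{ \langle w \rangle{\downarrow},\ \langle w \rangle_{\ldots} \mid w \text{ an i/o-trace} \}$. For the
Scott-openness of $[\![\langle w \rangle{\downarrow}]\!] = \{t \mid t \models \langle w \rangle{\downarrow} \{*\}\}$, note that the i/o-trace $\langle w \rangle{\downarrow}$
is given by some finite branch, which if in $\bigsqcup_i t_i$ must be in $t_i$ for some $i$. The
Scott-openness of $[\![\langle w \rangle_{\ldots}]\!] = \{t \mid t \models \langle w \rangle_{\ldots}\}$ holds for similar reasons. We have
strong decomposability because of the implications:

\[ \mu r \in [\![\langle w \rangle{\downarrow}]\!] \iff \exists v, u \text{ i/o-traces},\ vu = w \wedge r[\in [\![\langle u \rangle{\downarrow}]\!]] \in [\![\langle v \rangle{\downarrow}]\!], \]

which means $r$ follows trace $v$ returning $t$, and $t$ follows trace $u$ returning $*$, and

\[ \mu r \in [\![\langle w \rangle_{\ldots}]\!] \iff r[\in \emptyset] \in [\![\langle w \rangle_{\ldots}]\!] \vee \exists v, u,\ vu = w \wedge r[\in [\![\langle u \rangle_{\ldots}]\!]] \in [\![\langle v \rangle{\downarrow}]\!], \]

which means either $r$ follows trace $w$ immediately, or it follows $v$ returning a
tree that follows $u$.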


5 Applicative O-(bi)similarity 


In this section we look at an alternative description of our logical preorder.
Central to such a definition lies the concept of a relator [12,25], which we use
to lift a relation on value terms to a relation on computation terms. With our
family of modalities $\mathcal{O}$ we can define a relator which takes a relation $R \subseteq X \times Y$
and returns the relation $\mathcal{O}(R) \subseteq TX \times TY$, defined by:

\[ t\ \mathcal{O}(R)\ t' \iff \forall A \subseteq X,\ \forall o \in \mathcal{O},\ t[\in A] \in [\![o]\!] \Rightarrow t'[\in (R^{\uparrow}A)] \in [\![o]\!]. \]

Note that $\mathcal{O}(\mathit{id}_{\mathbf{1}}) = (\preccurlyeq)$. Following [9], we use this relation-lifting operation to
define notions of applicative similarity and bisimilarity.
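
When the carrier set is finite, the quantification over subsets in the definition of the lifted relation can be carried out by brute force. The Python sketch below does this for finite trees; the encoding, the representation of a relation as a set of pairs, and the two example modalities (taken from the nondeterminism example) are all assumptions made for illustration.

```python
from itertools import chain, combinations

# Hypothetical encoding: ("leaf", x), ("bot",), (op, child, ...).
STAR = ("leaf", "*")
BOT = ("bot",)

def relabel(t, pred):
    if t[0] == "leaf":
        return STAR if pred(t[1]) else BOT
    if t[0] == "bot":
        return BOT
    return (t[0],) + tuple(relabel(c, pred) for c in t[1:])

def diamond(t):
    if t[0] == "leaf":
        return True
    if t[0] == "bot":
        return False
    return any(diamond(c) for c in t[1:])

def box(t):
    if t[0] == "leaf":
        return True
    if t[0] == "bot":
        return False
    return all(box(c) for c in t[1:])

def subsets(xs):
    xs = list(xs)
    return chain.from_iterable(combinations(xs, k) for k in range(len(xs) + 1))

def lifted(t, u, rel, carrier, modalities):
    """Check the lifted relation between t and u, where rel is a set of pairs over a finite carrier."""
    for chosen in subsets(carrier):
        a = set(chosen)
        image = {y for (x, y) in rel if x in a}  # the right set of a under rel
        for o in modalities:
            if o(relabel(t, a.__contains__)) and not o(relabel(u, image.__contains__)):
                return False
    return True

rel = {(0, "a"), (1, "b")}
t = ("or", ("leaf", 0), ("leaf", 1))
u_same = ("or", ("leaf", "a"), ("leaf", "b"))
u_more = ("or", ("leaf", "a"), ("or", ("leaf", "b"), BOT))
```

Here `t` is related to `u_same` under both modalities, and to `u_more` only when the must modality is left out, since `u_more` has an extra diverging branch.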


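Definition 14. An applicative $\mathcal{O}$-simulation is given by a pair of relations $R^v_\tau$
and $R^c_\tau$ for each type $\tau$, where $R^v_\tau \subseteq \mathrm{Val}(\tau)^2$ and $R^c_\tau \subseteq \mathrm{Com}(\tau)^2$, such that:

1. $V\ R^v_{\mathbf{N}}\ W \Rightarrow (V = W)$

2. $M\ R^c_\tau\ N \Rightarrow |M|\ \mathcal{O}(R^v_\tau)\ |N|$

3. $V\ R^v_{\rho \to \tau}\ W \Rightarrow \forall U \in \mathrm{Val}(\rho),\ V\,U\ R^c_\tau\ W\,U$

Applicative $\mathcal{O}$-similarity is the largest applicative $\mathcal{O}$-simulation, which is equal
to the union of all applicative $\mathcal{O}$-simulations.


Definition 15. An applicative $\mathcal{O}$-bisimulation is a symmetric $\mathcal{O}$-simulation.
The relation of $\mathcal{O}$-bisimilarity is the largest applicative $\mathcal{O}$-bisimulation.


Lemma 16. Applicative $\mathcal{O}$-bisimilarity is identical to the relation of applicative
$(\mathcal{O} \cap \mathcal{O}^{op})$-similarity, where $t\ (\mathcal{O} \cap \mathcal{O}^{op})(R)\ r \iff t\ \mathcal{O}(R)\ r \wedge r\ \mathcal{O}(R^{op})\ t$.


Proof. Let $R$ be the $\mathcal{O}$-bisimilarity, then by symmetry we have $R^{op} = R$. So if
$M\,R\,N$ we have $N\,R\,M$, and by the simulation rules we derive $|M|\ \mathcal{O}(R)\ |N|$ and
$|N|\ \mathcal{O}(R)\ |M|$ which is what we needed.

Let $R$ be the $(\mathcal{O} \cap \mathcal{O}^{op})$-similarity. If $M\ R^{op}\ N$ then $|N|\ (\mathcal{O} \cap \mathcal{O}^{op})(R)\ |M|$ so
$|N|\ \mathcal{O}(R)\ |M| \wedge |M|\ \mathcal{O}(R^{op})\ |N|$ which results in $|M|\ (\mathcal{O} \cap \mathcal{O}^{op})(R^{op})\ |N|$. Verifying
the other simulation conditions as well, we can conclude that the symmetric
closure $R \cup R^{op}$ is also an $(\mathcal{O} \cap \mathcal{O}^{op})$-simulation. So $R$ must, as the largest such
simulation, be symmetric. Hence $R$ is a symmetric $\mathcal{O}$-simulation as well.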
For brevity, we will leave out the word “applicative” from here on, and write
$o$ to mean its denotation $[\![o]\!]$. We also introduce brackets, writing $o[\phi]$ for $o\,\phi$.
The key result now is that the maximal relation, the $\mathcal{O}$-similarity, is in most
cases the same object as our logical preorder. We first give a short lemma.


Lemma 17. For any fragment $\mathcal{L}$ of $\mathcal{V}$ closed under countable conjunction, it
holds that for each value $V$ there is a formula $\chi_V \in \mathcal{L}$ s.t. $W \models \chi_V \iff V \sqsubseteq_{\mathcal{L}} W$.


Proof. For each $U$ such that $\neg(V \sqsubseteq_{\mathcal{L}} U)$, choose a formula $\phi^U \in \mathcal{L}$ such that
$V \models \phi^U$ and $\neg(U \models \phi^U)$. Then if we define $\chi_V := \bigwedge_{\{U \mid V \not\sqsubseteq_{\mathcal{L}} U\}} \phi^U$ it holds that
$V \sqsubseteq_{\mathcal{L}} U \iff U \models \chi_V$, which is what we want.


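Theorem 2 (a). For any family of upwards closed modalities $\mathcal{O}$, we have that
the logical preorder $\sqsubseteq_{\mathcal{V}^+}$ is identical to $\mathcal{O}$-similarity.


Proof. We write $\sqsubseteq$ instead of $\sqsubseteq_{\mathcal{V}^+}$ to make room for other annotations. We first
prove that our logical preorder $\sqsubseteq$ is an $\mathcal{O}$-simulation by induction on types.

1. Values of $\mathbf{N}$. If $n \sqsubseteq^v_{\mathbf{N}} m$, then since $n \models \{n\}$ we have that $m \models \{n\}$, hence
$m = n$.

2. Computations of $\tau$. Assume $M \sqsubseteq^c_\tau N$, we prove that $|M|\ \mathcal{O}(\sqsubseteq^v_\tau)\ |N|$. Take
$A \subseteq \mathrm{Val}(\tau)$ and $o \in \mathcal{O}$ such that $|M|[\in A] \in o$. Taking the following formula
$\phi := \bigvee_{a \in A} \chi_a$ (where $\chi_a$ as in Lemma 17), then $b \models \phi \iff \exists a \in A,\ a \sqsubseteq^v_\tau b$ and
$a \in A \Rightarrow a \models \phi$. So $|M|[\models \phi] \ge |M|[\in A]$, hence since $o$ is upwards closed,
$|M|[\models \phi] \in o$. By $M \sqsubseteq^c_\tau N$ we have $|N|[\in \{b \in \mathrm{Val}(\tau) \mid \exists a \in A,\ a \sqsubseteq^v_\tau b\}] =
|N|[\models \phi] \in o$. Hence we can conclude that $|M|\ \mathcal{O}(\sqsubseteq^v_\tau)\ |N|$.

3. Function values of $\rho \to \tau$: this follows from Lemmas and the Induction
Hypothesis.


We can conclude that $\sqsubseteq$ is an $\mathcal{O}$-simulation. Now take an arbitrary $\mathcal{O}$-simulation
$R$. We prove by induction on types that $R \subseteq (\sqsubseteq)$.

1. Values of $\mathbf{N}$. If $V\ R^v_{\mathbf{N}}\ W$ then $V = W$, hence by reflexivity we get $V \sqsubseteq^v_{\mathbf{N}} W$.

2. Computations of $\tau$. Assume $M\ R^c_\tau\ N$, we prove that $M \sqsubseteq^c_\tau N$ using the
characterisation from Lemma 9. Say for $o \in \mathcal{O}$ and $\phi \in \mathrm{VF}(\tau)$ we have $M \models o[\phi]$.
Let $A_\phi := \{a \in \mathrm{Val}(\tau) \mid a \models \phi\} \subseteq \mathrm{Val}(\tau)$, then $|M|[\in A_\phi] = |M|[\models \phi] \in o$,
hence by $M\ R^c_\tau\ N$ we derive $|N|[\in \{b \in \mathrm{Val}(\tau) \mid \exists a \in A_\phi,\ a\ R^v_\tau\ b\}] \in o$. By
Induction Hypothesis on values of $\tau$, we know that $R^v_\tau \subseteq (\sqsubseteq^v_\tau)$, hence
$\exists a \in A_\phi,\ a\ R^v_\tau\ b$ implies $b \models \phi$. We get that $|N|[\models \phi] \ge |N|[\in \{b \in
\mathrm{Val}(\tau) \mid \exists a \in A_\phi,\ a\ R^v_\tau\ b\}]$, so by upwards closure of $o$ we have $|N|[\models \phi] \in o$,
meaning $N \models o[\phi]$. We conclude that $M \sqsubseteq^c_\tau N$.

3. Function values of $\rho \to \tau$: assume $V\ R^v_{\rho \to \tau}\ W$. We prove $V \sqsubseteq^v_{\rho \to \tau} W$ using the
characterisation from Lemma 8. Assume $V \models (U \mapsto \Phi)$ where $U \in \mathrm{Val}(\rho)$
and $\Phi \in \mathrm{CF}(\tau)$, so $V\,U \models \Phi$. By $V\ R^v_{\rho \to \tau}\ W$ we have $V\,U\ R^c_\tau\ W\,U$ and by
Induction Hypothesis we have $R^c_\tau \subseteq (\sqsubseteq^c_\tau)$, so $V\,U \sqsubseteq^c_\tau W\,U$. Hence $W\,U \models \Phi$,
meaning $W \models (U \mapsto \Phi)$. We can conclude that $V \sqsubseteq^v_{\rho \to \tau} W$.

4. Values of $\mathbf{1}$. If $V\ R^v_{\mathbf{1}}\ W$ then $V = * = W$, hence $V \sqsubseteq^v_{\mathbf{1}} W$.


In conclusion: any $\mathcal{O}$-simulation $R$ is a subset of the $\mathcal{O}$-simulation $\sqsubseteq_{\mathcal{V}^+}$. So $\sqsubseteq_{\mathcal{V}^+}$
is $\mathcal{O}$-similarity.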


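Alternatively, we can look at the variation of our logic with negation. This
is related to applicative bisimulations.


Theorem 2 (b). For any family of upwards closed modalities $\mathcal{O}$, we have that
the logical equivalence $\equiv_{\mathcal{V}}$ is identical to $\mathcal{O}$-bisimilarity.


Proof. Note first that $\equiv_{\mathcal{V}}$ is symmetric.

Secondly, note that since $\equiv_{\mathcal{V}}\ =\ \sqsubseteq_{\mathcal{V}}$ we know by Lemma 17 that, for any $V$,
there is a formula $\chi_V$ such that $W \models \chi_V \iff V \equiv_{\mathcal{V}} W$.

Using these special formulas $\chi_V$, the rest of the proof is very similar to the
proof of Theorem 2(a). Here follow the non-trivial parts of the proof that differ
from the previous proof. For proving $\equiv_{\mathcal{V}}$ is an $\mathcal{O}$-simulation:

1. Computations of $\tau$. Assume $M \equiv^c_\tau N$ and $|M|[\in A] \in o \in \mathcal{O}$. Then $M \models
o[\bigvee_{V \in A} \chi_V]$, hence $N \models o[\bigvee_{V \in A} \chi_V]$, meaning $|N|[\in \{W \mid \exists V \in A,\ V \equiv^v_\tau
W\}] \in o$. So $|M|\ \mathcal{O}(\equiv^v_\tau)\ |N|$.

2. Functions of $\rho \to \tau$: let $V \equiv^v_{\rho \to \tau} W$ and $U \in \mathrm{Val}(\rho)$. If $V\,U \models \Phi$, then
$V \models (U \mapsto \Phi)$, hence $W \models (U \mapsto \Phi)$, so $W\,U \models \Phi$. Same vice versa, so
$V\,U \equiv^c_\tau W\,U$.


So $\equiv_{\mathcal{V}}$ is an $\mathcal{O}$-bisimulation. Now take any $\mathcal{O}$-bisimulation $R$.

1. Computations of $\tau$: if $M\ R^c_\tau\ N$ and $M \models o[\phi]$ then $|M|[\models \phi] \in o$, hence
$|N|[\in \{W \mid \exists V \models \phi,\ V\ R^v_\tau\ W\}] \in o$. By Induction Hypothesis, $(R^v_\tau) \subseteq (\equiv^v_\tau)$,
so $\{W \mid \exists V \models \phi,\ V\ R^v_\tau\ W\} \subseteq \{W \mid \exists V \models \phi,\ V \equiv^v_\tau W\}$. So by upwards
closure of $o$ we get that $|N|[\in \{W \mid \exists V \models \phi,\ V \equiv^v_\tau W\}] \in o$ and further that
$N \models o[\phi]$. We can conclude $M \equiv_{\mathcal{V}} N$.

2. Values of $\rho \to \tau$: if $V\,R\,W$ and $V \models (U \mapsto \Phi)$, then $V\,U \models \Phi$ and $V\,U\ R\ W\,U$,
hence by Induction Hypothesis, $V\,U \equiv W\,U$, meaning $W\,U \models \Phi$, so $W \models (U \mapsto \Phi)$.
If $V \models \neg(U \mapsto \Phi)$ then $\neg(V\,U \models \Phi)$, hence by $V\,U \equiv W\,U$ we have
$\neg(W\,U \models \Phi)$, so $W \models \neg(U \mapsto \Phi)$. For the $\bigvee$ and $\bigwedge$ constructors, a
simple induction step would suffice, and for higher level negation note that
\[ \neg\bigvee \phi \iff \bigwedge \neg\phi \quad \text{and} \quad \neg\bigwedge \phi \iff \bigvee \neg\phi. \]

We can conclude that $(R) \subseteq (\equiv_{\mathcal{V}})$, so $\equiv_{\mathcal{V}}$ is indeed $\mathcal{O}$-bisimilarity.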


We end this section by stating the abstract properties of our relational lifting
$\mathcal{O}(R)$ required for the proof by Howe’s method in Sect. 6 to go through. The
necessary properties were identified in [9]. The contribution of this paper is that
all the required properties follow from our modality-based definition of $\mathcal{O}(R)$.
The first set of properties tells us that $\mathcal{O}(-)$ is a relator in the sense of [12]:


Lemma 18. If the modalities from $\mathcal{O}$ are upwards closed, then $\mathcal{O}(-)$ is a relator,
meaning that:

1. If $R \subseteq X \times X$ is reflexive, then so is $\mathcal{O}(R)$.

2. $\forall R, \forall S,\ \mathcal{O}(R)\mathcal{O}(S) \subseteq \mathcal{O}(RS)$, where $RS$ is relation composition.

3. $\forall R, \forall S,\ R \subseteq S \Rightarrow \mathcal{O}(R) \subseteq \mathcal{O}(S)$.

4. $\forall f : X \to Z,\ g : Y \to W,\ R \subseteq Z \times W,\ \mathcal{O}((f \times g)^{-1}R) = (Tf \times Tg)^{-1}\mathcal{O}(R)$,
where $(f \times g)^{-1}(R) = \{(x, y) \in X \times Y \mid f(x)\ R\ g(y)\}$.


The next property together with the previous lemma establishes that $\mathcal{O}(-)$ is a
monotone relator in the sense of [25].


Lemma 19. If the modalities from $\mathcal{O}$ are upwards closed, then $\mathcal{O}(-)$ is monotone,
meaning for any $f : X \to Z$, $g : Y \to W$, $R \subseteq X \times Y$ and $S \subseteq Z \times W$:

\[ (\forall x, y,\ x\,R\,y \Rightarrow f(x)\ S\ g(y)) \wedge t\ \mathcal{O}(R)\ r \Rightarrow t[x \mapsto f(x)]\ \mathcal{O}(S)\ r[y \mapsto g(y)] \]
The relator also interacts well with the monad structure on T. 


Lemma 20. If O is a decomposable set of upwards closed modalities, then: 


1. Ry => n(x)O(R)n(y); 
2. tO(O(R))r => mO(R)ur. 


Finally, the following properties show that relator behaves well with respect to 
the order on trees. 


Lemma 21. If O only contains Scott open modalities, then: 


1. If R is reflexive, then t < r => tO(R)r. 
2. For any two sequences ug < uy < Ug <... and vo < vy < vg < 


Yn, (unO(R)vn) => (Gnun)O(R)(Unvn) 


The lemmas above list the core properties of the relator, which are satisfied 
when our family O is decomposable and contains only Scott open modalities. 
The results below follow from those above. 


Corollary 22. If O contains only upwards closed modalities, then: 
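$$O \text{ is decomposable} \;\Leftrightarrow\; \forall R \subseteq X \times Y,\ \forall t, r \in TT1,\ (t \mathrel{O(O(R))} r \Rightarrow \mu t \mathrel{O(R)} \mu r)$$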


Corollary 23. If O is a decomposable family of upwards closed modalities, then 
lifted relations are preserved by Kleisli lifting and effect operators: 


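1. Given $f : X \to Z$, $g : Y \to W$, $R \subseteq X \times Y$ and $S \subseteq Z \times W$, if for all
   $x \in X$ and $y \in Y$ we have $x \mathrel{R} y \Rightarrow f(x) \mathrel{O(S)} g(y)$, and if $t \mathrel{O(R)} r$, then
   $\mu(t[x \mapsto f(x)]) \mathrel{O(S)} \mu(r[y \mapsto g(y)])$.

2. $(\forall k,\ u_k \mathrel{O(S)} v_k) \Rightarrow \sigma(u_0, u_1, \ldots) \mathrel{O(S)} \sigma(v_0, v_1, \ldots)$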


Point 2 of Corollary 23 has been stated in such a way that it contains both the
infinite arity case $\alpha^{\mathbb{N}} \to \alpha$ and the finite arity case $\alpha^{n} \to \alpha$. So it states that
any lifted relation is preserved under any of the predefined algebraic effects.
6 Howe’s Method


In this section, we apply Howe’s method, first developed in [5,6], to establish
the compatibility of applicative (bi)similarity, and hence of the behavioural
preorders. Given a relation $R$ on terms, one defines its Howe closure $R^{\bullet}$, which
is compatible and contains the open extension $R^{\circ}$. Our proof makes fundamental
use of the relator properties from Sect. 5, closely following the approach of [9].


Proposition 24. If O is a decomposable set of Scott open modalities, then for
any O-simulation preorder $\sqsubseteq$, the restriction of its Howe closure $\sqsubseteq^{\bullet}$ to closed
terms is an O-simulation.


In the proof of the proposition, the relator properties are mainly used to show
that $\sqsubseteq^{\bullet}$ satisfies condition (2) in Definition 14.
We can now establish the compatibility of applicative O-similarity. 


Theorem 3 (a). If O is a decomposable set of Scott open modalities, then the 
open extension of the relation of O-similarity is compatible. 


Proof (sketch). We write $\sqsubseteq_s$ for the relation of O-similarity. Since $\sqsubseteq_s$ is an
O-simulation, we know by Proposition 24 that $\sqsubseteq_s^{\bullet}$ limited to closed terms is one
as well, and hence is contained in the largest O-simulation $\sqsubseteq_s$. Since $\sqsubseteq_s^{\bullet}$ is
compatible, it is contained in the open extension $\sqsubseteq_s^{\circ}$. We can conclude that $\sqsubseteq_s^{\circ}$
is equal to the Howe closure $\sqsubseteq_s^{\bullet}$, which is compatible.


To prove that O-bisimilarity is compatible, we use the following result from 
[10] (where we write S* for the transitive-reflexive closure of a relation S). 


Lemma 25. If $R^{\circ}$ is symmetric and reflexive, then $R^{\bullet *}$ is symmetric.


Theorem 3 (b). If O is a decomposable set of Scott open modalities, then the 
open extension of the relation of O-bisimilarity is compatible. 


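Proof (sketch). We write O-bisimilarity as $\sqsubseteq_b$. From Proposition 24 we know that
$\sqsubseteq_b^{\bullet}$ on closed terms is an O-simulation, and so we know $\sqsubseteq_b^{\bullet *}$ is an O-simulation
as well (using Lemma 18). Since $\sqsubseteq_b^{\circ}$ is reflexive and symmetric, we know by the
previous lemma that $\sqsubseteq_b^{\bullet *}$ is symmetric. Hence $\sqsubseteq_b^{\bullet *}$ is an O-bisimulation, implying
$(\sqsubseteq_b^{\bullet *}) \subseteq (\sqsubseteq_b^{\circ})$ by compatibility of $\sqsubseteq_b^{\bullet *}$. Since $(\sqsubseteq_b^{\circ}) \subseteq (\sqsubseteq_b^{\bullet}) \subseteq (\sqsubseteq_b^{\bullet *})$ we have that
$(\sqsubseteq_b^{\bullet *}) = (\sqsubseteq_b^{\circ})$, and we can conclude that $\sqsubseteq_b^{\circ}$ is compatible.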


Theorem 1 is an immediate consequence of Theorems 2 and 3. 


7 Pure Behavioural Logic 


In this section, we briefly explore an alternative formulation of our logic. This has
both conceptual and practical motivations. Our approach to behavioural
logic fits into the category of endogenous logics in the sense of Pnueli [24].
Formulas ($\phi$ and $\Phi$) express properties of individual programs, through satisfaction
relations ($V \models \phi$ and $M \models \Phi$). Programs are thus considered as ‘models’ of
the logic, with the satisfaction relation being defined via program behaviour.

It is conceptually appealing to push the separation between program and logic 
to its natural conclusion, and ask for the syntax of the logic to be independent of 
the syntax of the programming language. Indeed, it seems natural that it should 
be possible to express properties of program behaviour without knowledge of the 
syntax of the programming language. Under our formulation of the logic V, this
desideratum is violated by the value formula $(V \mapsto \Psi)$ at function type, which
mentions the programming language value $V$.

This issue can be addressed by replacing the basic value formula $(V \mapsto \Psi)$
with the alternative $(\phi \mapsto \Psi)$, already mentioned in Sect. 3. Such a change also
has a practical motivation. The formula $(\phi \mapsto \Psi)$ declares a precondition and
postcondition for function application, supporting a useful specification style.


Definition 26. The pure behavioural logic F is defined by replacing rule (2) in
Fig. 2 with the alternative:

$$\frac{\phi \in VF(\rho) \qquad \Psi \in CF(\tau)}{(\phi \mapsto \Psi) \in VF(\rho \to \tau)} \quad (2')$$

The semantics is modified by defining $V \models (\phi \mapsto \Psi)$ using formula (2) of
Sect. 3.


Proposition 27. If the open extension of $\equiv_{\mathcal V}$ is compatible then the logics V
and F are equi-expressive. Similarly, if the open extension of $\sqsubseteq_{\mathcal V^+}$ is compatible
then the positive fragments $\mathcal{V}^+$ and $\mathcal{F}^+$ are equi-expressive.


Proof. The definition of $(\phi \mapsto \Psi)$ within V, given in (1) of Sect. 3, can be used
as the basis of an inductive translation from F to V (and from $\mathcal{F}^+$ to $\mathcal{V}^+$).

For the reverse translation, whose correctness proof is more interesting, we
give a little more detail. Every value/computation formula, $\phi$/$\Phi$, of V is
inductively translated to a corresponding formula $\hat{\phi}$/$\hat{\Phi}$ of F. The interesting case is:

$$\widehat{(V \mapsto \Phi)} := (\chi_V \mapsto \hat{\Phi}),$$

where $\chi_V$ is a formula such that: $V \models_{\mathcal F} \chi_V$; and, for any $\psi$, if $V \models_{\mathcal F} \psi$ then
$\chi_V \to \psi$ (meaning that $V' \models_{\mathcal F} \chi_V$ implies $V' \models_{\mathcal F} \psi$, for all $V'$). Such a formula
$\chi_V$ is easily constructed as a countable conjunction (cf. Lemma 17). One then
proves, by induction on types, that the F-semantics of $\hat{\phi}$ (resp. $\hat{\Phi}$) coincides with
the V-semantics of $\phi$ (resp. $\Phi$). In the case for $(\chi_V \mapsto \hat{\Phi})$, the induction hypothesis
is used to establish that any $V'$ satisfying $V' \models_{\mathcal F} \chi_V$ enjoys the property that
$V' \equiv_{\mathcal V} V$. It then follows from the compatibility of $\equiv_{\mathcal V}$ that $WV' \equiv_{\mathcal V} WV$, for
any $W$ of appropriate type, whence $WV' \equiv_{\mathcal F} WV$. The rest of the proof can
easily be erected around these observations.
Combining the above proposition with Theorem 1 we obtain the following.


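Corollary 28. Suppose O is a decomposable family of Scott-open modalities.
Then $\equiv_{\mathcal F}$ coincides with $\equiv_{\mathcal V}$, and $\sqsubseteq_{\mathcal F^+}$ coincides with $\sqsubseteq_{\mathcal V^+}$. Hence the open
extensions of $\equiv_{\mathcal F}$ and $\sqsubseteq_{\mathcal F^+}$ are compatible.


We do not know any proof of the compatibility of the $\equiv_{\mathcal F}$ and $\sqsubseteq_{\mathcal F^+}$ relations
that does not go via the logic V. In particular, the compatibility property of the
fix operator seems difficult to establish directly for $\equiv_{\mathcal F}$ and $\sqsubseteq_{\mathcal F^+}$.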


8 Discussion and Related Work 


The behavioural logics considered in this paper are designed for the purpose 
of clarifying the notion of ‘behavioural property’, and for defining behavioural 
equivalence. As infinitary propositional logics, they are not directly suited to 
practical applications such as specification and verification. Nevertheless, they 
serve as low-level logics into which more practical finitary logics can be trans- 
lated. For this, the closure of the logics under infinitary propositional logic is 
important. For example, there are standard translations of quantifiers and least 
and greatest fixed points into infinitary propositional logic. Also, in the case of 
global store, Hoare triples translate into logical combinations of modal formulas. 

Our approach, of basing logics for effects on behavioural modalities, may 
potentially inform the design of practical logics for specifying and reasoning 
about effects. For example, Pitts’ evaluation logic was an early logic for general 
computational effects [18]. In the light of the general theory of modalities in the 
present paper, it seems natural to replace the built-in $\Box$ and $\Diamond$ modalities of
evaluation logic with effect-specific modalities, as in Sect. 3.

The logic for algebraic effects, of Plotkin and Pretnar [23], axiomatises effect- 
ful behaviour by means of an equational theory over the signature of effect oper- 
ations, following the algebraic approach to effects advocated by Plotkin and 
Power [22]. Such equational axiomatisations are typically sound with respect to 
more than one notion of program equivalence. The logic of [23] can thus be used 
to soundly reason about program equivalence, but does not in itself determine 
a notion of program equivalence. Instead, our logic is specifically designed as 
a vehicle for defining program equivalence. In doing so, our modalities can be 
viewed as a chosen family of ‘observations’ that are compatible with the effects 
present in the language. It is the choice of modalities that determines the equa- 
tional properties that the effect operations satisfy. 

The logic of [23] itself makes use of modalities, called operation modalities, 
each associated with a single effect operation in $\Sigma$. It would be natural to
replace these modalities, which are syntactic in nature, with behavioural modal- 
ities of the form we consider. Similarly, our behavioural modalities appear to 
offer a promising basis for developing a modality-based refinement-type sys- 
tem for algebraic effects. In general, an important advantage we see in the use 
of behavioural modalities is that our notion of strong decomposability appears 
related to the availability of compositional proof principles for modal properties. 
This is a promising avenue for future exploration. 
A rather different approach to logics for effects has been proposed by
Goncharov, Mossakowski and Schröder [3,16]. They assume a semantic setting in
which the programming language is rich enough to contain a pure fragment that 
itself acts as a program logic. This approach is very powerful for certain effects. 
For example, Hoare logic can be derived in the case of global store. However, it 
appears not as widely adaptable across the range of effects as our approach. 

Our logics exhibit certain similarities in form with the endogenous logic devel- 
oped in Abramsky’s domain theory in logical form [2]. Our motivation and app- 
roach are, however, quite different. Whereas Abramsky shows the usefulness of 
an axiomatic approach to a finitary logic as a way of characterising denotational 
equality, the present paper shows that there is a similar utility in considering an 
infinitary logic from a semantic perspective (based on operational semantics) as 
a method of defining behavioural equivalence. 

The work in this paper has been carried out for fine-grained call-by-value [13], 
which is equivalent to call-by-value. The definitions can, however, be adapted to 
work for call-by-name, and even call-by-push-value [11]. Adding type construc- 
tors such as sum and product is also straightforward. We have not checked the 
generalisation to arbitrary recursive types, but we do not foresee any problem. 

An omission from the present paper is that we have not said anything 
about contextual equivalence, which is often taken to be the default equiva- 
lence for applicative languages. In addition to determining the logically defined 
preorders/equivalences, the choice of the set O of modalities gives rise to a 
natural definition of contextual preorder, namely the largest compatible pre- 
order that, on computations of unit type 1, is contained in the < relation from 
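Sect. 4. The compatibility of $\sqsubseteq_{\mathcal V^+}$ established in the present paper means that
we have the expected relation inclusions $\equiv_{\mathcal V} \;\subseteq\; \sqsubseteq_{\mathcal V^+} \;\subseteq\; \sqsubseteq_{\mathrm{ctxt}}$. It is an interesting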
question whether the logic can be restricted to characterise contextual equiva- 
lence/preorder. A more comprehensive investigation of contextual equivalence is 
being undertaken, in ongoing work, by Aliame Lopez and the first author. 

The crucial notion of modality, in the present paper, was adapted from the 
notion of observation in [8]. The change from a set of trees of type N (an observa- 
tion) to a set of unit-type trees (a modality) allows value formulas to be lifted to 
computation formulas, analogously to predicate lifting in coalgebra [7], which is a 
key characteristic of our modalities. Properties of Scott-openness and
decomposability play a similar role in the present paper to the role they play in [8]. However,
the notion of decomposability for modalities (Definition 11) is more subtle than 
the corresponding notion for observations in [8]. 

There are certain limitations to the theory of modalities in the present paper. 
For example, for the combination of probability and nondeterminism, one might 
naturally consider modalities $\Diamond P_{>r}$ and $\Box P_{>r}$ asserting the possibility and
necessity of the termination probability exceeding $r$. However, the decomposability
property fails. It appears that this situation can be rescued by changing to a 
quantitative logic, with a corresponding notion of quantitative modality. This is 
a topic of ongoing research. 
Abstract. As the popularity of algebraic effects and handlers increases, so
does the demand for their efficient execution. Eff, an ML-like language
with native support for handlers, has a subtyping-based effect system 
on which an effect-aware optimizing compiler could be built. Unfortu- 
nately, in our experience, implementing optimizations for Eff is overly 
error-prone because its core language is implicitly-typed, making code 
transformations very fragile. 

To remedy this, we present an explicitly-typed polymorphic core cal- 
culus for algebraic effect handlers with a subtyping-based type-and-effect 
system. It reifies appeals to subtyping in explicit casts with coercions 
that witness the subtyping proof, quickly exposing typing bugs in pro- 
gram transformations. 

Our typing-directed elaboration comes with a constraint-based infer- 
ence algorithm that turns an implicitly-typed Eff-like language into our 
calculus. Moreover, all coercions and effect information can be erased in 
a straightforward way, demonstrating that coercions have no computa- 
tional content. 


1 Introduction 


Algebraic effect handlers [17,18] are quickly maturing from a theoretical model 
to a practical language feature for user-defined computational effects. Yet, in 
practice they still incur a significant performance overhead compared to native 
effects. 

Our earlier efforts [22] to narrow this gap with an optimising compiler from 
Eff [2] to OCaml showed promising results, in some cases reaching even the 
performance of hand-tuned code, but were very fragile and have been postponed 
until a more robust solution is found. We believe the main reason behind this 
fragility is the complexity of subtyping in combination with the implicit typing of 
Eff’s core language, further aggravated by the “garbage collection” of subtyping 
constraints (see Sect. 7).1 


1 For other issues stemming from the same combination see issues #11 and #16 at
https://github.com/matijapretnar/eff/issues/.
© The Author(s) 2018 
For efficient compilation, one must avoid the poisoning problem [26], where
unification forces a pure computation to take the less precise impure type of the 
context (e.g. a pure and an impure branch of a conditional both receive the same 
impure type). Since this rules out existing (and likely simpler) effect systems 
for handlers based on row-polymorphism [8,12,14], we propose a polymorphic
explicitly-typed calculus based on subtyping. More specifically, our contributions 
are as follows: 


— First, in Sect. 3 we present IMPEFF, a polymorphic implicitly-typed calculus 
for algebraic effects and handlers with a subtyping-based type-and-effect sys- 
tem. IMPEFF is essentially a (desugared) source language as it appears in the 
compiler frontend of a language like Eff. 

— Next, Sect. 4 presents EXEFF, the core calculus, which combines explicit Sys- 
tem F-style polymorphism with explicit coercions for subtyping in the style of 
Breazu-Tannen et al. [3]. This calculus comes with a type-and-effect system, 
a small-step operational semantics and a proof of type-safety. 

— Section 5 specifies the typing-directed elaboration of IMPEFF into EXEFF and 
presents a type inference algorithm for IMPEFF that produces the elaborated 
EXEFF term as a by-product. It also establishes that the elaboration preserves 
typing, and that the algorithm is sound with respect to the specification and 
yields principal types. 

— Finally, Sect. 6 defines SKELEFF, which is a variant of EXEFF without effect 
information or coercions. SKELEFF is also representative of Multicore OCaml’s
support for algebraic effects and handlers [6], which is a possible compilation 
target of Eff. By showing that the erasure from EXEFF to SKELEFF preserves 
semantics, we establish that EXEFF’s coercions are computationally irrelevant 
and that, despite the existence of multiple proofs for the same subtyping, 
there is no coherence problem. To enable erasure, EXEFF annotates its types 
with (type) skeletons, which capture the erased counterpart and are, to our 
knowledge, a novel contribution. 

— Our paper comes with two software artefacts: an ongoing implementation (2)
of a compiler from Eff to OCaml with EXEFF at its core, and an Abella
mechanisation (3) of Theorems 1, 2, 6, and 7. The remaining theorems all concern
the inference algorithm, and their proofs closely follow [20].


The full version of this paper includes an appendix with omitted figures and can
be found at http://www.cs.kuleuven.be/publicaties/rapporten/cw/CW711.abs.html.


2 Overview 


This section presents an informal overview of the EXEFF calculus, and the main 
issues with elaborating to and erasing from it. 


2 https://github.com/matijapretnar/eff/tree/explicit-effect-subtyping.
3 https://github.com/matijapretnar/proofs/tree/master/explicit-effect-subtyping.
2.1 Algebraic Effect Handlers


The main premise of algebraic effects is that impure behaviour arises from a set of 
operations such as Get and Set for mutable store, Read and Print for interactive 
input and output, or Raise for exceptions [17]. This allows generalizing exception 
handlers to other effects, to express backtracking, co-operative multithreading 
and other examples in a natural way [2,18]. 

Assume operations $\mathtt{Tick} : \mathtt{Unit} \to \mathtt{Unit}$ and $\mathtt{Tock} : \mathtt{Unit} \to \mathtt{Unit}$ that
take a unit value as a parameter and yield a unit value as a result. Unlike
special built-in operations, these operations have no intrinsic effectful behaviour,
though we can give them one through handlers. For example, the handler
$\{\mathtt{Tick}\;x\;k \mapsto (\mathtt{Print}\;\text{“tick”};\ k\;\mathtt{unit}),\ \mathtt{Tock}\;x\;k \mapsto \mathtt{Print}\;\text{“tock”}\}$ replaces all calls of Tick by
printing out “tick” and similarly for Tock. But there is one significant difference
between the two cases. Unlike exceptions, which always abort the evaluation,
operations have a continuation waiting for their result. It is this continuation
that the handler captures in the variable $k$ and potentially uses in the handling
clause. In the clause for Tick, the continuation is resumed by passing it the
expected unit value, whereas in the clause for Tock, the continuation is discarded.
Thus, if we handle a computation emitting the two operations, it will print out
“tick” until a first “tock” is printed, after which the evaluation stops.
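
The behaviour of this handler can be imitated in a few lines of Python. This is only an illustrative sketch, not Eff's implementation: computations are modelled as trees of `Return` and `Op` nodes (names chosen here for the example), `Print` is simulated by appending to a list, and the empty tuple stands in for the unit value.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Union

UNIT = ()  # stand-in for the unit value (an assumption of this sketch)


@dataclass
class Return:
    """A finished computation carrying its result."""
    value: Any


@dataclass
class Op:
    """An operation call together with the continuation awaiting its result."""
    name: str
    arg: Any
    cont: Callable[[Any], "Comp"]


Comp = Union[Return, Op]


def handle_tick_tock(comp: Comp, out: List[str]) -> Any:
    """Run `comp` under the Tick/Tock handler, logging the printed words to `out`."""
    while isinstance(comp, Op):
        if comp.name == "Tick":
            out.append("tick")      # Print "tick"
            comp = comp.cont(UNIT)  # k unit: resume the continuation
        elif comp.name == "Tock":
            out.append("tock")      # Print "tock"
            return UNIT             # k is never called, so evaluation stops
        else:
            raise ValueError(f"unhandled operation {comp.name}")
    return comp.value


# Tick; Tick; Tock; Tick; return 42
program = Op("Tick", UNIT, lambda _: Op("Tick", UNIT, lambda _: Op(
    "Tock", UNIT, lambda _: Op("Tick", UNIT, lambda _: Return(42)))))

log: List[str] = []
result = handle_tick_tock(program, log)
```

The final `Tick` and the `return 42` are never reached, because the clause for Tock drops the continuation.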


2.2 Elaborating Subtyping 


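Consider the computation $\mathtt{do}\ x \leftarrow \mathtt{Tick}\ \mathtt{unit};\ f\,x$ and assume that $f$ has the
function type $\mathtt{Unit} \to \mathtt{Unit} \mathrel{!} \{\mathtt{Tock}\}$, taking unit values to unit values and
perhaps calling Tock operations in the process. The whole computation then
has the type $\mathtt{Unit} \mathrel{!} \{\mathtt{Tick}, \mathtt{Tock}\}$ as it returns the unit value and may call Tick
and Tock.

The above typing implicitly appeals to subtyping in several places. For
instance, $\mathtt{Tick}\ \mathtt{unit}$ has type $\mathtt{Unit} \mathrel{!} \{\mathtt{Tick}\}$ and $f\,x$ type $\mathtt{Unit} \mathrel{!} \{\mathtt{Tock}\}$. Yet,
because they are sequenced with do, the type system expects them to have the same
set of effects. The discrepancies are implicitly reconciled by the subtyping which
admits both $\{\mathtt{Tick}\} \leq \{\mathtt{Tick}, \mathtt{Tock}\}$ and $\{\mathtt{Tock}\} \leq \{\mathtt{Tick}, \mathtt{Tock}\}$.

We elaborate the IMPEFF term into the explicitly-typed core language
EXEFF to make those appeals to subtyping explicit by means of casts with
coercions:

$$\mathtt{do}\ x \leftarrow ((\mathtt{Tick}\ \mathtt{unit}) \rhd \gamma_1);\ (f\,x) \rhd \gamma_2$$

A coercion $\gamma$ is a witness for a subtyping $A \mathrel{!} \Delta \leq A' \mathrel{!} \Delta'$ and can be used
to cast a term $c$ of type $A \mathrel{!} \Delta$ to a term $c \rhd \gamma$ of type $A' \mathrel{!} \Delta'$. In the above
term, $\gamma_1$ and $\gamma_2$ respectively witness $\mathtt{Unit} \mathrel{!} \{\mathtt{Tick}\} \leq \mathtt{Unit} \mathrel{!} \{\mathtt{Tick}, \mathtt{Tock}\}$ and
$\mathtt{Unit} \mathrel{!} \{\mathtt{Tock}\} \leq \mathtt{Unit} \mathrel{!} \{\mathtt{Tick}, \mathtt{Tock}\}$.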
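
The role of the two casts can be mimicked with a small Python sketch. It rests on simplifying assumptions that are not part of the calculus: a computation type is a pair of a value-type name and a closed set of operation names, value subtyping is collapsed to equality, and the helper names `cast` and `sequence` are invented for this illustration.

```python
from typing import FrozenSet, Tuple

Dirt = FrozenSet[str]
CompType = Tuple[str, Dirt]  # (value type, dirt), e.g. ("Unit", {"Tick"})


def cast(ty: CompType, target: CompType) -> CompType:
    """Model a cast: succeed only if a witness for ty <= target could exist."""
    (a, d), (a2, d2) = ty, target
    # Simplification: value types must be equal; dirts are compared by inclusion.
    if a != a2 or not d <= d2:
        raise TypeError(f"no coercion from {ty} to {target}")
    return target


def sequence(t1: CompType, t2: CompType) -> CompType:
    """Type `do x <- c1; c2` by up-casting both parts to their joined dirt."""
    joined = t1[1] | t2[1]
    cast(t1, (t1[0], joined))         # plays the role of the first coercion
    return cast(t2, (t2[0], joined))  # plays the role of the second coercion


tick = ("Unit", frozenset({"Tick"}))
tock = ("Unit", frozenset({"Tock"}))
whole = sequence(tick, tock)
```

Here `whole` is the computation type with both operations in its dirt; a cast in the opposite direction, from the larger dirt to `{Tick}`, is rejected.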


2.3 Polymorphic Subtyping for Types and Effects 


The above basic example only features monomorphic types and effects. Yet, 
our calculus also supports polymorphism, which makes it considerably more 
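expressive. For instance the type of $f$ in $\mathtt{let}\ f = (\mathtt{fun}\ g \mapsto g\ \mathtt{unit})\ \mathtt{in} \ldots$ is
generalised to:

$$\forall \alpha, \alpha'.\ \forall \delta, \delta'.\ \alpha \leq \alpha' \Rightarrow \delta \leq \delta' \Rightarrow (\mathtt{Unit} \to \alpha \mathrel{!} \delta) \to \alpha' \mathrel{!} \delta'$$

This polymorphic type scheme follows the qualified types convention [9] where
the type $(\mathtt{Unit} \to \alpha \mathrel{!} \delta) \to \alpha' \mathrel{!} \delta'$ is subjected to several qualifiers, in this
case $\alpha \leq \alpha'$ and $\delta \leq \delta'$. The universal quantifiers on the outside bind the type
variables $\alpha$ and $\alpha'$, and the effect set variables $\delta$ and $\delta'$.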

The elaboration of f into EXEFF introduces explicit binders for both the 
quantifiers and the qualifiers, as well as the explicit casts where subtyping is 
used. 


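$$\Lambda\alpha.\ \Lambda\alpha'.\ \Lambda\delta.\ \Lambda\delta'.\ \Lambda(\omega : \alpha \leq \alpha').\ \Lambda(\omega' : \delta \leq \delta').\ \mathtt{fun}\ (g : \mathtt{Unit} \to \alpha \mathrel{!} \delta) \mapsto (g\ \mathtt{unit}) \rhd (\omega \mathrel{!} \omega')$$

Here the binders for qualifiers introduce coercion variables $\omega$ between pure types
and $\omega'$ between operation sets, which are then combined into a computation
coercion $\omega \mathrel{!} \omega'$ and used for casting the function application $g\ \mathtt{unit}$ to the expected
type.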

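Suppose that $h$ has type $\mathtt{Unit} \to \mathtt{Unit} \mathrel{!} \{\mathtt{Tick}\}$ and $f\,h$ type
$\mathtt{Unit} \mathrel{!} \{\mathtt{Tick}, \mathtt{Tock}\}$. In the EXEFF calculus the corresponding instantiation of $f$
is made explicit through type and coercion applications

$$f\ \mathtt{Unit}\ \mathtt{Unit}\ \{\mathtt{Tick}\}\ \{\mathtt{Tick}, \mathtt{Tock}\}\ \gamma_1\ \gamma_2\ h$$

where $\gamma_1$ needs to be a witness for $\mathtt{Unit} \leq \mathtt{Unit}$ and $\gamma_2$ for
$\{\mathtt{Tick}\} \leq \{\mathtt{Tick}, \mathtt{Tock}\}$.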


2.4 Guaranteed Erasure with Skeletons 


One of our main requirements for EXEFF is that its effect information and 
subtyping can be easily erased. The reason is twofold. Firstly, we want to show 
that neither plays a role in the runtime behaviour of EXEFF programs. Secondly 
and more importantly, we want to use a conventionally typed (System F-like) 
functional language as a backend for the Eff compiler. 

At first, erasure of both effect information and subtyping seems easy: simply
drop that information from types and terms. But by dropping the effect variables
and subtyping constraints from the type of $f$, we get $\forall \alpha, \alpha'.\ (\mathtt{Unit} \to \alpha) \to \alpha'$
instead of the expected type $\forall \alpha.\ (\mathtt{Unit} \to \alpha) \to \alpha$. In our naive erasure attempt
we have carelessly discarded the connection between $\alpha$ and $\alpha'$. A more
appropriate approach to erasure would be to unify the types in dropped subtyping
constraints. However, unifying types may reduce the number of type variables
when they become instantiated, so corresponding binders need to be dropped,
greatly complicating the erasure procedure and its meta-theory.

Fortunately, there is an easier way by tagging all bound type variables with
skeletons, which are barebone types without effect information. For example, the
skeleton of a function type $A \to B \mathrel{!} \Delta$ is $\tau_1 \to \tau_2$, where $\tau_1$ is the skeleton of
$A$ and $\tau_2$ the skeleton of $B$. In EXEFF every well-formed type has an associated
skeleton, and any two types $A_1 \leq A_2$ share the same skeleton. In particular,
binders for type variables are explicitly annotated with skeleton variables $\varsigma$. For
instance, the actual type of $f$ is:

$$\forall \varsigma.\ \forall (\alpha : \varsigma), (\alpha' : \varsigma).\ \forall \delta, \delta'.\ \alpha \leq \alpha' \Rightarrow \delta \leq \delta' \Rightarrow (\mathtt{Unit} \to \alpha \mathrel{!} \delta) \to \alpha' \mathrel{!} \delta'$$

The skeleton quantifications and annotations also appear at the term-level:

$$\Lambda\varsigma.\ \Lambda(\alpha : \varsigma).\ \Lambda(\alpha' : \varsigma).\ \Lambda\delta.\ \Lambda\delta'.\ \Lambda(\omega : \alpha \leq \alpha').\ \Lambda(\omega' : \delta \leq \delta').\ \ldots$$

Now erasure is really easy: we drop not only effect and subtyping-related term
formers, but also type binders and applications. We do retain skeleton binders and
applications, which take over the role of (plain) types in the backend language.
In terms, we replace types by their skeletons. For instance, for $f$ we get:

$$\Lambda\varsigma.\ \mathtt{fun}\ (g : \mathtt{Unit} \to \varsigma) \mapsto g\ \mathtt{unit} \;:\; \forall \varsigma.\ (\mathtt{Unit} \to \varsigma) \to \varsigma$$
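
As a rough illustration of how skeletons forget effect information, the following Python sketch computes the skeleton of a type. The tuple encoding of types and the function name `skeleton` are assumptions made for this example only; dirts are carried as opaque values and simply dropped.

```python
def skeleton(ty):
    """Return the skeleton of a value type, discarding all dirt information.

    Encoding (hypothetical, for this sketch):
      ("unit",)                       the Unit type
      ("var", name, skel_var)         a type variable annotated with its skeleton variable
      ("fun", A, (B, dirt))           a function type A -> B ! dirt
      ("handler", (A, d1), (B, d2))   a handler type A ! d1 => B ! d2
    """
    tag = ty[0]
    if tag == "unit":
        return ("unit",)
    if tag == "var":
        return ("skelvar", ty[2])
    if tag == "fun":
        _, arg, (res, _dirt) = ty
        return ("fun", skeleton(arg), skeleton(res))
    if tag == "handler":
        _, (a, _d1), (b, _d2) = ty
        return ("handler", skeleton(a), skeleton(b))
    raise ValueError(f"unknown type form {tag!r}")


# (Unit -> a ! d) -> a' ! d', with both a and a' annotated by skeleton variable s
alpha = ("var", "a", "s")
alpha_prime = ("var", "a'", "s")
f_type = ("fun", ("fun", ("unit",), (alpha, "d")), (alpha_prime, "d'"))
f_skeleton = skeleton(f_type)

# Two function types that differ only in their dirt share one skeleton.
small = ("fun", ("unit",), (("unit",), frozenset({"Tick"})))
large = ("fun", ("unit",), (("unit",), frozenset({"Tick", "Tock"})))
```

Because both type variables carry the same skeleton variable, the connection between them survives erasure, which is exactly what the naive approach lost.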
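Terms

$$\begin{array}{lrcl}
\text{value} & v & ::= & x \mid \mathtt{unit} \mid \mathtt{fun}\ x \mapsto c \mid h \\
\text{handler} & h & ::= & \{\mathtt{return}\ x \mapsto c_r,\ \mathtt{Op}_1\ x\ k \mapsto c_{\mathtt{Op}_1}, \ldots, \mathtt{Op}_n\ x\ k \mapsto c_{\mathtt{Op}_n}\} \\
\text{computation} & c & ::= & \mathtt{return}\ v \mid \mathtt{Op}\ v\ (y.c) \mid \mathtt{do}\ x \leftarrow c_1; c_2 \\
 & & \mid & \mathtt{handle}\ c\ \mathtt{with}\ v \mid v_1\ v_2 \mid \mathtt{let}\ x = v\ \mathtt{in}\ c
\end{array}$$

Types & Constraints

$$\begin{array}{lrcl}
\text{skeleton} & \tau & ::= & \varsigma \mid \mathtt{Unit} \mid \tau_1 \to \tau_2 \mid \tau_1 \Rrightarrow \tau_2 \\
\text{value type} & A, B & ::= & \alpha \mid \mathtt{Unit} \mid A \to C \mid C \Rrightarrow D \\
\text{qualified type} & K & ::= & A \mid \pi \Rightarrow K \\
\text{polytype} & S & ::= & K \mid \forall\varsigma.S \mid \forall\alpha{:}\tau.S \mid \forall\delta.S \\
\text{computation type} & C, D & ::= & A \mathrel{!} \Delta \\
\text{dirt} & \Delta & ::= & \delta \mid \emptyset \mid \{\mathtt{Op}\} \cup \Delta \\
\text{simple constraint} & \pi & ::= & A_1 \leq A_2 \mid \Delta_1 \leq \Delta_2 \\
\text{constraint} & \rho & ::= & \pi \mid C \leq D
\end{array}$$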

Fig. 1. IMPEFF Syntax 


3 The ImpEff Language 


This section presents IMPEFF, a basic functional calculus with support for alge- 
braic effect handlers, which forms the core language of our optimising compiler. 
We describe the relevant concepts, but refer the reader to Pretnar’s tutorial [21], 
which explains essentially the same calculus in more detail. 
3.1 Syntax


Figure 1 presents the syntax of the source language. There are two main kinds
of terms: (pure) values $v$ and (dirty) computations $c$, which may call effectful
operations. Handlers $h$ are a subsidiary sort of values. We assume a given set of
operations Op, such as Get and Put. We abbreviate
$\mathtt{Op}_1\ x\ k \mapsto c_{\mathtt{Op}_1}, \ldots, \mathtt{Op}_n\ x\ k \mapsto c_{\mathtt{Op}_n}$ as $[\mathtt{Op}\ x\ k \mapsto c_{\mathtt{Op}}]_{\mathtt{Op} \in \mathcal{O}}$,
and write $\mathcal{O}$ to denote the set $\{\mathtt{Op}_1, \ldots, \mathtt{Op}_n\}$.

Similarly, we distinguish between two basic sorts of types: the value types
$A, B$ and the computation types $C, D$. There are four forms of value types: type
variables $\alpha$, function types $A \to C$, handler types $C \Rrightarrow D$ and the Unit type.
Skeletons $\tau$ capture the shape of types, so, by design, their forms are identical.
The computation type $A \mathrel{!} \Delta$ is assigned to a computation returning values of
type $A$ and potentially calling operations from the dirt set $\Delta$. A dirt set
contains zero or more operations Op and is terminated either by an empty set or a
dirt variable $\delta$. Though we use cons-list syntax, the intended semantics of dirt
sets $\Delta$ is that the order of operations Op is irrelevant. Similarly to all HM-based
systems, we discriminate between value types (or monotypes) $A$, qualified types
$K$ and polytypes (or type schemes) $S$. (Simple) subtyping constraints $\pi$ denote
inequalities between either value types or dirts. We also present the more
general form of constraints $\rho$ that includes inequalities between computation types
(as we illustrate in Sect. 3.2 below, this allows for a single, uniform constraint
entailment relation). Finally, polytypes consist of zero or more skeleton, type or
dirt abstractions followed by a qualified type.


3.2 Typing 


Figure 2 presents the typing rules for values and computations, along with a 
typing-directed elaboration into our target language EXEFF. In order to simplify 
the presentation, in this section we focus exclusively on typing. The parts of the 
rules that concern elaboration are highlighted in gray and are discussed in Sect. 5. 


Values. Typing for values takes the form $\Gamma \vdash_{\mathrm{v}} v : A \leadsto v'$, and, given a typing
environment $\Gamma$, checks a value $v$ against a value type $A$.
Rule TMVAR handles term variables. Given that $x$ has type
$(\forall\bar{\varsigma}.\ \forall\overline{\alpha{:}\tau}.\ \forall\bar{\delta}.\ \bar{\pi} \Rightarrow A)$, we appropriately instantiate the skeleton ($\bar{\varsigma}$), type ($\bar{\alpha}$), and dirt ($\bar{\delta}$) variables,
and ensure that the instantiated wanted constraints $\sigma(\bar{\pi})$ are satisfied, via side
condition $\Gamma \vdash_{\mathrm{co}} \bar{\gamma} : \sigma(\bar{\pi})$. Rule TMCASTV allows casting the type of a value $v$ from
$A$ to $B$, if $A$ is a subtype of $B$ (upcasting). As illustrated by Rule TMTMABS,
we omit freshness conditions by adopting the Barendregt convention [1]. Finally,
Rule TMHAND gives typing for handlers. It requires that the right-hand sides
of the return clause and all operation clauses have the same computation type
($B \mathrel{!} \Delta$), and that all operations mentioned are part of the top-level signature
$\Sigma$. The result type takes the form $A \mathrel{!} \Delta \cup \mathcal{O} \Rrightarrow B \mathrel{!} \Delta$, capturing the intended
handler semantics: given a computation of type $A \mathrel{!} \Delta \cup \mathcal{O}$, the handler (a)
produces a result of type $B$, (b) handles operations $\mathcal{O}$, and (c) propagates unhandled
operations $\Delta$ to the output.


4 We capture all defined operations along with their types in a global signature $\Sigma$.


Explicit Effect Subtyping 


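typing environment $\Gamma ::= \epsilon \mid \Gamma, \varsigma \mid \Gamma, \alpha : \tau \mid \Gamma, \delta \mid \Gamma, x : S \mid \Gamma, \omega : \pi$

$\Gamma \vdash_{\mathrm{v}} v : A \leadsto v'$ (Values)

$$\dfrac{(x : \forall\bar{\varsigma}.\ \forall\overline{\alpha{:}\tau}.\ \forall\bar{\delta}.\ \bar{\pi} \Rightarrow A) \in \Gamma \qquad \sigma = [\bar{\tau}/\bar{\varsigma},\ \bar{B}/\bar{\alpha},\ \bar{\Delta}/\bar{\delta}] \qquad \Gamma \vdash_{\mathrm{co}} \bar{\gamma} : \sigma(\bar{\pi})}{\Gamma \vdash_{\mathrm{v}} x : \sigma(A) \leadsto x\ \bar{\tau}\ \bar{B}\ \bar{\Delta}\ \bar{\gamma}}\ \text{TMVAR}$$

$$\dfrac{\Gamma \vdash_{\mathrm{v}} v : A \leadsto v' \qquad \Gamma \vdash_{\mathrm{co}} \gamma : A \leq B}{\Gamma \vdash_{\mathrm{v}} v : B \leadsto v' \rhd \gamma}\ \text{TMCASTV} \qquad \dfrac{}{\Gamma \vdash_{\mathrm{v}} \mathtt{unit} : \mathtt{Unit} \leadsto \mathtt{unit}}\ \text{TMUNIT}$$

$$\dfrac{\Gamma, x : A \vdash_{\mathrm{c}} c : C \leadsto c' \qquad \Gamma \vdash_{\mathrm{vty}} A : \tau \leadsto T}{\Gamma \vdash_{\mathrm{v}} (\mathtt{fun}\ x \mapsto c) : A \to C \leadsto \mathtt{fun}\ (x : T) \mapsto c'}\ \text{TMTMABS}$$

$$\dfrac{\begin{array}{c}\Gamma, x : A \vdash_{\mathrm{c}} c_r : B \mathrel{!} \Delta \leadsto c_r' \qquad \Gamma \vdash_{\mathrm{vty}} A : \tau \leadsto T \\ {[(\mathtt{Op} : A_{\mathtt{Op}} \to B_{\mathtt{Op}}) \in \Sigma \qquad \Gamma, x : A_{\mathtt{Op}}, k : B_{\mathtt{Op}} \to B \mathrel{!} \Delta \vdash_{\mathrm{c}} c_{\mathtt{Op}} : B \mathrel{!} \Delta \leadsto c_{\mathtt{Op}}']_{\mathtt{Op} \in \mathcal{O}}} \\ c_{res} = \{\mathtt{return}\ (x : T) \mapsto c_r',\ [\mathtt{Op}\ x\ k \mapsto c_{\mathtt{Op}}']_{\mathtt{Op} \in \mathcal{O}}\}\end{array}}{\Gamma \vdash_{\mathrm{v}} \{\mathtt{return}\ x \mapsto c_r,\ [\mathtt{Op}\ x\ k \mapsto c_{\mathtt{Op}}]_{\mathtt{Op} \in \mathcal{O}}\} : A \mathrel{!} \Delta \cup \mathcal{O} \Rrightarrow B \mathrel{!} \Delta \leadsto c_{res}}\ \text{TMHAND}$$

$\Gamma \vdash_{\mathrm{c}} c : C \leadsto c'$ (Computations)

$$\dfrac{\Gamma \vdash_{\mathrm{c}} c : C_1 \leadsto c' \qquad \Gamma \vdash_{\mathrm{co}} \gamma : C_1 \leq C_2}{\Gamma \vdash_{\mathrm{c}} c : C_2 \leadsto c' \rhd \gamma}\ \text{TMCASTC} \qquad \dfrac{\Gamma \vdash_{\mathrm{v}} v_1 : A \to C \leadsto v_1' \qquad \Gamma \vdash_{\mathrm{v}} v_2 : A \leadsto v_2'}{\Gamma \vdash_{\mathrm{c}} v_1\ v_2 : C \leadsto v_1'\ v_2'}\ \text{TMTMAPP}$$

$$\dfrac{S = \forall\bar{\varsigma}.\ \forall\overline{\alpha{:}\tau}.\ \forall\bar{\delta}.\ \bar{\pi} \Rightarrow A \qquad \Gamma, \bar{\varsigma}, \overline{\alpha : \tau}, \bar{\delta}, \overline{\omega : \pi} \vdash_{\mathrm{v}} v : A \leadsto v' \qquad \Gamma, x : S \vdash_{\mathrm{c}} c : C \leadsto c'}{\Gamma \vdash_{\mathrm{c}} \mathtt{let}\ x = v\ \mathtt{in}\ c : C \leadsto \mathtt{let}\ x = \Lambda\bar{\varsigma}.\ \Lambda\overline{\alpha{:}\tau}.\ \Lambda\bar{\delta}.\ \Lambda\overline{(\omega{:}\pi)}.\ v'\ \mathtt{in}\ c'}\ \text{TMLET}$$

$$\dfrac{\Gamma \vdash_{\mathrm{v}} v : A \leadsto v'}{\Gamma \vdash_{\mathrm{c}} \mathtt{return}\ v : A \mathrel{!} \emptyset \leadsto \mathtt{return}\ v'}\ \text{TMRETURN}$$

$$\dfrac{\begin{array}{c}(\mathtt{Op} : A_{\mathtt{Op}} \to B_{\mathtt{Op}}) \in \Sigma \qquad \Gamma \vdash_{\mathrm{v}} v : A_{\mathtt{Op}} \leadsto v' \\ \Gamma, y : B_{\mathtt{Op}} \vdash_{\mathrm{c}} c : A \mathrel{!} \Delta \leadsto c' \qquad \Gamma \vdash_{\mathrm{vty}} B_{\mathtt{Op}} : \tau \leadsto T \qquad \mathtt{Op} \in \Delta\end{array}}{\Gamma \vdash_{\mathrm{c}} \mathtt{Op}\ v\ (y.c) : A \mathrel{!} \Delta \leadsto \mathtt{Op}\ v'\ (y : T.c')}\ \text{TMOP}$$

$$\dfrac{\Gamma \vdash_{\mathrm{c}} c_1 : A \mathrel{!} \Delta \leadsto c_1' \qquad \Gamma, x : A \vdash_{\mathrm{c}} c_2 : B \mathrel{!} \Delta \leadsto c_2'}{\Gamma \vdash_{\mathrm{c}} \mathtt{do}\ x \leftarrow c_1; c_2 : B \mathrel{!} \Delta \leadsto \mathtt{do}\ x \leftarrow c_1'; c_2'}\ \text{TMDO}$$

$$\dfrac{\Gamma \vdash_{\mathrm{v}} v : C \Rrightarrow D \leadsto v' \qquad \Gamma \vdash_{\mathrm{c}} c : C \leadsto c'}{\Gamma \vdash_{\mathrm{c}} \mathtt{handle}\ c\ \mathtt{with}\ v : D \leadsto \mathtt{handle}\ c'\ \mathtt{with}\ v'}\ \text{TMHANDLE}$$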


Fig. 2. IMPEFF Typing & Elaboration 
Computations. Typing for computations takes the form $\Gamma \vdash_{\mathrm{c}} c : C \leadsto c'$, and,
given a typing environment $\Gamma$, checks a computation $c$ against a type $C$.

Rule TMCASTC behaves like Rule TMCASTV, but for computation types.
Rule TMLET handles polymorphic, non-recursive let-bindings. Rule TMRETURN
handles $\mathtt{return}\ v$ computations. Keyword return effectively lifts a value $v$ of
type $A$ into a computation of type $A \mathrel{!} \emptyset$. Rule TMOP checks operation calls.
First, we ensure that $v$ has the appropriate type, as specified by the signature of
Op. Then, the continuation $(y.c)$ is checked. The side condition $\mathtt{Op} \in \Delta$ ensures
that the called operation Op is captured in the result type. Rule TMDO handles
sequencing. Given that $c_1$ has type $A \mathrel{!} \Delta$, the pure part of the result of type $A$
is bound to term variable $x$, which is brought in scope for checking $c_2$. As we
mentioned in Sect. 2, all computations in a do-construct should have the same
effect set, $\Delta$. Rule TMHANDLE eliminates handler types, just as Rule TMTMAPP
eliminates arrow types.
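
The rules for return, operation calls and sequencing can be sketched as a tiny checker in Python. This is a deliberately reduced illustration, not the inference algorithm of Sect. 5: the tuple encoding of terms, the `SIGNATURE` table and the function names are assumptions of the sketch, values are restricted to `unit` and variables, and the dirt weakening of Rule TMCASTC is folded into each case by checking against one fixed dirt.

```python
# Hypothetical global signature: operation name -> (parameter type, result type).
SIGNATURE = {"Tick": ("Unit", "Unit"), "Tock": ("Unit", "Unit")}


def value_type(v, env):
    """Type a value: either the literal "unit" or a variable bound in env."""
    if v == "unit":
        return "Unit"
    if v not in env:
        raise TypeError(f"unbound variable {v}")
    return env[v]


def check(comp, dirt, env=None):
    """Return the value type of `comp` when checked against `dirt`, else raise TypeError.

    Term encoding: ("return", v) | ("op", name, v, y, body) | ("do", x, c1, c2).
    """
    env = env or {}
    tag = comp[0]
    if tag == "return":   # TmReturn: has empty dirt, so it fits under any dirt
        return value_type(comp[1], env)
    if tag == "op":       # TmOp
        _, name, arg, y, body = comp
        param_ty, result_ty = SIGNATURE[name]
        if value_type(arg, env) != param_ty:
            raise TypeError(f"bad argument for {name}")
        if name not in dirt:  # the side condition requiring Op to be in the dirt
            raise TypeError(f"{name} is not allowed by dirt {sorted(dirt)}")
        return check(body, dirt, {**env, y: result_ty})
    if tag == "do":       # TmDo: both parts are checked against the same dirt
        _, x, c1, c2 = comp
        bound_ty = check(c1, dirt, env)
        return check(c2, dirt, {**env, x: bound_ty})
    raise ValueError(f"unknown computation form {tag!r}")


# do x <- Tick unit (y. return y); Tock x (z. return z)
prog = ("do", "x",
        ("op", "Tick", "unit", "y", ("return", "y")),
        ("op", "Tock", "x", "z", ("return", "z")))
ok_type = check(prog, {"Tick", "Tock"})
```

Checking `prog` against the dirt `{"Tick"}` alone fails, because the call to Tock violates the side condition.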


Constraint Entailment. The specification of constraint entailment takes the
form $\Gamma \vdash_{\mathrm{co}} \gamma : \rho$ and is presented in Fig. 3. Notice that we use $\rho$ instead of $\pi$,
which allows us to capture subtyping between two value types, computation
types or dirts, within the same relation. Subtyping can be established in several
ways:

Rule COVAR handles given assumptions. Rules VCOREFL and DCOREFL
express that subtyping is reflexive, for both value types and dirts. Notice that
we do not have a rule for the reflexivity of computation types since, as we
illustrate below, it can be established using the reflexivity of their subparts.
Rules VCOTRANS, CCOTRANS and DCOTRANS express the transitivity of
subtyping for value types, computation types and dirts, respectively. Rule VCOARR
establishes inequality of arrow types. As usual, the arrow type constructor is
contravariant in the argument type. Rules VCOARRL and CCOARRR are the
inversions of Rule VCOARR, allowing us to establish the relation between the
subparts of the arrow types. Rules VCOHAND, CCOHL, and CCOHR work
similarly, for handler types. Rule CCOCOMP captures the covariance of type
constructor (!), establishing subtyping between two computation types if
subtyping is established for their respective subparts. Rules VCOPURE and
DCOIMPURE are its inversions. Finally, Rules DCONIL and DCOOP establish subtyping
between dirts. Rule DCONIL captures that the empty dirt set $\emptyset$ is a subdirt
of any dirt $\Delta$ and Rule DCOOP expresses that dirt subtyping is preserved under
extension with the same operation Op.


Well-Formedness of Types, Constraints, Dirts, and Skeletons. The relations
$\Gamma \vdash_{vty} A : \tau \leadsto T$ and $\Gamma \vdash_{cty} C : \tau \leadsto C$ check the well-formedness of value
and computation types respectively. Similarly, relations $\Gamma \vdash_{\rho} \rho \leadsto \rho$ and $\Gamma \vdash_{\Delta} \Delta$
check the well-formedness of constraints and dirts, respectively.
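$\Gamma \vdash_{co} \gamma : \rho$   Constraint Entailment

\[
\frac{(\omega : \pi) \in \Gamma}{\Gamma \vdash_{co} \omega : \pi}\ \text{COVAR}
\qquad
\frac{\Gamma \vdash_{vty} A : \tau \leadsto T}{\Gamma \vdash_{co} \langle T \rangle : A \le A}\ \text{VCOREFL}
\qquad
\frac{\Gamma \vdash_{\Delta} \Delta}{\Gamma \vdash_{co} \langle \Delta \rangle : \Delta \le \Delta}\ \text{DCOREFL}
\]
\[
\frac{\Gamma \vdash_{co} \gamma_1 : A_1 \le A_2 \quad \Gamma \vdash_{co} \gamma_2 : A_2 \le A_3}{\Gamma \vdash_{co} \gamma_1 \gg \gamma_2 : A_1 \le A_3}\ \text{VCOTRANS}
\qquad
\frac{\Gamma \vdash_{co} \gamma_1 : C_1 \le C_2 \quad \Gamma \vdash_{co} \gamma_2 : C_2 \le C_3}{\Gamma \vdash_{co} \gamma_1 \gg \gamma_2 : C_1 \le C_3}\ \text{CCOTRANS}
\]
\[
\frac{\Gamma \vdash_{co} \gamma_1 : \Delta_1 \le \Delta_2 \quad \Gamma \vdash_{co} \gamma_2 : \Delta_2 \le \Delta_3}{\Gamma \vdash_{co} \gamma_1 \gg \gamma_2 : \Delta_1 \le \Delta_3}\ \text{DCOTRANS}
\qquad
\frac{\Gamma \vdash_{co} \gamma_1 : B \le A \quad \Gamma \vdash_{co} \gamma_2 : C \le D}{\Gamma \vdash_{co} \gamma_1 \to \gamma_2 : A \to C \le B \to D}\ \text{VCOARR}
\]
\[
\frac{\Gamma \vdash_{co} \gamma : A \to C \le B \to D}{\Gamma \vdash_{co} \mathit{left}(\gamma) : B \le A}\ \text{VCOARRL}
\qquad
\frac{\Gamma \vdash_{co} \gamma : A \to C \le B \to D}{\Gamma \vdash_{co} \mathit{right}(\gamma) : C \le D}\ \text{CCOARRR}
\]
\[
\frac{\Gamma \vdash_{co} \gamma_1 : C_2 \le C_1 \quad \Gamma \vdash_{co} \gamma_2 : D_1 \le D_2}{\Gamma \vdash_{co} \gamma_1 \Rrightarrow \gamma_2 : C_1 \Rrightarrow D_1 \le C_2 \Rrightarrow D_2}\ \text{VCOHAND}
\]
\[
\frac{\Gamma \vdash_{co} \gamma : C_1 \Rrightarrow D_1 \le C_2 \Rrightarrow D_2}{\Gamma \vdash_{co} \mathit{left}(\gamma) : C_2 \le C_1}\ \text{CCOHL}
\qquad
\frac{\Gamma \vdash_{co} \gamma : C_1 \Rrightarrow D_1 \le C_2 \Rrightarrow D_2}{\Gamma \vdash_{co} \mathit{right}(\gamma) : D_1 \le D_2}\ \text{CCOHR}
\]
\[
\frac{\Gamma \vdash_{co} \gamma_1 : A_1 \le A_2 \quad \Gamma \vdash_{co} \gamma_2 : \Delta_1 \le \Delta_2}{\Gamma \vdash_{co} \gamma_1 \,!\, \gamma_2 : A_1 \,!\, \Delta_1 \le A_2 \,!\, \Delta_2}\ \text{CCOCOMP}
\]
\[
\frac{\Gamma \vdash_{co} \gamma : A_1 \,!\, \Delta_1 \le A_2 \,!\, \Delta_2}{\Gamma \vdash_{co} \mathit{pure}(\gamma) : A_1 \le A_2}\ \text{VCOPURE}
\qquad
\frac{\Gamma \vdash_{co} \gamma : A_1 \,!\, \Delta_1 \le A_2 \,!\, \Delta_2}{\Gamma \vdash_{co} \mathit{impure}(\gamma) : \Delta_1 \le \Delta_2}\ \text{DCOIMPURE}
\]
\[
\frac{\Gamma \vdash_{\Delta} \Delta}{\Gamma \vdash_{co} \emptyset_{\Delta} : \emptyset \le \Delta}\ \text{DCONIL}
\qquad
\frac{\Gamma \vdash_{co} \gamma : \Delta_1 \le \Delta_2 \quad (\mathtt{Op} : A_{\mathtt{Op}} \to B_{\mathtt{Op}}) \in \Sigma}{\Gamma \vdash_{co} \{\mathtt{Op}\} \cup \gamma : \{\mathtt{Op}\} \cup \Delta_1 \le \{\mathtt{Op}\} \cup \Delta_2}\ \text{DCOOP}
\]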


Fig. 3. IMPEFF Constraint Entailment 


4 The ExEff Language 


4.1 Syntax 


Figure 4 presents EXEFF's syntax. EXEFF is an intensional type theory akin to
System F [7], where every term encodes its own typing derivation. In essence, all
abstractions and applications that are implicit in IMPEFF are made explicit in
EXEFF via new syntactic forms. Additionally, EXEFF is impredicative, which is
reflected in the lack of discrimination between value types, qualified types and
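Terms
\[
\begin{array}{lrcl}
\text{value} & v & ::= & x \mid \mathtt{unit} \mid \mathtt{fun}\ (x : T) \mapsto c \mid h \\
 & & \mid & \Lambda\varsigma.v \mid v\ \tau \mid \Lambda\alpha{:}\tau.v \mid v\ T \mid \Lambda\delta.v \mid v\ \Delta \mid \lambda(\omega : \pi).v \mid v\ \gamma \mid v \rhd \gamma \\
\text{handler} & h & ::= & \{\mathtt{return}\ (x : T) \mapsto c_r,\ \mathtt{Op}_1\ x\ k \mapsto c_{\mathtt{Op}_1}, \ldots, \mathtt{Op}_n\ x\ k \mapsto c_{\mathtt{Op}_n}\} \\
\text{computation} & c & ::= & \mathtt{return}\ v \mid \mathtt{Op}\ v\ (y : T.c) \mid \mathtt{do}\ x \leftarrow c_1; c_2 \\
 & & \mid & \mathtt{handle}\ c\ \mathtt{with}\ v \mid v_1\ v_2 \mid \mathtt{let}\ x = v\ \mathtt{in}\ c \mid c \rhd \gamma
\end{array}
\]
Types
\[
\begin{array}{lrcl}
\text{skeleton} & \tau & ::= & \varsigma \mid \mathtt{Unit} \mid \tau_1 \to \tau_2 \mid \tau_1 \Rrightarrow \tau_2 \mid \forall\varsigma.\tau \\
\text{value type} & T & ::= & \alpha \mid \mathtt{Unit} \mid T \to C \mid C_1 \Rrightarrow C_2 \mid \forall\varsigma.T \mid \forall\alpha{:}\tau.T \mid \forall\delta.T \mid \pi \Rightarrow T \\
\text{simple coercion type} & \pi & ::= & T_1 \le T_2 \mid \Delta_1 \le \Delta_2 \\
\text{coercion type} & \rho & ::= & \pi \mid C_1 \le C_2 \\
\text{computation type} & C & ::= & T \,!\, \Delta \\
\text{dirt} & \Delta & ::= & \delta \mid \emptyset \mid \{\mathtt{Op}\} \cup \Delta
\end{array}
\]
Coercions
\[
\begin{array}{rcl}
\gamma & ::= & \omega \mid \gamma_1 \gg \gamma_2 \mid \langle T \rangle \mid \gamma_1 \to \gamma_2 \mid \gamma_1 \Rrightarrow \gamma_2 \mid \mathit{left}(\gamma) \mid \mathit{right}(\gamma) \mid \langle \Delta \rangle \mid \emptyset_{\Delta} \mid \{\mathtt{Op}\} \cup \gamma \\
 & \mid & \forall\varsigma.\gamma \mid \gamma[\tau] \mid \forall\alpha.\gamma \mid \gamma[T] \mid \forall\delta.\gamma \mid \gamma[\Delta] \mid \pi \Rightarrow \gamma \mid \gamma_1 @ \gamma_2 \mid \gamma_1 \,!\, \gamma_2 \mid \mathit{pure}(\gamma) \mid \mathit{impure}(\gamma)
\end{array}
\]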


Fig. 4. EXEFF Syntax 


type schemes; all non-computation types are denoted by T. While the impred- 
icativity is not strictly required for the purpose at hand, it makes for a cleaner 
system. 


Coercions. Of particular interest is the use of explicit subtyping coercions,
denoted by $\gamma$. EXEFF uses these to replace the implicit casts of IMPEFF
(Rules TMCASTV and TMCASTC in Fig. 2) with explicit casts ($v \rhd \gamma$) and
($c \rhd \gamma$).

Essentially, coercions $\gamma$ are explicit witnesses of subtyping derivations: each
coercion form corresponds to a subtyping rule. Subtyping forms a partial order,
which is reflected in coercion forms $\gamma_1 \gg \gamma_2$, $\langle T \rangle$, and $\langle \Delta \rangle$. Coercion form
$\gamma_1 \gg \gamma_2$ captures transitivity, while forms $\langle T \rangle$ and $\langle \Delta \rangle$ capture reflexivity for
value types and dirts (reflexivity for computation types can be derived from
these).

Subtyping for skeleton abstraction, type abstraction, dirt abstraction, and
qualification is witnessed by forms $\forall\varsigma.\gamma$, $\forall\alpha.\gamma$, $\forall\delta.\gamma$, and $\pi \Rightarrow \gamma$, respectively.
Similarly, forms $\gamma[\tau]$, $\gamma[T]$, $\gamma[\Delta]$, and $\gamma_1 @ \gamma_2$ witness subtyping of skeleton
instantiation, type instantiation, dirt instantiation, and coercion application,
respectively.

Syntactic forms $\gamma_1 \to \gamma_2$ and $\gamma_1 \Rrightarrow \gamma_2$ capture injection for the arrow
and the handler type constructor, respectively. Similarly, inversion forms $\mathit{left}(\gamma)$
and $\mathit{right}(\gamma)$ capture projection, following from the injectivity of both type
constructors.




Coercion form $\gamma_1 \,!\, \gamma_2$ witnesses subtyping for computation types, using proofs
for their components. Inversely, syntactic forms $\mathit{pure}(\gamma)$ and $\mathit{impure}(\gamma)$ witness
subtyping between the value- and dirt-components of a computation coercion.
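
The reading of coercions as proof witnesses can be made concrete with a small checker that computes which subtyping a coercion proves. The following Python sketch is only an illustration under simplifying assumptions: it covers reflexivity, the empty-dirt coercion, arrow and computation coercions, and the four projections, and it represents dirts as ground sets of operation names. The class and function names (`Refl`, `ArrCo`, `CompCo`, `Proj`, `coercion_type`) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass

# --- Types (hypothetical encoding; a dirt is a frozenset of operation names) ---
@dataclass(frozen=True)
class Unit:
    pass

@dataclass(frozen=True)
class Comp:          # computation type  T ! Δ
    ty: object
    dirt: frozenset

@dataclass(frozen=True)
class Arrow:         # value type  T -> C
    arg: object
    res: Comp

# --- Coercions: one constructor per subtyping rule covered here ---
@dataclass(frozen=True)
class Refl:          # <T> or <Δ>
    what: object

@dataclass(frozen=True)
class EmptyD:        # ∅_Δ : ∅ <= Δ
    dirt: frozenset

@dataclass(frozen=True)
class ArrCo:         # γ1 -> γ2
    arg: object
    res: object

@dataclass(frozen=True)
class CompCo:        # γ1 ! γ2
    pure: object
    impure: object

@dataclass(frozen=True)
class Proj:          # left(γ), right(γ), pure(γ), impure(γ)
    which: str
    inner: object

def coercion_type(g):
    """Return (lhs, rhs) such that g witnesses lhs <= rhs."""
    if isinstance(g, Refl):
        return g.what, g.what
    if isinstance(g, EmptyD):
        return frozenset(), g.dirt
    if isinstance(g, ArrCo):
        b, a = coercion_type(g.arg)      # contravariant: the argument coercion proves B <= A
        c, d = coercion_type(g.res)
        return Arrow(a, c), Arrow(b, d)
    if isinstance(g, CompCo):
        t1, t2 = coercion_type(g.pure)
        d1, d2 = coercion_type(g.impure)
        return Comp(t1, d1), Comp(t2, d2)
    if isinstance(g, Proj):
        lhs, rhs = coercion_type(g.inner)
        if g.which == "left":            # inversion of the arrow rule, argument side
            return rhs.arg, lhs.arg
        if g.which == "right":           # inversion of the arrow rule, result side
            return lhs.res, rhs.res
        if g.which == "pure":
            return lhs.ty, rhs.ty
        if g.which == "impure":
            return lhs.dirt, rhs.dirt
    raise TypeError(f"unsupported coercion: {g!r}")
```

For example, `ArrCo(Refl(Unit()), CompCo(Refl(Unit()), EmptyD(frozenset({"Op"}))))` is assigned the subtyping between `Unit -> Unit ! ∅` and `Unit -> Unit ! {Op}`, and applying `Proj("right", ...)` followed by `Proj("impure", ...)` recovers the dirt component of that proof.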

Finally, coercion forms $\emptyset_{\Delta}$ and $\{\mathtt{Op}\} \cup \gamma$ are concerned with dirt subtyping.
Form $\emptyset_{\Delta}$ witnesses that the empty dirt $\emptyset$ is a subdirt of any dirt $\Delta$. Lastly,
coercion form $\{\mathtt{Op}\} \cup \gamma$ witnesses that subtyping between dirts is preserved under
extension with a new operation. Note that we do not have an inversion form to
extract a witness for $\Delta_1 \le \Delta_2$ from a coercion for $\{\mathtt{Op}\} \cup \Delta_1 \le \{\mathtt{Op}\} \cup \Delta_2$. The
reason is that dirt sets are sets and not inductive structures. For instance, for
$\Delta_1 = \{\mathtt{Op}\}$ and $\Delta_2 = \emptyset$ the latter subtyping holds, but the former does not.
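
The counterexample can be checked mechanically. The snippet below is a minimal sketch that assumes ground dirts modelled as Python frozensets, with dirt subtyping read as set inclusion; the helper name `subdirt` is hypothetical.

```python
def subdirt(d1, d2):
    """Dirt subtyping on ground dirts, read as set inclusion."""
    return d1 <= d2

op = frozenset({"Op"})
delta1 = frozenset({"Op"})   # Δ1 = {Op}
delta2 = frozenset()         # Δ2 = ∅

# {Op} ∪ Δ1 <= {Op} ∪ Δ2 holds, because the union absorbs the duplicate Op ...
extended_holds = subdirt(op | delta1, op | delta2)
# ... but Δ1 <= Δ2 does not, so no inversion form can exist.
inverted_holds = subdirt(delta1, delta2)
```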


4.2 Typing 


Value and Computation Typing. Typing for EXEFF values and computations
is presented in Figs. 5 and 6 and is given by two mutually recursive relations
of the form $\Gamma \vdash_{v} v : T$ (values) and $\Gamma \vdash_{c} c : C$ (computations). EXEFF typing
environments $\Gamma$ contain bindings for variables of all sorts:


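\[ \Gamma ::= \epsilon \mid \Gamma, \varsigma \mid \Gamma, \alpha : \tau \mid \Gamma, \delta \mid \Gamma, x : T \mid \Gamma, \omega : \pi \]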


Typing is entirely syntax-directed. Apart from the typing rules for skeleton, type,
dirt, and coercion abstraction (and, subsequently, skeleton, type, dirt, and
coercion application), the main difference between typing for IMPEFF and EXEFF
lies in the explicit cast forms, ($v \rhd \gamma$) and ($c \rhd \gamma$). Given that a value v has type
$T_1$ and that $\gamma$ is a proof that $T_1$ is a subtype of $T_2$, we can upcast v with an
explicit cast operation ($v \rhd \gamma$). Upcasting for computations works analogously.


w NER Tan Ia eS G Te eae 
Ele IN I R unit : Unit TR (umr: Tee): Tod 


rku: Ta They: TS T5 IS e Ot IR PATRU E 
Ti Uy coli RAG ea Ge TE Aa: tr.v: VATE 


TOU Ds I elms Ah Dino e eee I Te 
FRAS: V6.T PRAG mju: nr T A a I 


IR aeo IR ee IEA 
[((Op: T > T2)E LX Diz: T1,k: Tz TIA eco TIA co 


I RK {return (x: Tr) > cr, [0p £ k > coloco}: Te! AUOST!A 


Tere IE Tin UNA AT I Ta Ue Mozy 
iP eae JRA rea TT PRA 
TASU TERTS] Tieu T T[T2/a] DEUAF TA] 


Fig. 5. EXEFF Value Typing 
Well-Formedness of Types, Constraints, Dirts and Skeletons. The definitions
of the judgements that check the well-formedness of EXEFF value types
($\Gamma \vdash_{T} T : \tau$), computation types ($\Gamma \vdash_{C} C : \tau$), dirts ($\Gamma \vdash_{\Delta} \Delta$), and skeletons
($\Gamma \vdash_{\tau} \tau$) are as straightforward as those for IMPEFF.


Coercion Typing. Coercion typing formalizes the intuitive interpretation of
coercions we gave in Sect. 4.1 and takes the form $\Gamma \vdash_{co} \gamma : \rho$. It is essentially an
extension of the constraint entailment relation of Fig. 3.


4.3 Operational Semantics 


Figure 7 presents selected rules of EXEFF's small-step, call-by-value operational
semantics. For lack of space, we omit $\beta$-rules and other common rules and focus
only on cases of interest.

Firstly, one of the non-conventional features of our system lies in the
stratification of results into plain results and cast results:


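\[
\frac{\Gamma \vdash_{v} v_1 : T \to C \quad \Gamma \vdash_{v} v_2 : T}{\Gamma \vdash_{c} v_1\ v_2 : C}
\qquad
\frac{\Gamma \vdash_{v} v : T \quad \Gamma, x : T \vdash_{c} c : C}{\Gamma \vdash_{c} \mathtt{let}\ x = v\ \mathtt{in}\ c : C}
\]
\[
\frac{\Gamma \vdash_{v} v : T}{\Gamma \vdash_{c} \mathtt{return}\ v : T \,!\, \emptyset}
\qquad
\frac{\Gamma \vdash_{c} c_1 : T_1 \,!\, \Delta \quad \Gamma, x : T_1 \vdash_{c} c_2 : T_2 \,!\, \Delta}{\Gamma \vdash_{c} \mathtt{do}\ x \leftarrow c_1; c_2 : T_2 \,!\, \Delta}
\]
\[
\frac{(\mathtt{Op} : T_1 \to T_2) \in \Sigma \quad \Gamma \vdash_{v} v : T_1 \quad \Gamma, y : T_2 \vdash_{c} c : T \,!\, \Delta \quad \mathtt{Op} \in \Delta}{\Gamma \vdash_{c} \mathtt{Op}\ v\ (y : T_2.c) : T \,!\, \Delta}
\]
\[
\frac{\Gamma \vdash_{v} v : C_1 \Rrightarrow C_2 \quad \Gamma \vdash_{c} c : C_1}{\Gamma \vdash_{c} \mathtt{handle}\ c\ \mathtt{with}\ v : C_2}
\qquad
\frac{\Gamma \vdash_{c} c : C_1 \quad \Gamma \vdash_{co} \gamma : C_1 \le C_2}{\Gamma \vdash_{c} c \rhd \gamma : C_2}
\]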


Fig. 6. EXEFF Computation Typing 


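\[
\begin{array}{lrcl}
\text{terminal value} & v^{T} & ::= & \mathtt{unit} \mid h \mid \mathtt{fun}\ x : T \mapsto c \mid \Lambda\alpha{:}\tau.v \mid \Lambda\delta.v \mid \lambda\omega{:}\pi.v \\
\text{value result} & v^{R} & ::= & v^{T} \mid v^{T} \rhd \gamma \\
\text{computation result} & c^{R} & ::= & \mathtt{return}\ v^{T} \mid (\mathtt{return}\ v^{T}) \rhd \gamma \mid \mathtt{Op}\ v^{R}\ (y : T.c)
\end{array}
\]

Terminal values $v^{T}$ represent conventional values, and value results $v^{R}$ can either
be plain terminal values $v^{T}$ or terminal values with a cast: $v^{T} \rhd \gamma$. The same
applies to computation results $c^{R}$.⁵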

Although unusual, this stratification can also be found in Crary's coercion
calculus for inclusive subtyping [4], and, more recently, in System $F_C$ [25].
Stratification is crucial for ensuring type preservation. Consider for example the expression


5 Observe that operation values do not feature an outermost cast operation, as the 
coercion can always be pushed into its continuation. 
($\mathtt{return}\ 5 \rhd \langle\mathtt{int}\rangle \,!\, \emptyset_{\{\mathtt{Op}\}}$), of type $\mathtt{int} \,!\, \{\mathtt{Op}\}$. We cannot reduce the expression
further without losing effect information; removing the cast would result in
computation (return 5), of type $\mathtt{int} \,!\, \emptyset$. Even if we consider type preservation only
up to subtyping, the redex may still occur as a subterm in a context that expects
solely the larger type.

Secondly, we need to make sure that casts do not stand in the way of eval- 
uation. This is captured in the so-called “push” rules, all of which appear in 
Fig. 7. 

In relation $v \leadsto_{v} v'$, the first rule groups nested casts into a single cast, by
means of transitivity. The next three rules capture the essence of push rules:
whenever a redex is “blocked” due to a cast, we take the coercion apart and
redistribute it (in a type-preserving manner) over the subterms, so that
evaluation can progress.

The situation in relation $c \leadsto_{c} c'$ is quite similar. The first rule uses
transitivity to group nested casts into a single cast. The second rule is a push rule
for $\beta$-reduction. The third rule pushes a cast out of a return-computation. The
fourth rule pushes a coercion inside an operation-computation, illustrating why
the syntax for $c^{R}$ does not require casts on operation-computations. The fifth
rule is a push rule for sequencing computations and performs two tasks at once.
Since we know that the computation bound to x calls no operations, we (a)
safely “drop” the impure part of $\gamma$, and (b) substitute x with $v^{T}$, cast with the
pure part of $\gamma$ (so that types are preserved). The sixth rule handles operation
calls in sequencing computations. If an operation is called in a sequencing
computation, evaluation is suspended and the rest of the computation is captured
in the continuation.
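
To illustrate how push rules unblock evaluation, the following Python sketch applies the first three computation rules described above at the root of a term. It is a simplified illustration rather than the semantics itself: terms and coercions are encoded as nested tuples, the tags (`"cast"`, `"app"`, `"return"`, `"trans"`, `"left"`, `"right"`, `"comp"`, `"empty_dirt"`) are hypothetical, and the side conditions requiring subterms to be results are not checked.

```python
def step(c):
    """Apply one push rule at the root of term c; return None if none applies."""
    tag = c[0]
    # (c ▷ γ1) ▷ γ2  ~>  c ▷ (γ1 >> γ2): nested casts are merged by transitivity.
    if tag == "cast" and c[1][0] == "cast":
        (_, (_, inner, g1), g2) = c
        return ("cast", inner, ("trans", g1, g2))
    # (v1 ▷ γ) v2  ~>  (v1 (v2 ▷ left(γ))) ▷ right(γ): push rule for application.
    if tag == "app" and c[1][0] == "cast":
        (_, (_, v1, g), v2) = c
        return ("cast", ("app", v1, ("cast", v2, ("left", g))), ("right", g))
    # return (v ▷ γ)  ~>  (return v) ▷ (γ ! ∅_∅): the cast moves out of return.
    if tag == "return" and c[1][0] == "cast":
        (_, (_, v, g)) = c
        return ("cast", ("return", v), ("comp", g, ("empty_dirt",)))
    return None
```

For instance, stepping `("app", ("cast", ("var", "f"), "g"), ("var", "x"))` exposes the function `f` by moving the argument side of the coercion `g` onto `x` and the result side onto the whole application.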

The last four rules are concerned with effect handling. The first of them 
pushes a coercion on the handler “outwards”, such that the handler can be 
exposed and evaluation is not stuck (similarly to the push rule for term appli- 
cation). The second rule behaves similarly to the push/beta rule for sequencing 
computations. Finally, the last two rules are concerned with handling of opera- 
tions. The first of the two captures cases where the called operation is handled 
by the handler, in which case the respective clause of the handler is called. As 
illustrated by the rule, like Pretnar [20], EXEFF features deep handlers: the 
continuation is also wrapped within a with-handle construct. The last rule cap- 
tures cases where the operation is not covered by the handler and thus remains 
unhandled. 

We have shown that EXEFF is type safe: 


Theorem 1 (Type Safety) 


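- If $\Gamma \vdash_{v} v : T$ then either v is a result value or $v \leadsto_{v} v'$ and $\Gamma \vdash_{v} v' : T$.
- If $\Gamma \vdash_{c} c : C$ then either c is a result computation or $c \leadsto_{c} c'$ and $\Gamma \vdash_{c} c' : C$.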
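$v \leadsto_{v} v'$   Values

\[
\begin{array}{rcl}
(v^{T} \rhd \gamma_1) \rhd \gamma_2 & \leadsto_{v} & v^{T} \rhd (\gamma_1 \gg \gamma_2) \\
(v^{T} \rhd \gamma)\ T & \leadsto_{v} & (v^{T}\ T) \rhd \gamma[T] \\
(v^{T} \rhd \gamma)\ \Delta & \leadsto_{v} & (v^{T}\ \Delta) \rhd \gamma[\Delta] \\
(v^{T} \rhd \gamma_1)\ \gamma_2 & \leadsto_{v} & (v^{T}\ \gamma_2) \rhd \gamma_1 @ \gamma_2
\end{array}
\]

$c \leadsto_{c} c'$   Computations

\[
\begin{array}{rcl}
(c^{R} \rhd \gamma_1) \rhd \gamma_2 & \leadsto_{c} & c^{R} \rhd (\gamma_1 \gg \gamma_2) \\
(v_1^{T} \rhd \gamma)\ v_2 & \leadsto_{c} & (v_1^{T}\ (v_2 \rhd \mathit{left}(\gamma))) \rhd \mathit{right}(\gamma) \\
\mathtt{return}\ (v^{T} \rhd \gamma) & \leadsto_{c} & (\mathtt{return}\ v^{T}) \rhd (\gamma \,!\, \emptyset_{\emptyset}) \\
(\mathtt{Op}\ v^{R}\ (y : T.c)) \rhd \gamma & \leadsto_{c} & \mathtt{Op}\ v^{R}\ (y : T.(c \rhd \gamma)) \\
\mathtt{do}\ x \leftarrow ((\mathtt{return}\ v^{T}) \rhd \gamma); c_2 & \leadsto_{c} & c_2[(v^{T} \rhd \mathit{pure}(\gamma))/x] \\
\mathtt{do}\ x \leftarrow \mathtt{Op}\ v^{R}\ (y : T.c_1); c_2 & \leadsto_{c} & \mathtt{Op}\ v^{R}\ (y : T.\mathtt{do}\ x \leftarrow c_1; c_2) \\
\mathtt{handle}\ c\ \mathtt{with}\ (v^{T} \rhd \gamma) & \leadsto_{c} & (\mathtt{handle}\ (c \rhd \mathit{left}(\gamma))\ \mathtt{with}\ v^{T}) \rhd \mathit{right}(\gamma) \\
\mathtt{handle}\ ((\mathtt{return}\ v^{T}) \rhd \gamma)\ \mathtt{with}\ h & \leadsto_{c} & c_r[(v^{T} \rhd \mathit{pure}(\gamma))/x] \\
\mathtt{handle}\ (\mathtt{Op}\ v^{R}\ (y : T.c))\ \mathtt{with}\ h & \leadsto_{c} & c_{\mathtt{Op}}[v^{R}/x,\ (\mathtt{fun}\ (y : T) \mapsto \mathtt{handle}\ c\ \mathtt{with}\ h)/k] \\
\mathtt{handle}\ (\mathtt{Op}\ v^{R}\ (y : T.c))\ \mathtt{with}\ h & \leadsto_{c} & \mathtt{Op}\ v^{R}\ (y : T.\mathtt{handle}\ c\ \mathtt{with}\ h)
\end{array}
\]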


Fig. 7. EXEFF Operational Semantics (Selected Rules) 


5 Type Inference and Elaboration 


This section presents the typing-directed elaboration of IMPEFF into EXEFF. 
This elaboration makes all the implicit type and effect information explicit, and 
introduces explicit term-level coercions to witness the use of subtyping. 

After covering the declarative specification of this elaboration, we present a 
constraint-based algorithm to infer IMPEFF types and at the same time elabo- 
rate into EXEFF. This algorithm alternates between two phases: (1) the syntax- 
directed generation of constraints from the IMPEFF term, and (2) solving these 
constraints. 


5.1 Elaboration of ImpEff into ExEff 


The grayed parts of Fig. 2 augment the typing rules for IMPEFF value and compu- 
tation terms with typing-directed elaboration to corresponding EXEFF terms. 
The elaboration is mostly straightforward, mapping every IMPEFF construct 
onto its corresponding EXEFF construct while adding explicit type annotations 
to binders in Rules TMTMABS, TMHANDLER and TMOP. Implicit appeals to 
subtyping are turned into explicit casts with coercions in Rules TMCASTV and
TMCASTC. Rule TMLET introduces explicit binders for skeleton, type, and dirt
variables, as well as for constraints. These last also introduce coercion variables
$\omega$ that can be used in casts. The binders are eliminated in Rule TMVAR by means
of explicit application with skeletons, types, dirts and coercions. The coercions
are produced by the auxiliary judgement $\Gamma \vdash_{co} \gamma : \pi$, defined in Fig. 3, which
provides a coercion witness for every subtyping proof.
As a sanity check, we have shown that elaboration preserves types. 


Theorem 2 (Type Preservation) 


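- If $\Gamma \vdash_{v} v : A \leadsto v'$ then $\mathit{elab}_{\Gamma}(\Gamma) \vdash_{v} v' : \mathit{elab}_{S}(A)$.
- If $\Gamma \vdash_{c} c : C \leadsto c'$ then $\mathit{elab}_{\Gamma}(\Gamma) \vdash_{c} c' : \mathit{elab}_{C}(C)$.


Here $\mathit{elab}_{\Gamma}(\Gamma)$, $\mathit{elab}_{S}(A)$ and $\mathit{elab}_{C}(C)$ convert IMPEFF environments and types
into EXEFF environments and types.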


5.2 Constraint Generation and Elaboration 


Constraint generation with elaboration into EXEFF is presented in Figs. 8 (val- 
ues) and 9 (computations). Before going into the details of each, we first intro- 
duce the three auxiliary constructs they use. 


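\[
\begin{array}{lrcl}
\text{constraint set} & P, Q & ::= & \bullet \mid \tau_1 = \tau_2, P \mid \alpha : \tau, P \mid \omega : \pi, P \\
\text{typing environment} & \Gamma & ::= & \epsilon \mid \Gamma, x : S \\
\text{substitution} & \sigma & ::= & \bullet \mid \sigma \cdot [\tau/\varsigma] \mid \sigma \cdot [A/\alpha] \mid \sigma \cdot [\Delta/\delta] \mid \sigma \cdot [\gamma/\omega]
\end{array}
\]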


At the heart of our algorithm are sets P, containing three different kinds of
constraints: (a) skeleton equalities of the form $\tau_1 = \tau_2$, (b) skeleton constraints of the
form $\alpha : \tau$, and (c) wanted subtyping constraints of the form $\omega : \pi$. The purpose
of the first two becomes clear when we discuss constraint solving, in Sect. 5.3.
Next, typing environments $\Gamma$ only contain term variable bindings, while other
variables represent unknowns of their sort and may end up being instantiated
after constraint solving. Finally, during type inference we compute substitutions
$\sigma$, for refining as of yet unknown skeletons, types, dirts, and coercions. The last
one is essential, since our algorithm simultaneously performs type inference and
elaboration into EXEFF.

A substitution $\sigma$ is a solution of the set P, written as $\sigma \models P$, if we get
derivable judgements after applying $\sigma$ to all constraints in P.


Values. Constraint generation for values takes the form
$Q; \Gamma \vdash_{v} v : A \mid Q'; \sigma \leadsto v'$. It takes as inputs a set of wanted constraints Q, a typing
environment $\Gamma$, and an IMPEFF value v, and produces a value type A, a new set of
wanted constraints $Q'$, a substitution $\sigma$, and an EXEFF value $v'$.

Unlike standard HM, our inference algorithm does not keep constraint gen- 
eration and solving separate. Instead, the two are interleaved, as indicated by 
$Q; \Gamma \vdash_{v} v : A \mid Q'; \sigma \leadsto v'$   Values


(x: Ysa TNT > ADET a = [c'/s, a /a, 6'/6] 
Q;T «:a(A)| wio(m),a’:a(7T),Qe~ as 


Q 
=a 
€ 


Q; I kK unit : Unit | Q; e ~> unit 


a: s,Q; T, s:ak c: C| 2; oR 


Q;IT k (fun z > c) : o(a) > C | Q';o ~ fun z : ola) = c 


Or: Gr, Q TT : ark cr: By! A, | Qo; or REE o co o 1 
Op, EO: 
(Op, $ A; => Bi) EX 
O64 3 Gi, Qi-1;0 "(0,(L)), £ : Ai, k: Bi — arlo: K cop, : Bop, ! Aop, | Qi; oi Cop, 
QO! = Oin : Sin, Cout : Sout, EBC” (Br) < Gout, BMC” (Ar) < Sout, o” (Bop,) < Gout, 
wa, 20" (Ao) < jae Wee Bi > Gout! Sout S Bi > o” (ai! di) 
WE 2 Qin <o” (or(ar)), W7 25in < Sout UO, Qn 
Cres = {return y : o” (or(ar)) > a” (cly > ws/x] > wi! we 
f [0p; aly a” (Cop, JIL > ws; /k] > ws, lwa] } D (Cain) !w7 > (Aout)! (doue)) 


Op; EO 


QI {return T> Cr, [Op x kh Coplopeo } : Qin ! in > Qout ! Oout | Q'; (o" i Or) ~> Cres 


Fig. 8. Constraint Generation with Elaboration (Values) 


the additional arguments of our relation: (a) constraints Q are passed around
in a stateful manner (i.e., they are input and output), and (b) substitutions $\sigma$
generated from constraint solving constitute part of the relation output. We
discuss the reason for this interleaved approach in Sect. 5.4; we now focus on the
algorithm.

The rules are syntax-directed on the input IMPEFF value. The first rule
handles term variables x: as usual for constraint-based type inference the rule
instantiates the polymorphic type ($\forall\bar{\varsigma}.\overline{\alpha : \tau}.\forall\bar{\delta}.\bar{\pi} \Rightarrow A$) of x with fresh variables;
these are placeholders that are determined during constraint solving.
Moreover, the rule extends the wanted constraints P with $\bar{\pi}$, appropriately
instantiated. In EXEFF, this corresponds to explicit skeleton, type, dirt, and coercion
applications.

More interesting is the third rule, for term abstractions. Like in standard
Hindley-Damas-Milner [5], it generates a fresh type variable $\alpha$ for the type of
the abstracted term variable x. In addition, it generates a fresh skeleton variable
$\varsigma$, to capture the (yet unknown) shape of $\alpha$.

As explained in detail in Sect. 5.3, the constraint solver instantiates type
variables only through their skeleton annotations. Because we want to allow local
constraint solving for the body c of the term abstraction the opportunity to
produce a substitution $\sigma$ that instantiates $\alpha$, we have to pass in the
annotation constraint $\alpha : \varsigma$.⁶ We apply the resulting substitution $\sigma$ to the result type
$\sigma(\alpha) \to C$.⁷

Finally, the fourth rule is concerned with handlers. Since it is the most com- 
plex of the rules, we discuss each of its premises separately: 

Firstly, we infer a type $B_r \,!\, \Delta_r$ for the right hand side of the return-clause.
Since $\alpha_r$ is a fresh unification variable, just like for term abstraction we require
$\alpha_r : \varsigma_r$, for a fresh skeleton variable $\varsigma_r$.

Secondly, we check every operation clause in O in order. For each clause, we
generate fresh skeleton, type, and dirt variables ($\varsigma_i$, $\alpha_i$, and $\delta_i$), to account for
the (yet unknown) result type $\alpha_i \,!\, \delta_i$ of the continuation k, while inferring type
$B_{\mathtt{Op}_i} \,!\, \Delta_{\mathtt{Op}_i}$ for the right-hand-side $c_{\mathtt{Op}_i}$.

More interesting is the (final) set of wanted constraints $Q'$. First, we assign
to the handler the overall type

\[ \alpha_{in} \,!\, \delta_{in} \Rrightarrow \alpha_{out} \,!\, \delta_{out} \]

where $\varsigma_{in}, \alpha_{in}, \delta_{in}, \varsigma_{out}, \alpha_{out}, \delta_{out}$ are fresh variables of the respective sorts. In
turn, we require that (a) the type of the return clause is a subtype of $\alpha_{out} \,!\, \delta_{out}$
(given by the combination of $\omega_1$ and $\omega_2$), (b) the right-hand-side type of each
operation clause is a subtype of the overall result type:
$\sigma^{n}(B_{\mathtt{Op}_i} \,!\, \Delta_{\mathtt{Op}_i}) \le \alpha_{out} \,!\, \delta_{out}$ (witnessed by $\omega_{3_i} \,!\, \omega_{4_i}$), (c) the actual types of the continuations
$B_i \to \alpha_{out} \,!\, \delta_{out}$ in the operation clauses should be subtypes of their assumed
types $B_i \to \sigma^{n}(\alpha_i \,!\, \delta_i)$ (witnessed by $\omega_{5_i}$), (d) the overall argument type $\alpha_{in}$ is
a subtype of the assumed type of x: $\sigma^{n}(\sigma_r(\alpha_r))$ (witnessed by $\omega_6$), and (e) the
input dirt set $\delta_{in}$ is a subtype of the resulting dirt set $\delta_{out}$, extended with the
handled operations O (witnessed by $\omega_7$).

All the aforementioned implicit subtyping relations become explicit in the
elaborated term $c_{res}$, via explicit casts.


Computations. The judgement $Q; \Gamma \vdash_{c} c : C \mid Q'; \sigma \leadsto c'$ generates constraints
for computations.

The first rule handles term applications of the form $v_1\ v_2$. After inferring
a type for each subterm ($A_1$ for $v_1$ and $A_2$ for $v_2$), we generate the wanted
constraint $\sigma_2(A_1) \le A_2 \to \alpha \,!\, \delta$, with fresh type and dirt variables $\alpha$ and $\delta$,
respectively. Associated coercion variable $\omega$ is then used in the elaborated term
to explicitly (up)cast $v_1$ to the expected type $A_2 \to \alpha \,!\, \delta$.

The third rule handles polymorphic let-bindings. First, we infer a type A
for v, as well as wanted constraints $Q_v$. Then, we simplify wanted constraints
$Q_v$ by means of function solve (which we explain in detail in Sect. 5.3 below),
obtaining a substitution $\sigma_1'$ and a set of residual constraints $Q_v'$.


6 This hints at why we need to pass constraints in a stateful manner.
7 Though $\sigma$ refers to IMPEFF types, we abuse notation to save clutter and apply it
directly to EXEFF entities too.
$Q; \Gamma \vdash_{c} c : C \mid Q'; \sigma \leadsto c'$   Computations


Q; T ku: Ar | Qi; oi ~~ vy Q1;01(L) K v2: Az | Q2; 02 ~ v2 


QO TE U ve: a!d|a:¢,Wi02(A1) < A2 > a! ô, Q2; (02 - 01) > (o2(v}) & w) vs 


O,rkv:A| Ojon 


QO; k return v: A! Ø | Q’; o ~ return v’ 


O;Pkv:AlQ:o1~ v 
solve(e; e; O,) = (o1, 2) split(oy(o1(I’)), Q,,01(A)) = (C, a 7, ô, w 27, Q1) 
Q1;0;(o1(L)), x : VEVa TVET >o, (A) kc: C| Q202% c 
Cres = let x = o2(AF.A@ : T.46.A(w : elab,(7)).v') in c 


Q; I k let z =v in c: C | Q2; (02 -01 +01) ~ Cres 


QT kv: A | Qom Qı;o1(T),y : Bo k c: Az! A2 | Q2; o2 > c 
(Op : Aop > Bop) E X Cres = Op (o2(v') > w) (y : elabs(Bop).c’) 
Q; T k Op v (y : Bop.c) : Az ! {Op} U 42 | Wi02(A1) < Aop, Q2; (02 - 01) Cres 


OE i Ci : Aı!4ı | Q1; 01 ~> ci Q1;01(I),2: Ai k C3: A2! As | Q2; 02 ~ Cy 
Cres = do x + (02(c1) & (a2(A1)) !w1); (c2 > (A2) !w2) 


Q; T kdor cica: A2!6| wI 202(A1) < 6, Wz : Az < 6, Q2; (02-01) > Cres 


QT Rv: A | Qj; Qi;oi(I) k c: Ag! A2 | Q2; o2 c 
Q’ = œ : 9,02 : S2, Wi ` 02(41) < (a1! 51 S a2! 2), w2 : A2 < a1, W3 : A2 < 61, Qe 
Cres = handle (c' > (w2 ! w3)) with (o2(v') > w1) 
Q; I k handle c with v : a2! Ao | Q’; (2-01) Cres 


Fig. 9. Constraint Generation with Elaboration (Computations) 


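Generalization of x's type is performed by auxiliary function split, given by
the following clause:
\[
\begin{array}{ll}
\bar{\varsigma} = \{ \varsigma \mid (\alpha : \varsigma) \in Q,\ \nexists \alpha'.\ \alpha' \notin \bar{\alpha} \wedge (\alpha' : \varsigma) \in Q \} & \\
\bar{\alpha} = fv_{\alpha}(Q) \cup fv_{\alpha}(A) \setminus fv_{\alpha}(\Gamma) & Q_1 = \{ (\omega : \pi) \mid (\omega : \pi) \in Q,\ fv(\pi) \not\subseteq fv(\Gamma) \} \\
\bar{\delta} = fv_{\delta}(Q) \cup fv_{\delta}(A) \setminus fv_{\delta}(\Gamma) & Q_2 = Q - Q_1
\end{array}
\]
\[ \mathit{split}(\Gamma, Q, A) = \langle \bar{\varsigma}, \overline{\alpha : \tau}, \bar{\delta}, Q_1, Q_2 \rangle \]

In essence, split generates the type (scheme) of x in parts. Additionally, it
computes the subset $Q_2$ of the input constraints Q that do not depend on
locally-bound variables. Such constraints can be floated “upwards”, and are passed as
input when inferring a type for c. The remainder of the rule is self-explanatory.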
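
The partition of the wanted constraints into a generalized part and a floated part can be sketched as follows. This is an illustrative Python fragment under simplifying assumptions: each constraint is represented only by its name and the set of variables free in its type, and the function name `partition_constraints` is hypothetical.

```python
def partition_constraints(constraints, env_vars):
    """Split wanted constraints by their free variables.

    constraints: dict mapping a coercion variable name to the set of variables
                 free in its constraint type.
    env_vars:    set of variables free in the typing environment.
    Returns (q1, q2): q1 mentions at least one variable not bound by the
    environment (to be abstracted over), q2 mentions only environment
    variables (to be floated upwards).
    """
    q1 = {w: fv for w, fv in constraints.items() if not fv <= env_vars}
    q2 = {w: fv for w, fv in constraints.items() if w not in q1}
    return q1, q2
```

With `constraints = {"w1": {"a", "b"}, "w2": {"b"}}` and `env_vars = {"b"}`, the constraint `w1` mentions the local variable `a` and is generalized, whereas `w2` is floated.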

The fourth rule handles operation calls. Observe that in the elaborated term, 
we upcast the inferred type to match the expected type in the signature. 
The fifth rule handles sequences. The requirement that all computations in
a do-construct have the same dirt set is expressed in the wanted constraints
$\sigma_2(\Delta_1) \le \delta$ and $\Delta_2 \le \delta$ (where $\delta$ is a fresh dirt variable; the resulting dirt set),
witnessed by coercion variables $\omega_1$ and $\omega_2$. Both coercion variables are used in
the elaborated term to upcast $c_1$ and $c_2$, such that both draw effects from the
same dirt set $\delta$.

Finally, the sixth rule is concerned with effect handling. After inferring type
$A_1$ for the handler v, we require that it takes the form of a handler type, witnessed
by coercion variable $\omega_1 : \sigma_2(A_1) \le (\alpha_1 \,!\, \delta_1 \Rrightarrow \alpha_2 \,!\, \delta_2)$, for fresh $\alpha_1, \alpha_2, \delta_1, \delta_2$.
To ensure that the type $A_2 \,!\, \Delta_2$ of c matches the expected type, we require that
$A_2 \,!\, \Delta_2 \le \alpha_1 \,!\, \delta_1$. Our syntax does not include coercion variables for computation
subtyping; we achieve the same effect by combining $\omega_2 : A_2 \le \alpha_1$ and $\omega_3 : \Delta_2 \le \delta_1$.


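Theorem 3 (Soundness of Inference). If $\bullet; \Gamma \vdash_{v} v : A \mid Q; \sigma \leadsto v'$ then for
any $\sigma' \models Q$, we have $(\sigma' \cdot \sigma)(\Gamma) \vdash_{v} v : \sigma'(A) \leadsto \sigma'(v')$, and analogously for
computations.


Theorem 4 (Completeness of Inference). If $\Gamma \vdash_{v} v : A \leadsto v'$ then we have
$\bullet; \Gamma \vdash_{v} v : A' \mid Q; \sigma \leadsto v''$ and there exists $\sigma' \models Q$ and $\gamma$, such that $\sigma'(v'') = v'$
and $\sigma(\Gamma) \vdash_{co} \gamma : \sigma'(A') \le A$. An analogous statement holds for computations.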


5.3 Constraint Solving 


The second phase of our inference-and-elaboration algorithm is the constraint
solver. It is defined by the solve function signature:


\[ \mathit{solve}(\sigma; P; Q) = (\sigma', P') \]


It takes three inputs: the substitution $\sigma$ accumulated so far, a list of already
processed constraints P, and a queue of still to be processed constraints Q. There
are two outputs: the substitution $\sigma'$ that solves the constraints and the residual
constraints $P'$. The substitutions $\sigma$ and $\sigma'$ contain four kinds of mappings: $\varsigma \mapsto \tau$,
$\alpha \mapsto A$, $\delta \mapsto \Delta$ and $\omega \mapsto \gamma$ which instantiate respectively skeleton variables, type
variables, dirt variables and coercion variables.


Theorem 5 (Correctness of Solving). For any set Q, the call $\mathit{solve}(\bullet; \bullet; Q)$
either results in a failure, in which case Q has no solutions, or returns $(\sigma, P)$
such that for any $\sigma' \models Q$, there exists $\sigma'' \models P$ such that $\sigma' = \sigma'' \cdot \sigma$.


The solver is invoked with $\mathit{solve}(\bullet; \bullet; Q)$, to process the constraints Q
generated in the first phase of the algorithm, i.e., with an empty substitution and
no processed constraints. The solve function is defined by case analysis on the
queue.


Empty Queue. When the queue is empty, all constraints have been processed.
What remains are the residual constraints and the solving substitution $\sigma$, which
are both returned as the result of the solver.


\[ \mathit{solve}(\sigma; P; \bullet) = (\sigma, P) \]


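Skeleton Equalities. The next set of cases we consider are those where the
queue is non-empty and its first element is an equality between skeletons $\tau_1 = \tau_2$.
We consider seven possible cases based on the structure of $\tau_1$ and $\tau_2$ that together
essentially implement conventional unification as used in Hindley-Milner type
inference [5].


\[
\begin{array}{l}
\mathit{solve}(\sigma; P; \tau_1 = \tau_2, Q) = \\
\quad \mathbf{match}\ \tau_1 = \tau_2\ \mathbf{with} \\
\quad \mid \varsigma = \varsigma \mapsto \mathit{solve}(\sigma; P; Q) \\
\quad \mid \varsigma = \tau \mapsto \mathbf{if}\ \varsigma \notin fv_{\varsigma}(\tau)\ \mathbf{then\ let}\ \sigma' = [\tau/\varsigma]\ \mathbf{in}\ \mathit{solve}(\sigma' \cdot \sigma; \bullet; \sigma'(Q, P))\ \mathbf{else\ fail} \\
\quad \mid \tau = \varsigma \mapsto \mathbf{if}\ \varsigma \notin fv_{\varsigma}(\tau)\ \mathbf{then\ let}\ \sigma' = [\tau/\varsigma]\ \mathbf{in}\ \mathit{solve}(\sigma' \cdot \sigma; \bullet; \sigma'(Q, P))\ \mathbf{else\ fail} \\
\quad \mid \mathtt{Unit} = \mathtt{Unit} \mapsto \mathit{solve}(\sigma; P; Q) \\
\quad \mid (\tau_1 \to \tau_2) = (\tau_3 \to \tau_4) \mapsto \mathit{solve}(\sigma; P; \tau_1 = \tau_3, \tau_2 = \tau_4, Q) \\
\quad \mid (\tau_1 \Rrightarrow \tau_2) = (\tau_3 \Rrightarrow \tau_4) \mapsto \mathit{solve}(\sigma; P; \tau_1 = \tau_3, \tau_2 = \tau_4, Q) \\
\quad \mid \mathbf{otherwise} \mapsto \mathbf{fail}
\end{array}
\]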


The first case applies when both skeletons are the same skeleton variable $\varsigma$.
Then the equality trivially holds. Hence we drop it and proceed with solving the
remaining constraints. The next two cases apply when either $\tau_1$ or $\tau_2$ is a skeleton
variable $\varsigma$. If the occurs check fails, there is no finite solution and the algorithm
signals failure. Otherwise, the constraint is solved by instantiating $\varsigma$. This
additional substitution is accumulated and applied to all other constraints P, Q.
Because the substitution might have modified some of the already processed
constraints P, we have to revisit them. Hence, they are all pushed back onto the
queue, which is processed recursively.

The next three cases consider three different ways in which the two skeletons 
can have the same instantiated top-level structure. In those cases the equality is 
decomposed into equalities on the subterms, which are pushed onto the queue 
and processed recursively. 

The last catch-all case deals with all ways in which the two skeletons can be 
instantiated to different structures. Then there is no solution. 
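
The skeleton-equality cases amount to first-order unification with an occurs check. The Python sketch below illustrates them in isolation, under the assumption that the queue contains only skeleton equalities (so there are no processed constraints to revisit). Skeletons are encoded as tuples `("var", name)`, `("unit",)`, `("arr", t1, t2)` and `("hand", t1, t2)`; this encoding and the function names are hypothetical.

```python
def free_vars(t):
    """Skeleton variables occurring in skeleton t."""
    if t[0] == "var":
        return {t[1]}
    return set().union(*(free_vars(s) for s in t[1:]))

def subst(s, t):
    """Apply substitution s (dict: variable name -> skeleton) to skeleton t."""
    if t[0] == "var":
        return s.get(t[1], t)
    return (t[0],) + tuple(subst(s, u) for u in t[1:])

def solve_skeletons(queue):
    """Unify a list of skeleton equalities; return a substitution or None on failure."""
    sigma = {}
    queue = list(queue)
    while queue:
        t1, t2 = queue.pop(0)
        if t1 == t2:
            continue                         # trivial equality: drop it
        if t1[0] == "var" or t2[0] == "var":
            var, other = (t1, t2) if t1[0] == "var" else (t2, t1)
            if var[1] in free_vars(other):
                return None                  # occurs check fails: no finite solution
            s = {var[1]: other}
            sigma = {k: subst(s, v) for k, v in sigma.items()}
            sigma[var[1]] = other            # accumulate the new instantiation
            queue = [(subst(s, a), subst(s, b)) for a, b in queue]
            continue
        if t1[0] == t2[0] and t1[0] in ("arr", "hand"):
            # same top-level constructor: decompose into equalities on subterms
            queue = [(t1[1], t2[1]), (t1[2], t2[2])] + queue
            continue
        return None                          # different top-level structure
    return sigma
```

Unifying `("arr", ("var", "a"), ("unit",))` with `("arr", ("unit",), ("var", "b"))` instantiates both variables to `("unit",)`, while an equality such as `a = a -> Unit` is rejected by the occurs check.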


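Skeleton Annotations. The next four cases consider a skeleton annotation
$\alpha : \tau$ at the head of the queue, and propagate the skeleton instantiation to
the type variable. The first case, where the skeleton is a variable $\varsigma$, has nothing
to do, moves the annotation to the processed constraints and proceeds with
the remainder of the queue. In the other three cases, the skeleton is instantiated
and the solver instantiates the type variable with the corresponding structure,
introducing fresh variables for any subterms. The instantiating substitution
is accumulated and applied to the remaining constraints, which are processed
recursively.

\[
\begin{array}{l}
\mathit{solve}(\sigma; P; \alpha : \tau, Q) = \\
\quad \mathbf{match}\ \tau\ \mathbf{with} \\
\quad \mid \varsigma \mapsto \mathit{solve}(\sigma; P, \alpha : \tau; Q) \\
\quad \mid \mathtt{Unit} \mapsto \mathbf{let}\ \sigma' = [\mathtt{Unit}/\alpha]\ \mathbf{in}\ \mathit{solve}(\sigma' \cdot \sigma; \bullet; \sigma'(Q, P)) \\
\quad \mid \tau_1 \to \tau_2 \mapsto \mathbf{let}\ \sigma' = [(\alpha_1^{\tau_1} \to \alpha_2^{\tau_2} \,!\, \delta)/\alpha]\ \mathbf{in}\ \mathit{solve}(\sigma' \cdot \sigma; \bullet; \alpha_1 : \tau_1, \alpha_2 : \tau_2, \sigma'(Q, P)) \\
\quad \mid \tau_1 \Rrightarrow \tau_2 \mapsto \mathbf{let}\ \sigma' = [(\alpha_1^{\tau_1} \,!\, \delta_1 \Rrightarrow \alpha_2^{\tau_2} \,!\, \delta_2)/\alpha]\ \mathbf{in}\ \mathit{solve}(\sigma' \cdot \sigma; \bullet; \alpha_1 : \tau_1, \alpha_2 : \tau_2, \sigma'(Q, P))
\end{array}
\]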
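
The way an instantiated skeleton dictates the shape of its type variable can be sketched in Python as follows. The fragment is an illustration only: the tuple encoding of skeletons and types, the fresh-name scheme, and the function names `fresh` and `instantiate` are assumptions made for this example.

```python
import itertools

_counter = itertools.count()

def fresh(prefix):
    """Return a fresh variable name such as 'a0' or 'd2' (hypothetical naming scheme)."""
    return f"{prefix}{next(_counter)}"

def instantiate(skel):
    """Build the type a variable annotated with skeleton skel is instantiated to.

    Returns (type, annotations), where annotations lists the new skeleton
    annotations (type variable name, skeleton) to be pushed onto the queue.
    For a skeleton variable there is nothing to do yet, so the type is None.
    """
    tag = skel[0]
    if tag == "var":
        return None, []
    if tag == "unit":
        return ("unit",), []
    a1, a2 = fresh("a"), fresh("a")
    annotations = [(a1, skel[1]), (a2, skel[2])]
    if tag == "arr":      # a1 -> a2 ! d
        ty = ("arr", ("tvar", a1), (("tvar", a2), ("dvar", fresh("d"))))
    elif tag == "hand":   # a1 ! d1 => a2 ! d2
        ty = ("hand", (("tvar", a1), ("dvar", fresh("d"))),
                      (("tvar", a2), ("dvar", fresh("d"))))
    else:
        raise ValueError(f"unknown skeleton: {skel!r}")
    return ty, annotations
```

Only the top-level constructor is copied from the skeleton; the components receive fresh type variables whose own skeleton annotations are queued, and the dirts are fresh dirt variables because skeletons carry no effect information.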


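Value Type Subtyping. Next are the cases where a subtyping constraint
between two value types $A_1 \le A_2$, with as evidence the coercion variable $\omega$,
is at the head of the queue. We consider six different situations.


\[
\begin{array}{l}
\mathit{solve}(\sigma; P; \omega : A_1 \le A_2, Q) = \\
\quad \mathbf{match}\ A_1 \le A_2\ \mathbf{with} \\
\quad \mid A \le A \mapsto \mathbf{let}\ T = \mathit{elab}_{S}(A)\ \mathbf{in}\ \mathit{solve}([\langle T \rangle/\omega] \cdot \sigma; P; Q) \\
\quad \mid \alpha^{\tau_1} \le A \mapsto \mathbf{let}\ \tau_2 = \mathit{skeleton}(A)\ \mathbf{in}\ \mathit{solve}(\sigma; P, \omega : \alpha^{\tau_1} \le A; \tau_1 = \tau_2, Q) \\
\quad \mid A \le \alpha^{\tau_1} \mapsto \mathbf{let}\ \tau_2 = \mathit{skeleton}(A)\ \mathbf{in}\ \mathit{solve}(\sigma; P, \omega : A \le \alpha^{\tau_1}; \tau_2 = \tau_1, Q) \\
\quad \mid (A_1 \to B_1 \,!\, \Delta_1) \le (A_2 \to B_2 \,!\, \Delta_2) \mapsto \mathbf{let}\ \sigma' = [(\omega_1 \to \omega_2 \,!\, \omega_3)/\omega]\ \mathbf{in} \\
\quad \qquad \mathit{solve}(\sigma' \cdot \sigma; P; \omega_1 : A_2 \le A_1, \omega_2 : B_1 \le B_2, \omega_3 : \Delta_1 \le \Delta_2, Q) \\
\quad \mid (A_1 \,!\, \Delta_1 \Rrightarrow A_2 \,!\, \Delta_2) \le (A_3 \,!\, \Delta_3 \Rrightarrow A_4 \,!\, \Delta_4) \mapsto \mathbf{let}\ \sigma' = [(\omega_1 \,!\, \omega_2 \Rrightarrow \omega_3 \,!\, \omega_4)/\omega]\ \mathbf{in} \\
\quad \qquad \mathit{solve}(\sigma' \cdot \sigma; P; \omega_1 : A_3 \le A_1, \omega_2 : \Delta_3 \le \Delta_1, \omega_3 : A_2 \le A_4, \omega_4 : \Delta_2 \le \Delta_4, Q) \\
\quad \mid \mathbf{otherwise} \mapsto \mathbf{fail}
\end{array}
\]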


If the two types are equal, the subtyping holds trivially through reflexivity. The
solver thus drops the constraint and instantiates $\omega$ with the reflexivity coercion
$\langle T \rangle$. Note that each coercion variable only appears in one constraint. So we only
accumulate the substitution and do not have to apply it to the other constraints.
In the next two cases, one of the two types is a type variable $\alpha$. Then we move
the constraint to the processed set. We also add an equality constraint between
the skeletons⁸ to the queue. This enforces the invariant that only types with the
same skeleton are compared. Through the skeleton equality the type structure
(if any) from the type is also transferred to the type variable. The next two
cases concern two types with the same top-level instantiation. The solver then
decomposes the constraint into constraints on the corresponding subterms and
appropriately relates the evidence of the old constraint to the new ones. The final
case catches all situations where the two types are instantiated with a different
structure and thus there is no solution.

Auxiliary function skeleton(A) computes the skeleton of A.
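
A plausible reading of this auxiliary function is that it erases all dirt information and keeps only the shape of the type. The Python sketch below follows that reading; it is an assumption-laden illustration, not the paper's definition. Types are encoded as tuples, with type variables carrying their skeleton annotation as in `("tvar", name, skeleton)`.

```python
def skeleton(ty):
    """Compute the skeleton of a value type by dropping all dirts."""
    tag = ty[0]
    if tag == "tvar":          # an annotated type variable carries its skeleton
        return ty[2]
    if tag == "unit":
        return ("unit",)
    if tag == "arr":           # A -> B ! Δ   becomes  skeleton(A) -> skeleton(B)
        _, arg, (res, _dirt) = ty
        return ("arr", skeleton(arg), skeleton(res))
    if tag == "hand":          # A1 ! Δ1 => A2 ! Δ2  becomes  skeleton(A1) => skeleton(A2)
        _, (a1, _d1), (a2, _d2) = ty
        return ("hand", skeleton(a1), skeleton(a2))
    raise ValueError(f"unknown type: {ty!r}")
```

Two types that differ only in their dirts, such as `Unit -> Unit ! ∅` and `Unit -> Unit ! {Op}`, therefore have the same skeleton, which is the invariant the solver relies on when comparing types.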


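Dirt Subtyping. The final six cases deal with subtyping constraints between
dirts.


8 We implicitly annotate every type variable with its skeleton: $\alpha^{\tau}$.

\[
\begin{array}{l}
\mathit{solve}(\sigma; P; \omega : \Delta \le \Delta', Q) = \\
\quad \mathbf{match}\ \Delta \le \Delta'\ \mathbf{with} \\
\quad \mid O \cup \delta \le O' \cup \delta' \mapsto \mathbf{if}\ O \neq \emptyset\ \mathbf{then\ let}\ \sigma' = [((O \setminus O') \cup \delta'')/\delta',\ O \cup \omega'/\omega]\ \mathbf{in} \\
\quad \qquad \mathit{solve}(\sigma' \cdot \sigma; \bullet; (\omega' : \delta \le \sigma'(\Delta')), \sigma'(Q, P)) \\
\quad \qquad \mathbf{else}\ \mathit{solve}(\sigma; P, (\omega : \Delta \le \Delta'); Q) \\
\quad \mid \emptyset \le \Delta' \mapsto \mathit{solve}([\emptyset_{\Delta'}/\omega] \cdot \sigma; P; Q) \\
\quad \mid \delta \le \emptyset \mapsto \mathbf{let}\ \sigma' = [\emptyset/\delta;\ \emptyset_{\emptyset}/\omega]\ \mathbf{in}\ \mathit{solve}(\sigma' \cdot \sigma; \bullet; \sigma'(Q, P)) \\
\quad \mid O \cup \delta \le O' \mapsto \mathbf{if}\ O \subseteq O'\ \mathbf{then\ let}\ \sigma' = [O \cup \omega'/\omega]\ \mathbf{in}\ \mathit{solve}(\sigma' \cdot \sigma; P, (\omega' : \delta \le O'); Q)\ \mathbf{else\ fail} \\
\quad \mid O \le O' \mapsto \mathbf{if}\ O \subseteq O'\ \mathbf{then\ let}\ \sigma' = [O \cup \emptyset_{O' \setminus O}/\omega]\ \mathbf{in}\ \mathit{solve}(\sigma' \cdot \sigma; P; Q)\ \mathbf{else\ fail} \\
\quad \mid O \le O' \cup \delta' \mapsto \mathbf{let}\ \sigma' = [((O \setminus O') \cup \delta'')/\delta',\ O \cup \emptyset_{(O' \setminus O) \cup \delta''}/\omega]\ \mathbf{in} \\
\quad \qquad \mathit{solve}(\sigma' \cdot \sigma; \bullet; \sigma'(Q, P))
\end{array}
\]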


If the two dirts are of the general form $\mathcal{O} \cup \delta$ and $\mathcal{O}' \cup \delta'$, we distinguish
two subcases. Firstly, if $\mathcal{O}$ is empty, there is nothing to be done and we move
the constraint to the processed set. Secondly, if $\mathcal{O}$ is non-empty, we partially
instantiate $\delta'$ with any of the operations that appear in $\mathcal{O}$ but not in $\mathcal{O}'$. We
then drop $\mathcal{O}$ from the constraint, and, after substitution, proceed with processing
all constraints. For instance, for $\{Op_1\} \cup \delta \le \{Op_2\} \cup \delta'$, we instantiate $\delta'$ to
$\{Op_1\} \cup \delta''$ (where $\delta''$ is a fresh dirt variable) and proceed with the simplified
constraint $\delta \le \{Op_1, Op_2\} \cup \delta''$. Note that due to the set semantics of dirts, it
is not valid to simplify the above constraint to $\delta \le \{Op_2\} \cup \delta''$. After all, the
substitution $[\delta \mapsto \{Op_1\}, \delta'' \mapsto \emptyset]$ solves the former and the original constraint,
but not the latter.

The second case, $\emptyset \le \Delta'$, always holds and is discharged by instantiating $\omega$
to $\emptyset_{\Delta'}$. The third case, $\delta \le \emptyset$, has only one solution: $\delta \mapsto \emptyset$ with coercion $\emptyset_{\emptyset}$.
The fourth case, $\mathcal{O} \cup \delta \le \mathcal{O}'$, has as many solutions as there are subsets of $\mathcal{O}'$,
provided that $\mathcal{O} \subseteq \mathcal{O}'$. We then simplify the constraint to $\delta \le \mathcal{O}'$, which we move
to the set of processed constraints. The fifth case, $\mathcal{O} \le \mathcal{O}'$, holds iff $\mathcal{O} \subseteq \mathcal{O}'$.
The last case, $\mathcal{O} \le \mathcal{O}' \cup \delta'$, is like the first, but without a dirt variable in the
left-hand side. We can satisfy it in a similar fashion, by partially instantiating
$\delta'$ with $(\mathcal{O} \setminus \mathcal{O}') \cup \delta''$, where $\delta''$ is a fresh dirt variable. Now the constraint is
satisfied and can be discarded.
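The set-semantics argument above can be checked mechanically. The following is a minimal sketch, assuming ground dirts are modelled as Python sets of operation names and dirt subtyping on ground dirts is plain set inclusion; the names `holds` and `check` are illustrative and do not come from the paper.

```python
# Ground dirts are modelled as sets of operation names (an assumption
# made for illustration only).
OP1, OP2 = "Op1", "Op2"

def holds(lhs, rhs):
    """Ground dirt subtyping, read here as set inclusion."""
    return set(lhs) <= set(rhs)

def check(delta, delta2):
    """Evaluate three constraints under [delta, delta'' := delta2],
    with delta' already instantiated to {Op1} U delta''."""
    delta_prime = {OP1} | set(delta2)
    original = holds({OP1} | set(delta), {OP2} | delta_prime)
    simplified = holds(delta, {OP1, OP2} | set(delta2))
    too_strong = holds(delta, {OP2} | set(delta2))
    return original, simplified, too_strong

# The substitution from the text: delta := {Op1}, delta'' := {}.
result = check({OP1}, set())
print(result)  # (True, True, False)
```

The substitution satisfies the original and the correctly simplified constraint, but not the over-simplified one, so the over-simplification would lose solutions.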


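Terms
\[
\begin{array}{lrcl}
\text{value} & v & ::= & x \mid \mathsf{unit} \mid h \mid \mathsf{fun}\ (x : \tau) \mapsto c \mid \Lambda\varsigma.v \mid v\ \tau \\
\text{handler} & h & ::= & \{\mathsf{return}\ (x : \tau) \mapsto c_r,\ Op_1\ x\ k \mapsto c_{Op_1}, \ldots, Op_n\ x\ k \mapsto c_{Op_n}\} \\
\text{computation} & c & ::= & v_1\ v_2 \mid \mathsf{let}\ x = v\ \mathsf{in}\ c \mid \mathsf{return}\ v \mid Op\ v\ (y : \tau.c) \\
 & & \mid & \mathsf{do}\ x \leftarrow c_1; c_2 \mid \mathsf{handle}\ c\ \mathsf{with}\ v
\end{array}
\]
Types
\[
\begin{array}{lrcl}
\text{type} & \tau & ::= & \varsigma \mid \tau_1 \to \tau_2 \mid \tau_1 \Rrightarrow \tau_2 \mid \mathsf{Unit} \mid \forall\varsigma.\tau
\end{array}
\]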


Fig. 10. SKELEFF Syntax 
5.4 Discussion


At first glance, the constraint generation algorithm of Sect. 5.2 might seem need- 
lessly complex, due to eager constraint solving for let-generalization. Yet, we 
want to generalize at local let-bound values over both type and skeleton variables,⁹
which means that we must solve all equations between skeletons before
generalizing. In turn, since skeleton constraints are generated when solving subtyping
constraints (Sect. 5.3), all skeleton annotations should be available during
constraint solving. This cannot be achieved unless the generated constraints are
propagated statefully.


6 Erasure of Effect Information from ExEff


6.1 The SkelEff Language 


The target of the erasure is SKELEFF, which is essentially a copy of EXEFF
from which all effect information $\Delta$, type information $T$ and coercions $\gamma$ have
been removed. Instead, skeletons $\tau$ play the role of plain types. Thus, SKELEFF
is essentially System F extended with term-level (but not type-level) support for
algebraic effects. Figure 10 defines the syntax of SKELEFF. The type system and
operational semantics of SKELEFF follow from those of EXEFF.


Discussion. The main point of SKELEFF is to show that we can erase the effects 
and subtyping from EXEFF to obtain types that are compatible with a System 
F-like language. At the term-level SKELEFF also resembles a subset of Multicore 
OCaml [6], which provides native support for algebraic effects and handlers but 
features no explicit polymorphism. Moreover, SKELEFF can also serve as a stag- 
ing area for further elaboration into System F-like languages without support for 
algebraic effects and handlers (e.g., Haskell or regular OCaml). In those cases, 
computation terms can be compiled to one of the known encodings in the litera- 
ture, such as a free monad representation [10,22], with delimited control [11], or 
using continuation-passing style [13], while values can typically be carried over 
as they are. 


6.2 Erasure 


Figure 11 defines erasure functions $\epsilon_v^{\sigma}(v)$, $\epsilon_c^{\sigma}(c)$, $\epsilon_V^{\sigma}(T)$, $\epsilon_C^{\sigma}(\underline{C})$ and $\epsilon_E^{\sigma}(\Gamma)$ for
values, computations, value types, computation types, and type environments
respectively. All five functions take a substitution $\sigma$ from the free type variables
$\alpha$ to their skeleton $\tau$ as an additional parameter.

Thanks to the skeleton-based design of EXEFF, erasure is straightforward. 
All types are erased to their skeletons, dropping quantifiers for type variables 
and all occurrences of dirt sets. Moreover, coercions are dropped from values 


9 As will become apparent in Sect. 6, if we only generalize at the top over skeleton
variables, the erasure does not yield local polymorphism. 
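\[
\begin{array}{lcl}
\epsilon_v^{\sigma}(x) & = & x \\
\epsilon_v^{\sigma}(\mathsf{unit}) & = & \mathsf{unit} \\
\epsilon_v^{\sigma}(v \rhd \gamma) & = & \epsilon_v^{\sigma}(v) \\
\epsilon_v^{\sigma}(\mathsf{fun}\ (x : T) \mapsto c) & = & \mathsf{fun}\ (x : \epsilon_V^{\sigma}(T)) \mapsto \epsilon_c^{\sigma}(c) \\
\epsilon_v^{\sigma}(\Lambda\varsigma.v) & = & \Lambda\varsigma.\epsilon_v^{\sigma}(v) \\
\epsilon_v^{\sigma}(\Lambda(\alpha : \tau).v) & = & \epsilon_v^{\sigma \cdot \{\alpha \mapsto \tau\}}(v) \\
\epsilon_v^{\sigma}(\Lambda\delta.v) & = & \epsilon_v^{\sigma}(v) \\
\epsilon_v^{\sigma}(\Lambda(\omega : \pi).v) & = & \epsilon_v^{\sigma}(v) \\
\epsilon_v^{\sigma}(v\ \tau) & = & \epsilon_v^{\sigma}(v)\ \tau \\
\epsilon_v^{\sigma}(v\ T) & = & \epsilon_v^{\sigma}(v) \\
\epsilon_v^{\sigma}(v\ \Delta) & = & \epsilon_v^{\sigma}(v) \\
\epsilon_v^{\sigma}(v\ \gamma) & = & \epsilon_v^{\sigma}(v) \\
\epsilon_v^{\sigma}(\{\mathsf{return}\ (x : T) \mapsto c_r, [Op\ x\ k \mapsto c_{Op}]_{Op \in \mathcal{O}}\}) & = & \{\mathsf{return}\ (x : \epsilon_V^{\sigma}(T)) \mapsto \epsilon_c^{\sigma}(c_r), [Op\ x\ k \mapsto \epsilon_c^{\sigma}(c_{Op})]_{Op \in \mathcal{O}}\} \\[1ex]
\epsilon_c^{\sigma}(v_1\ v_2) & = & \epsilon_v^{\sigma}(v_1)\ \epsilon_v^{\sigma}(v_2) \\
\epsilon_c^{\sigma}(\mathsf{let}\ x = v\ \mathsf{in}\ c) & = & \mathsf{let}\ x = \epsilon_v^{\sigma}(v)\ \mathsf{in}\ \epsilon_c^{\sigma}(c) \\
\epsilon_c^{\sigma}(\mathsf{return}\ v) & = & \mathsf{return}\ (\epsilon_v^{\sigma}(v)) \\
\epsilon_c^{\sigma}(Op\ v\ (y : T.c)) & = & Op\ (\epsilon_v^{\sigma}(v))\ (y : \epsilon_V^{\sigma}(T).\epsilon_c^{\sigma}(c)) \\
\epsilon_c^{\sigma}(\mathsf{do}\ x \leftarrow c_1; c_2) & = & \mathsf{do}\ x \leftarrow \epsilon_c^{\sigma}(c_1); \epsilon_c^{\sigma}(c_2) \\
\epsilon_c^{\sigma}(\mathsf{handle}\ c\ \mathsf{with}\ v) & = & \mathsf{handle}\ \epsilon_c^{\sigma}(c)\ \mathsf{with}\ \epsilon_v^{\sigma}(v) \\
\epsilon_c^{\sigma}(c \rhd \gamma) & = & \epsilon_c^{\sigma}(c) \\[1ex]
\epsilon_V^{\sigma}(\alpha) & = & \sigma(\alpha) \\
\epsilon_V^{\sigma}(T \to \underline{C}) & = & \epsilon_V^{\sigma}(T) \to \epsilon_C^{\sigma}(\underline{C}) \\
\epsilon_V^{\sigma}(\underline{C}_1 \Rrightarrow \underline{C}_2) & = & \epsilon_C^{\sigma}(\underline{C}_1) \Rrightarrow \epsilon_C^{\sigma}(\underline{C}_2) \\
\epsilon_V^{\sigma}(\mathsf{Unit}) & = & \mathsf{Unit} \\
\epsilon_V^{\sigma}(\pi \Rightarrow T) & = & \epsilon_V^{\sigma}(T) \\
\epsilon_V^{\sigma}(\forall\varsigma.T) & = & \forall\varsigma.\epsilon_V^{\sigma}(T) \\
\epsilon_V^{\sigma}(\forall(\alpha : \tau).T) & = & \epsilon_V^{\sigma \cdot \{\alpha \mapsto \tau\}}(T) \\
\epsilon_V^{\sigma}(\forall\delta.T) & = & \epsilon_V^{\sigma}(T) \\
\epsilon_C^{\sigma}(T\ !\ \Delta) & = & \epsilon_V^{\sigma}(T) \\[1ex]
\epsilon_E^{\sigma}(\epsilon) & = & \epsilon \\
\epsilon_E^{\sigma}(\Gamma, \varsigma) & = & \epsilon_E^{\sigma}(\Gamma), \varsigma \\
\epsilon_E^{\sigma}(\Gamma, \alpha : \tau) & = & \epsilon_E^{\sigma}(\Gamma) \\
\epsilon_E^{\sigma}(\Gamma, \delta) & = & \epsilon_E^{\sigma}(\Gamma) \\
\epsilon_E^{\sigma}(\Gamma, x : T) & = & \epsilon_E^{\sigma}(\Gamma), x : \epsilon_V^{\sigma}(T) \\
\epsilon_E^{\sigma}(\Gamma, \omega : \pi) & = & \epsilon_E^{\sigma}(\Gamma)
\end{array}
\]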


Fig. 11. Definition of type erasure. 


and computations. Finally, all binders and elimination forms for type variables, 
dirt set variables and coercions are dropped from values and type environments. 
The expected theorems hold. Firstly, types are preserved by erasure.¹⁰


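Theorem 6 (Type Preservation). If $\Gamma \vdash_v v : T$ then $\epsilon_E^{\Gamma}(\Gamma) \vdash_{ev} \epsilon_v^{\Gamma}(v) : \epsilon_V^{\Gamma}(T)$.
If $\Gamma \vdash_c c : \underline{C}$ then $\epsilon_E^{\Gamma}(\Gamma) \vdash_{ec} \epsilon_c^{\Gamma}(c) : \epsilon_C^{\Gamma}(\underline{C})$.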


Here we abuse notation and use $\Gamma$ as a substitution from type variables to
skeletons used by the erasure functions.
Finally, we have that erasure preserves the operational semantics. 


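Theorem 7 (Semantic Preservation). If $v \leadsto_v v'$ then $\epsilon_v^{\sigma}(v) \equiv^{\leadsto}_v \epsilon_v^{\sigma}(v')$. If
$c \leadsto_c c'$ then $\epsilon_c^{\sigma}(c) \equiv^{\leadsto}_c \epsilon_c^{\sigma}(c')$.

In both cases, $\equiv^{\leadsto}$ denotes the congruence closure of the step relation in SKELEFF.
The choice of substitution $\sigma$ does not matter as types do not affect the
behaviour.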
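To make the type-level part of the erasure concrete, here is a minimal sketch covering a fragment of the value and computation types. The class names (`TyVar`, `Fun`, `ForallTy`, and so on) and the function names `erase_vty` and `erase_cty` are assumptions made for illustration; handler types and qualified types are omitted for brevity.

```python
from dataclasses import dataclass

# ExEff-style types (illustrative class names, not the paper's).
@dataclass(frozen=True)
class TyVar:          # type variable alpha
    name: str

@dataclass(frozen=True)
class UnitTy:         # Unit
    pass

@dataclass(frozen=True)
class Comp:           # computation type T ! Delta
    ty: object
    dirt: object

@dataclass(frozen=True)
class Fun:            # T -> C
    arg: object
    res: Comp

@dataclass(frozen=True)
class ForallSkel:     # forall skeleton variable
    var: str
    body: object

@dataclass(frozen=True)
class ForallTy:       # forall (alpha : tau)
    var: str
    skel: object
    body: object

@dataclass(frozen=True)
class ForallDirt:     # forall dirt variable
    var: str
    body: object

# Skeletons, which play the role of plain types after erasure.
@dataclass(frozen=True)
class SkVar:
    name: str

@dataclass(frozen=True)
class SkUnit:
    pass

@dataclass(frozen=True)
class SkFun:
    arg: object
    res: object

@dataclass(frozen=True)
class SkForall:
    var: str
    body: object

def erase_vty(sigma, ty):
    """Erase a value type to its skeleton; sigma maps type variables to skeletons."""
    if isinstance(ty, TyVar):
        return sigma[ty.name]
    if isinstance(ty, UnitTy):
        return SkUnit()
    if isinstance(ty, Fun):
        return SkFun(erase_vty(sigma, ty.arg), erase_cty(sigma, ty.res))
    if isinstance(ty, ForallSkel):
        return SkForall(ty.var, erase_vty(sigma, ty.body))
    if isinstance(ty, ForallTy):
        # The type binder is dropped; its skeleton annotation extends sigma.
        return erase_vty({**sigma, ty.var: ty.skel}, ty.body)
    if isinstance(ty, ForallDirt):
        return erase_vty(sigma, ty.body)  # dirt binders are dropped
    raise TypeError(f"unsupported type: {ty!r}")

def erase_cty(sigma, cty):
    """Erase a computation type by forgetting its dirt."""
    return erase_vty(sigma, cty.ty)

# forall s. forall (a : s). forall d. a -> a ! d
poly_id = ForallSkel("s", ForallTy("a", SkVar("s"),
          ForallDirt("d", Fun(TyVar("a"), Comp(TyVar("a"), "d")))))
erased = erase_vty({}, poly_id)
```

In this sketch the polymorphic identity type erases to a skeleton-polymorphic function type, with the type and dirt binders gone and only the skeleton quantifier remaining.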


10 Typing judgements for SKELEFF values and computations take the forms $\Gamma \vdash_{ev} v : \tau$ and $\Gamma \vdash_{ec} c : \tau$.
Discussion. Typically, when type information is erased from call-by-value languages,
type binders are erased by replacing them with other (dummy) binders.
For instance, the expected definition of erasure would be: 

\[ \epsilon_v^{\sigma}(\Lambda(\alpha : \tau).v) = \lambda(x : \mathsf{Unit}).\,\epsilon_v^{\sigma}(v) \]
This replacement is motivated by a desire to preserve the behaviour of the 
typed terms. By dropping binders, values might be turned into computations 
that trigger their side-effects immediately, rather than at the later point where 
the original binder was eliminated. However, there is no call for this circumspect
approach in our setting, as our grammatical partition of terms into values
(without side-effects) and computations (with side-effects) guarantees that this
problem cannot happen when we erase values to values and computations to
computations.


7 Related Work and Conclusion 


Eff’s Implicit Type System. The most closely related work is that of Pretnar 
[20] on inferring algebraic effects for Eff, which is the basis for our implicitly- 
typed IMPEFF calculus, its type system and the type inference algorithm. There 
are three major differences with Pretnar’s inference algorithm. 

Firstly, our work introduces an explicitly-typed calculus. For this reason we 
have extended the constraint generation phase with the elaboration into EXEFF 
and the constraint solving phase with the construction of coercions. 

Secondly, we add skeletons to guarantee erasure. Skeletons also allow us to 
use standard occurs-check during unification. In contrast, unification in Pretnar’s 
algorithm is inspired by Simonet [24] and performs the occurs-check up to the 
equivalence closure of the subtyping relation. In order to maintain invariants, 
all variables in an equivalence class (also called a skeleton) must be instantiated 
simultaneously, whereas we can process one constraint at a time. As these classes 
turn out to be surrogates for the underlying skeleton types, we have decided to 
keep the name. 

Finally, Pretnar incorporates garbage collection of constraints [19]. The aim 
of this approach is to obtain unique and simple type schemes by eliminating 
redundant constraints. Garbage collection is not suitable for our use as type vari- 
ables and coercions witnessing subtyping constraints cannot simply be dropped, 
but must be instantiated in a suitable manner, which cannot be done in general. 

Consider for instance a situation with type variables $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_4$, and
$\alpha_5$ where $\alpha_1 \le \alpha_3$, $\alpha_2 \le \alpha_3$, $\alpha_3 \le \alpha_4$, and $\alpha_3 \le \alpha_5$. Suppose that $\alpha_3$ does
not appear in the type. Then garbage collection would eliminate it and replace
the constraints by $\alpha_1 \le \alpha_4$, $\alpha_2 \le \alpha_4$, $\alpha_1 \le \alpha_5$, and $\alpha_2 \le \alpha_5$. While garbage
collection guarantees that for any ground instantiation of the remaining type
variables, there exists a valid ground instantiation for $\alpha_3$, EXEFF would need
to be extended with joins (or meets) to express a generically valid instantiation
like $\alpha_1 \sqcup \alpha_2$. Moreover, we would need additional coercion formers to establish
$\alpha_1 \le (\alpha_1 \sqcup \alpha_2)$ or $(\alpha_1 \sqcup \alpha_2) \le \alpha_4$.
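The elimination step in this example can be sketched as follows. This is a deliberately simplified illustration, assuming constraints are pairs `(lower, upper)` of variable names; the function name `eliminate` is hypothetical, and the actual garbage-collection procedure of [19] is more involved than this single transitivity step.

```python
def eliminate(constraints, var):
    """Remove `var` from a list of (lower, upper) subtyping constraints,
    connecting each of its lower bounds to each of its upper bounds."""
    lower = [a for (a, b) in constraints if b == var]
    upper = [b for (a, b) in constraints if a == var]
    kept = [(a, b) for (a, b) in constraints if var not in (a, b)]
    return kept + [(lo, hi) for lo in lower for hi in upper]

constraints = [("a1", "a3"), ("a2", "a3"), ("a3", "a4"), ("a3", "a5")]
collected = eliminate(constraints, "a3")
print(collected)
# [('a1', 'a4'), ('a1', 'a5'), ('a2', 'a4'), ('a2', 'a5')]
```

The output contains no trace of the eliminated variable, which is exactly why no coercion term mentioning it can be reconstructed without a join such as the one discussed above.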
As these additional constructs considerably complicate the calculus, we propose
a simpler solution. We use EXEFF as it is for internal purposes, but display
types to programmers in their garbage-collected form. 


Calculi with Explicit Coercions. The notion of explicit coercions is not new; 
Mitchell [15] introduced the idea of inserting coercions during type inference for 
ML-based languages, as a means for explicit casting between different numeric 
types. 

Breazu-Tannen et al. [3] also present a translation of languages with inher- 
itance polymorphism into System F, extended with coercions. Although their 
coercion combinators are very similar to our coercion forms, they do not include 
inversion forms, which are crucial for the proof of type safety for our system. 
Moreover, Breazu-Tannen et al.’s coercions are terms, and thus cannot be erased.

Much closer to EXEFF is Crary’s coercion calculus for inclusive subtyping [4], 
from which we borrowed the stratification of value results. Crary’s system sup- 
ports neither coercion abstraction nor coercion inversion forms. 

System Fc [25] uses explicit type-equality coercions to encode complex lan- 
guage features (e.g. GADTs [16] or type families [23]). Though EXEFF’s coer- 
cions are proofs of subtyping rather than type equality, our system has a lot in 
common with it, including the inversion coercion forms and the “push” rules. 


Future Work. Our plans focus on resuming the postponed work on efficient 
compilation of handlers. First, we intend to adjust program transformations to 
the explicit type information. We hope that this will not only make the optimizer 
more robust, but also expose new optimization opportunities. Next, we plan to 
write compilers to both Multicore OCaml and standard OCaml, though for the 
latter, we must first adapt the notion of erasure to a target calculus without 
algebraic effect handlers. Finally, once the compiler shows promising preliminary 
results, we plan to extend it to other Eff features such as user-defined types or 
recursion, allowing us to benchmark it on more realistic programs. 
Abstract. We present SLR, the first expressive program logic for reasoning
about concurrent programs under a weak memory model addressing
the out-of-thin-air problem. Our logic includes the standard features
from existing logics, such as RSL and GPS, that were previously known 
to be sound only under stronger memory models: (1) separation, (2) 
per-location invariants, and (3) ownership transfer via release-acquire 
synchronisation—as well as novel features for reasoning about (4) the 
absence of out-of-thin-air behaviours and (5) coherence. The logic is 
proved sound over the recent “promising” memory model of Kang et al., 
using a substantially different argument to soundness proofs of logics for 
simpler memory models. 


1 Introduction 


Recent years have seen the emergence of several program logics [2,6,8, 16,23, 24, 
26-28] for reasoning about programs under weak memory models. These pro- 
gram logics are valuable tools for structuring program correctness proofs, and 
enabling programmers to reason about the correctness of their programs with- 
out necessarily knowing the formal semantics of the programming language. So 
far, however, they have only been applied to relatively strong memory models 
(such as TSO [19] or release/acquire consistency [15] that can be expressed as a 
constraint on individual candidate program executions) and provide little to no 
reasoning principles to deal with C/C++ “relaxed” accesses. 

The main reason for this gap is that the behaviour of relaxed accesses is noto- 
riously hard to specify [3,5]. Up until recently, memory models have either been 
too strong (e.g., [5,14,17]), forbidding some behaviours observed with modern 
hardware and compilers, or they have been too weak (e.g., [4]), allowing so-called 
out-of-thin-air (OOTA) behaviour even though it does not occur in practice and 
is highly problematic. 

One observable behaviour forbidden by the strong models is the load buffer- 
ing behaviour illustrated by the example below, which, when started with both 
locations x and y containing 0, can end with both $r_1$ and $r_2$ containing 1.
© The Author(s) 2018 
This behaviour is observable on certain ARMv7 processors after the compiler
optimises $r_2 + 1 - r_2$ to 1.

\[
\begin{array}{lcl}
r_1 := [x]_{rlx};\ \text{// reads 1} & \parallel & r_2 := [y]_{rlx};\ \text{// reads 1} \\
{[y]}_{rlx} := r_1 & \parallel & {[x]}_{rlx} := r_2 + 1 - r_2
\end{array}
\qquad \text{(LB+data+fakedep)}
\]
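To see why strong models forbid this outcome, one can enumerate all interleavings of the two threads under sequential consistency. The sketch below is an illustration only: it assumes the litmus test consists of a read followed by a write in each thread, as read from the example, and the helper names (`interleavings`, `sc_outcomes`) are hypothetical.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that preserves each list's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

# Each instruction mutates shared memory and the register file.
def t1_read(mem, reg):
    reg["r1"] = mem["x"]

def t1_write(mem, reg):
    mem["y"] = reg["r1"]

def t2_read(mem, reg):
    reg["r2"] = mem["y"]

def t2_write(mem, reg):
    mem["x"] = reg["r2"] + 1 - reg["r2"]  # the "fake" dependency on r2

def sc_outcomes():
    """Collect the final (r1, r2) pairs over all sequentially consistent runs."""
    outcomes = set()
    for schedule in interleavings([t1_read, t1_write], [t2_read, t2_write]):
        mem, reg = {"x": 0, "y": 0}, {}
        for step in schedule:
            step(mem, reg)
        outcomes.add((reg["r1"], reg["r2"]))
    return outcomes

outcomes = sc_outcomes()
print(sorted(outcomes))  # [(0, 0), (1, 0)]
```

No interleaving yields `(1, 1)`: that outcome requires the write to x to take effect before the read of y in the second thread, which is exactly the load-store reordering that weaker models must permit.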


However, one OOTA behaviour they should not allow is the following example 
by Boehm and Demsky [5]. When started with two completely disjoint lists a and 
b, by updating them separately in parallel, it should not be allowed to end with 
a and b pointing to each other, as that would violate physical separation (for 
simplicity, in these lists, a location just holds the address of the next element): 


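\[
\begin{array}{lcl}
r_1 := [a]_{rlx};\ \text{// reads } b & \parallel & r_2 := [b]_{rlx};\ \text{// reads } a \\
{[r_1]}_{rlx} := a & \parallel & {[r_2]}_{rlx} := b
\end{array}
\qquad \text{(Disjoint-Lists)}
\]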


Because of this specification gap, program logics either do not reason about 
relaxed accesses, or they assume overly strengthened models that disallow some 
behaviours that occur in practice (as discussed in Sect. 5). 

Recently, there have been several proposals of programming language mem- 
ory models that allow load buffering behaviour, but forbid obvious out-of-thin-air 
behaviours [10,13,20]. This development has enabled us to develop a program 
logic that provides expressive reasoning principles for relaxed accesses, without 
relying on overly strong models. 

In this paper, we present SLR, a separation logic based on RSL [27], extended 
with strong reasoning principles for relaxed accesses, which we prove sound over 
the recent “promising” semantics of Kang et al. [13]. SLR features per-location 
invariants [27] and physical separation [22], as well as novel assertions that we use 
to show the absence of OOTA behaviours and to reason about various coherence 
examples. (Coherence is a property of memory models that requires the existence 
of a per-location total order on writes that reads respect.) 

There are two main contributions of this work. 

First, SLR is the first logic which can prove absence of OOTA in all the 
standard litmus tests. As such, it provides more evidence to the claim that the 
promising semantics solves the out-of-thin-air problem in a satisfactory way. 
The paper that introduced the promising semantics [13] comes with three DRF 
theorems and a simplistic value logic. These reasoning principles are enough to 
show absence of some simple out-of-thin-air behaviours, but it is still very easy 
to end up beyond the reasoning power of these two techniques. For instance, 
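they cannot be used to prove that $r_1 = 0$ in the following “random number
generator” litmus test,¹ where both the x and y locations initially hold 0.

\[
\begin{array}{lcl}
r_1 := [x]_{rlx}; & \parallel & r_2 := [y]_{rlx}; \\
{[y]}_{rlx} := r_1 + 1 & \parallel & {[x]}_{rlx} := r_2
\end{array}
\qquad \text{(RNG)}
\]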


The subtlety of this litmus test is the following: if the first thread reads a certain 
value v from x, then it writes v + 1 to y, which the second thread can read, and


1 The litmus test is called this way because some early attempts to solve the OOTA 
problem allowed this example to return arbitrary values for x and y. 
write to x; this, however, does not enable the first thread to read v + 1. SLR
features novel assertions that allow it to handle those and other examples, as 
shown in the following section. 

The second major contribution is the proof of soundness of SLR over the 
promising semantics [13].² The promising semantics is an operational model
that represents memory as a collection of timestamped write messages. Besides 
the usual steps that execute the next command of a thread, the model has a non- 
standard step that allows a thread to promise to perform a write in the future, 
provided that it can guarantee to be able to fulfil its promise. After a write is 
promised, other threads may read from that write as if it had already happened. 
Promises allow the load-store reordering needed to exhibit the load buffering 
behaviour above, and yet seem, from a series of litmus tests, constrained enough 
so as to not introduce out-of-thin-air behaviour. 

Since the promising model is rather different from all other (operational and 
axiomatic) memory models for which a program logic has been developed, none 
of the existing approaches for proving soundness of concurrent program logics 
are applicable to our setting. Two key difficulties in the soundness proof come 
from dealing with promise steps. 


1. Promises are very non-modular, as they can occur at every execution point 
and can affect locations that may only be accessed much later in the program. 

2. Since promised writes can be immediately read by other threads, the sound- 
ness proof has to impose the same invariants on promised writes as the ones 
it imposes on ordinary writes (e.g., that only values satisfying the location’s 
protocol are written). In a logic supporting ownership transfer,³ however,
establishing those invariants is challenging, because a thread may promise to 
write to x even without having permission to write to x. 


To deal with the first challenge, our proof decouples promising steps from ordinary
execution steps. We define two semantics of Hoare triples (one “promising”,
with respect to the full promising semantics, and one “non-promising”,
with respect to the promising semantics without promising steps) and prove
that every Hoare triple that is correct with respect to its non-promising inter- 
pretation is also correct with respect to its promising interpretation. This way, we 
modularise reasoning about promise steps. Even in the non-promising semantics, 
however, we do allow threads to have outstanding promises. The main difference 
in the non-promising semantics is that threads are not allowed to issue new 
promises. 

To resolve the second challenge, we observe that in programs verified by SLR, 
a thread may promise to write to x only if it is able to acquire the necessary 
write permission before performing the actual write. This follows from promise 


2 As the promising semantics comes with formal proofs of correctness of all the 
expected local program transformations and of compilation schemes to the x86-TSO, 
Power, and ARMv8-POP architectures [21], SLR is sound for these architectures too. 

3 Supporting ownership transfer is necessary to provide useful rules for C11 release 
and acquire accesses. 
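\[
\begin{array}{rcll}
e \in \mathit{Expr} & ::= & n & \text{integer} \\
 & \mid & r & \text{register} \\
 & \mid & e_1\ \mathit{op}\ e_2 & \text{arithmetic} \\[1ex]
s \in \mathit{Stm} & ::= & \mathsf{skip} \mid s_1; s_2 \mid \mathsf{if}\ e\ \mathsf{then}\ s_1\ \mathsf{else}\ s_2 \mid \mathsf{while}\ e\ \mathsf{do}\ s & \\
 & \mid & r := e \mid r := [e]_{rlx} \mid r := [e]_{acq} \mid [e_1]_{rlx} := e_2 \mid [e_1]_{rel} := e_2 &
\end{array}
\]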


Fig. 1. Syntax of the programming language. 


certification: the promising semantics requires all promises to be certifiable; that 
is, for every state of the promising machine, there must exist a non-promising 
execution of the machine that fulfils all outstanding promises. 

We present the SLR assertions and rules informally in Sect. 2. We then give 
an overview of the promising semantics of Kang et al. [13] in Sect. 3, and use it 
in Sect. 4 to explain the proof of soundness of SLR. We discuss related work in 
Sect. 5. Details of the rules of SLR and its soundness proof can be found in our
technical appendix [1]. 


2 Our Logic 


The novelty of our program logic is to allow non-trivial reasoning about relaxed 
accesses. Unlike release/acquire accesses, relaxed accesses do not induce syn- 
chronisation between threads, so the usual approach of program logics, which 
relies on ownership transfer, does not apply. Therefore, in addition to reasoning 
about ownership transfer like a standard separation logic, our logic supports rea- 
soning about relaxed accesses by collecting information about what reads have 
been observed, and in which order. When combined with information about 
which writes have been performed, we can deduce that certain executions are 
impossible. 

For concreteness, we consider a minimal “WHILE” programming language 
with expressions, $e \in \mathit{Expr}$, and statements, $s \in \mathit{Stm}$, whose syntax is given in
Fig. 1. Besides local register assignments, statements also include memory reads 
with relaxed or acquire mode, and memory writes with relaxed or release mode. 


2.1 The Assertions of the Logic 


The SLR assertion language is generated by the following grammar, where $N$,
$l$, $v$, $t$, $\pi$ and $X$ all range over a simply-typed term language which we assume
includes booleans, locations, values and expressions of the programming language,
fractional permissions, and timestamps, and is closed under pairing, finite
sets, and sequences. By convention, we assume that $l$, $v$, $t$, $\pi$ and $X$ range over
terms of type location, value, timestamp, permission and sets of pairs of values
and timestamps, respectively.


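\[
\begin{array}{rcl}
P, Q \in \mathit{Assn} & ::= & \bot \mid \top \mid P \lor Q \mid P \land Q \mid P \Rightarrow Q \mid \forall x.\,P \mid \exists x.\,P \mid N_1 = N_2 \mid \phi(N) \\
 & \mid & P * Q \mid \mathrm{Rel}(l, \phi) \mid \mathrm{Acq}(l, \phi) \mid O(l, v, t) \mid W^{\pi}(l, X) \mid \nabla P \\[1ex]
\phi \in \mathit{Pred} & ::= & \lambda x.\,P
\end{array}
\]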
The grammar contains the standard operators from first order logic and separation
logic, the Rel and Acq assertions from RSL [27], and a few novel constructs.

$\mathrm{Rel}(l, \phi)$ grants permission to perform a release write to location $l$ and transfer
away the invariant $\phi(v)$, where $v$ is the value written to that location. Conversely,
$\mathrm{Acq}(l, \phi)$ grants permission to perform an acquire read from location $l$ and gain
access to the invariant $\phi(v)$, where $v$ is the value returned by the read.

The first novel assertion form, $O(l, v, t)$, records the fact that location $l$ was
observed to have value $v$ at timestamp $t$. The timestamp is used to order it with
other reads from the same location. The information this assertion provides is
very weak: it merely says that the owner of the assertion has observed that value;
it does not imply that any other thread has ever observed it.

The other novel assertion form, $W^{\pi}(l, X)$, asserts ownership of location $l$
and records a set of writes $X$ to that location. The fractional permission $\pi \in
\mathbb{Q}$ indicates whether ownership is shared or exclusive. Full permission, $\pi = 1$,
confers exclusive ownership of location $l$ and ensures that $X$ is the set of all
writes to location $l$; any fraction, $0 < \pi < 1$, confers shared ownership and
enforces that $X$ is a lower-bound on the set of writes to location $l$. The order
of writes to $l$ is tracked through timestamps; the set $X$ is thus a set of pairs
consisting of the value and the timestamp of the write.

In examples where we only need to refer to the order of writes and not the
exact timestamps, we write $W^{\pi}(x, \ell)$, where $\ell = [v_1, \ldots, v_n]$ is a list of values, as
shorthand for $\exists t_1, \ldots, t_n.\ t_1 > t_2 > \cdots > t_n * W^{\pi}(x, \{(v_1, t_1), \ldots, (v_n, t_n)\})$. The
$W^{\pi}(x, \ell)$ assertion thus expresses ownership of location $x$ with permission $\pi$, and
that the writes to $x$ are given by the list $\ell$ in order, with the most recent write
at the front of the list.


Relation Between Reads and Writes. Records of reads and writes can be confronted 
by the thread owning the exclusive write assertion: all reads must have read values 
that were written. This is captured formally by the following property: 


\[ W^1(x, X) * O(x, a, t) \Rrightarrow W^1(x, X) * O(x, a, t) * (a, t) \in X \qquad \text{(Reads-from-Write)} \]


Random Number Generator. These assertions allow us to reason about the “ran- 
dom number generator” litmus test from the Introduction, and to show that it 
cannot read arbitrarily large values. As discussed in the Introduction, capturing 
the set of values that are written to x, as made possible by the “invariant-based 
program logic” of Kang et al. [13, Sect. 5.5] and of Jeffrey and Riley [10, Sect. 
6], is not enough, and we make use of our stronger reasoning principles. We use 
$O(x, a, t)$ to record what values reads read from each location, and $W^1(x, \ell)$ to
record what sequences of values were written to each location, and then confront 
these records at the end of the execution. The proof sketch is then as follows: 


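\[
\begin{array}{lcl}
\{\ldots\} & \parallel & \{\ldots\} \\
r_1 := [x]_{rlx}; & \parallel & r_2 := [y]_{rlx}; \\
\{W^1(y, [0]) * O(x, r_1, \_) * \ldots\} & \parallel & \{W^1(x, [0]) * O(y, r_2, \_) * \ldots\} \\
{[y]}_{rlx} := r_1 + 1 & \parallel & {[x]}_{rlx} := r_2 \\
\{W^1(y, [r_1 + 1; 0]) * O(x, r_1, \_) * \ldots\} & \parallel & \{W^1(x, [r_2; 0]) * O(y, r_2, \_) * \ldots\}
\end{array}
\]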
At the end of the execution, we are able to draw conclusions about the values
of the registers. From $W^1(x, [r_2; 0])$ and $O(x, r_1, \_)$, we know that $r_1 \in \{r_2, 0\}$ by
rule Reads-from-Write. Similarly, we know that $r_2 \in \{r_1 + 1, 0\}$, and so we can
conclude that $r_1 = 0$: otherwise $r_1 = r_2 \neq 0$ would force $r_2 = r_1 + 1$, a contradiction.
We discuss the distribution of resources at the beginning
of a program, and their collection at the end of a program, in Theorem 2. Note
that we are unable to establish what values the reads read before the end of the
litmus test. Indeed, before the end of the execution, nothing enforces that there
are no further writes that reads could read from.
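The final deduction is a small constraint-solving exercise, which can be confirmed by brute force. The sketch below assumes register values are natural numbers and searches a bounded range; the function name `rng_solutions` and the bound are illustrative choices.

```python
def rng_solutions(bound=50):
    """Enumerate register values consistent with both membership facts:
    r1 in {r2, 0} (reads of x) and r2 in {r1 + 1, 0} (reads of y)."""
    return [
        (r1, r2)
        for r1 in range(bound)
        for r2 in range(bound)
        if r1 in {r2, 0} and r2 in {r1 + 1, 0}
    ]

solutions = rng_solutions()
print(solutions)  # [(0, 0), (0, 1)]
```

Every solution has the first register equal to 0, matching the conclusion drawn from rule Reads-from-Write; the second register may be 0 or 1.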


2.2 The Rules of the Logic for Relaxed Accesses 


We now introduce the rules of our logic by focusing on the rules for relaxed 
accesses. In addition, we support the standard rules from separation logic and 
Hoare logic, rules for release/acquire accesses (Sect. 2.4), and the following con- 
sequence rule: 

\[
\frac{P \Rrightarrow P' \qquad \vdash \{P'\}\ c\ \{Q'\} \qquad Q' \Rrightarrow Q}{\vdash \{P\}\ c\ \{Q\}}
\]

which allows one to use “view shifting” implications to strengthen the precon- 
dition and weaken the postcondition. 

The rules for relaxed accesses are adapted from the rules of RSL [27] for
release/acquire accesses, but use our novel resources to track the more subtle
behaviour of relaxed accesses. Since relaxed accesses do not introduce synchronisation,
they cannot be used to transfer ownership; they can, however, be used
to transfer information. For this reason, as in RSL [27], we associate a predicate
$\phi$ on values to a location $x$ using paired $\mathrm{Rel}(x, \phi)$ and $\mathrm{Acq}(x, \phi)$ resources, for
writers and readers, respectively. To write $v$ to $x$, a writer has to provide $\phi(v)$,
and in exchange, when reading $v$ from $x$, a reader obtains $\phi(v)$. However, here,
relaxed writes can only send pure predicates (i.e., ones which do not assert ownership
of any resources), and relaxed reads can only obtain the assertion from
the predicate guarded by a modality $\nabla$⁴ that only pure assertions filter through:
if $P$ is pure, then $\nabla P \Rightarrow P$. All assertions expressible in first-order logic are
pure.


(CONSEQ) 


Relaxed Write Rule. To write value $v$ (to which the value expression $e_2$ evaluates)
to location $x$ (to which the location expression $e_1$ evaluates), the thread
needs to own a write permission $W^{\pi}(x, X)$. Moreover, it needs to provide $\phi(v)$,
the assertion associated to the written value, $v$, to location $x$ by the $\mathrm{Rel}(x, \phi)$
assertion. Because the write is a relaxed write, and therefore does not induce
synchronisation, $\phi(v)$ has to be a pure predicate. The write rule updates the
record of writes with the value written, timestamped with a timestamp newer
than any timestamp for that location that the thread has observed so far; this is
expressed by relating it to a previous timestamp that the thread has to provide
through an $O(x, \_, t)$ assertion in the precondition.


4 This $\nabla$ modality is similar in spirit, but weaker than that of FSL [8].
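\[
\frac{\phi(v) \text{ is pure}}
{\begin{array}{l}
\vdash \{e_1 = x * e_2 = v * W^{\pi}(x, X) * \mathrm{Rel}(x, \phi) * \phi(v) * O(x, \_, t)\} \\
\quad [e_1]_{rlx} := e_2 \\
\quad \{\exists t' > t.\ W^{\pi}(x, \{(v, t')\} \cup X)\}
\end{array}}
\qquad \text{(W-RLX)}
\]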


The $\mathrm{Rel}(x, \phi)$ assertion is duplicable, so there is no need for the rule to keep it.

In practice, $O(x, \_, t)$ is taken to be that of the last read from $x$ if it was the
last operation on $x$, and $O(x, \mathrm{fst}(\max(X)), \mathrm{snd}(\max(X)))$ if the last operation
on $x$ was a write, including the initial write. The latter can be obtained by


\[ W^{\pi}(x, X) * (v, t) \in X \Rrightarrow W^{\pi}(x, X) * O(x, v, t) \qquad \text{(Write-Observed)} \]


Relaxed Read Rule. To read from location $x$ (to which the location expression
$e$ evaluates), the thread needs to own an $\mathrm{Acq}(x, \phi)$ assertion, which gives it the
right to (almost) obtain assertion $\phi(v)$ upon reading value $v$ from location $x$.
The thread then keeps its $\mathrm{Acq}(x, \phi)$, and obtains an assertion $O(x, r, t')$ stating
that it has read the value now in register $r$ from location $x$, timestamped with $t'$.
This timestamp is no older than any timestamp for that location that the thread
has observed so far, expressed again by relating it to an $O(x, \_, t)$ assertion in
the precondition. Moreover, it obtains the pure portion $\nabla\phi(r)$ of the assertion
$\phi(r)$ corresponding to the value read in register $r$.


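\[
\begin{array}{l}
\vdash \{e = x * \mathrm{Acq}(x, \phi) * O(x, \_, t)\} \\
\quad r := [e]_{rlx} \\
\quad \{\exists t' \ge t.\ \mathrm{Acq}(x, \phi) * O(x, r, t') * \nabla\phi(r)\}
\end{array}
\qquad \text{(R-RLX)}
\]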


Again, we can obtain $O(x, v_0, 0)$, where $v_0$ is the initial value of $x$, from the
initial write permission for $x$, and distribute it to all the threads that will read
from $x$, expressing the fact that the initial value is available to all threads, and
use it as the required $O(x, \_, t)$ in the precondition of the read rule.

Moreover, if a thread owns the exclusive write permission for a location $x$,
then it can take advantage of the fact that it is the only writer at that location 
to obtain more precise information about its reads from that location: they will 
read the last value it has written to that location. 


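\[
\begin{array}{l}
\vdash \{e = x * \mathrm{Acq}(x, \phi) * W^1(x, X)\} \\
\quad r := [e]_{rlx} \\
\quad \{\exists t.\ (r, t) = \max(X) * \mathrm{Acq}(x, \phi) * W^1(x, X) * O(x, r, t) * \nabla\phi(r)\}
\end{array}
\qquad \text{(R-RLX*)}
\]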


Separation. With these assertions, we can straightforwardly specify and verify 
the Disjoint-Lists example. Ownership of an element of a list is simply expressed 
using a full write permission, $W^1(x, X)$. This allows including the Disjoint-Lists
as a snippet in a larger program where the lists can be shared before or after, and 
still enforce the separation property we want to establish. While this reasoning 
sounds underwhelming (and we elide the details), we remark that it is unsound 
in models that allow OOTA behaviours. 
2.3 Reasoning About Coherence


An important feature of many memory models is coherence, that is, the existence 
of a per-location total order on writes that reads respect. Coherence becomes 
interesting where there are multiple simultaneous writers to the same location 
(write/write races). In our logic, write assertions can be split and combined as 
follows: if $\pi_1 + \pi_2 \le 1$, $0 < \pi_1$, and $0 < \pi_2$ then


\[ W^{\pi_1 + \pi_2}(x, X_1 \cup X_2) \Leftrightarrow W^{\pi_1}(x, X_1) * W^{\pi_2}(x, X_2) \qquad \text{(Combine-Writes)} \]


To reason about coherence, the following rules capture the fact that the 
timestamps of the writes at a given location are all distinct, and totally ordered: 


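\[ W^{\pi}(x, X) * (v, t) \in X * (v', t') \in X * v \neq v' \Rrightarrow W^{\pi}(x, X) * t \neq t' \qquad \text{(Different-Writes)} \]

\[ W^{\pi}(x, X) * (\_, t) \in X * (\_, t') \in X \Rrightarrow W^{\pi}(x, X) * (t < t' \lor t = t' \lor t' < t) \qquad \text{(Writes-Ordered)} \]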


CoRR2. One of the basic tests of coherence is the CoRR2 litmus test, which 
tests whether two threads can disagree on the order of two writes to the same 
location. The following program, starting with location x holding 0, should not 
be allowed to finish with $r_1 = 1 * r_2 = 2 * r_3 = 2 * r_4 = 1$, as that would mean
that the third thread sees the write of 1 to x before the write of 2 to x, but that 
the fourth thread sees the write of 2 before the write of 1: 


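\[
[x]_{\mathsf{rlx}} := 1
\;\parallel\;
[x]_{\mathsf{rlx}} := 2
\;\parallel\;
\begin{array}{l} r_1 := [x]_{\mathsf{rlx}}; \\ r_2 := [x]_{\mathsf{rlx}} \end{array}
\;\parallel\;
\begin{array}{l} r_3 := [x]_{\mathsf{rlx}}; \\ r_4 := [x]_{\mathsf{rlx}} \end{array}
\qquad (\text{CoRR2})
\]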


Coherence enforces a total order on the writes to x that is respected by the reads, 
so if the third thread reads 1 then 2, then the fourth cannot read 2 then 1. 

We use the timestamps in the $\mathsf{O}(x, v, t)$ assertions to record the order in
which reads read values, and then link the timestamps of the reads with those
of the writes. Because we do not transfer anything, the predicate for x is $\lambda v.\ \top$
again, and we elide the associated clutter below.

The proof outline for the writers just records what values have been written: 


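\[
\begin{array}{l@{\quad\parallel\quad}l}
\{\mathsf{W}^{1/2}(x, \{(0,0)\}) * \ldots\} & \{\mathsf{W}^{1/2}(x, \{(0,0)\}) * \ldots\} \\
{}[x]_{\mathsf{rlx}} := 1 & {}[x]_{\mathsf{rlx}} := 2 \\
\{\exists t_1.\ \mathsf{W}^{1/2}(x, \{(1, t_1), (0,0)\}) * \ldots\} & \{\exists t_2.\ \mathsf{W}^{1/2}(x, \{(2, t_2), (0,0)\}) * \ldots\}
\end{array}
\]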


The proof outline for the readers just records what values have been read, 
and—crucially—in which order. 


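\[
\begin{array}{l}
\{\mathsf{Acq}(x, \lambda v.\ \top) * \mathsf{O}(x, 0, 0)\} \\
r_1 := [x]_{\mathsf{rlx}}; \\
\{\exists t_a.\ \mathsf{Acq}(x, \lambda v.\ \top) * \mathsf{O}(x, r_1, t_a) * 0 \le t_a\} \\
r_2 := [x]_{\mathsf{rlx}} \\
\{\exists t_a, t_b.\ \mathsf{O}(x, r_1, t_a) * \mathsf{O}(x, r_2, t_b) * 0 \le t_a * t_a \le t_b\}
\end{array}
\]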




At the end of the program, by combining the two write permissions using rule
Combine-Writes, we obtain $\mathsf{W}^{1}(x, \{(1, t_1), (2, t_2), (0,0)\})$. From this, we have
$t_1 < t_2$ or $t_2 < t_1$ by rules Different-Writes and Writes-Ordered. Now, assuming
$r_1 = 1$ and $r_2 = 2$, we have $t_a \le t_b$, and so $t_1 < t_2$ by rule Reads-from-Write.
Similarly, assuming $r_3 = 2$ and $r_4 = 1$, we have $t_2 < t_1$. Therefore, we cannot
have $r_1 = 1 * r_2 = 2 * r_3 = 2 * r_4 = 1$, so coherence is respected, as desired.
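The coherence argument above can be cross-checked by brute force. The following is a minimal sketch, not part of SLR: it assumes integer timestamps, gives the two racing writes distinct timestamps in either order (as Different-Writes and Writes-Ordered require), and lets each reader perform two reads with non-decreasing timestamps. The names `corr2_outcomes` and `FORBIDDEN` are hypothetical.

```python
from itertools import permutations, product

def corr2_outcomes():
    """Enumerate (r1, r2, r3, r4) outcomes of CoRR2 under per-location coherence."""
    outcomes = set()
    # The initial write of 0 has timestamp 0; the writes of 1 and 2 receive
    # distinct timestamps, in either order.
    for t1, t2 in permutations([1, 2]):
        messages = [(0, 0), (1, t1), (2, t2)]  # (value, timestamp)
        # A reader performs two reads whose timestamps do not decrease.
        reader_runs = [
            (va, vb)
            for (va, ta), (vb, tb) in product(messages, repeat=2)
            if ta <= tb
        ]
        for (r1, r2), (r3, r4) in product(reader_runs, repeat=2):
            outcomes.add((r1, r2, r3, r4))
    return outcomes

# The outcome that CoRR2 must rule out.
FORBIDDEN = (1, 2, 2, 1)
```

Under these assumptions `FORBIDDEN` never appears in `corr2_outcomes()`, while outcomes in which both readers agree on the order, such as `(1, 2, 1, 2)`, do.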


2.4 Handling Release and Acquire Accesses 


Next, consider release and acquire accesses, which, in addition to coherence, 
provide synchronisation and enable the message passing idiom. 


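\[
\begin{array}{l} {}[x]_{\mathsf{rlx}} := 1; \\ {}[y]_{\mathsf{rel}} := 1 \end{array}
\;\parallel\;
\begin{array}{l} r_1 := [y]_{\mathsf{acq}}; \\ \mathbf{if}\ r_1 = 1\ \mathbf{then}\ r_2 := [x]_{\mathsf{rlx}} \end{array}
\qquad (\text{MP})
\]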


The first thread writes data (here, 1) to a location x, and signals that the data is 
ready by writing 1 to a “flag” location y with a release write. The second thread 
reads the flag location y with an acquire read, and, if it sees that the first thread 
has signalled that the data has been written, reads the data. The release /acquire 
pair is sufficient to ensure that the data is then visible to the second thread. 

Release/acquire can be understood abstractly in terms of views [15]: a release 
write contains the view of the writing thread at the time of the writing, and an 
acquire read updates the view of the reading thread with that of the release 
write it is reading from. This allows one-way synchronisation of views between 
threads. 

To handle release/acquire accesses in SLR, we can adapt the rules for relaxed 
accesses by enabling ownership transfer according to the predicate associated with
the Rel and Acq permissions. The resulting rules are strictly more powerful than 
the corresponding RSL [27] rules, as they also allow us to reason about coherence. 


Release Write Rule. The release write rule is the same as for relaxed writes, but 
does not require the predicate to be a pure predicate, thereby allowing sending 
of actual resources, rather than just information: 


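\[
\vdash \{e_1 = x * e_2 = v * \mathsf{W}^{\pi}(x, X) * \mathsf{Rel}(x, \phi) * \phi(v) * \mathsf{O}(x, \_, t)\}\;
[e_1]_{\mathsf{rel}} := e_2\;
\{\exists t' > t.\ \mathsf{W}^{\pi}(x, \{(v, t')\} \cup X)\}
\qquad (\text{W-REL})
\]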


Acquire Read Rule. Symmetrically, the acquire read rule is the same as for 
relaxed reads, but allows the actual resource to be obtained, not just its pure 
portion: 
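\[
\vdash \{e = x * \mathsf{Acq}(x, \phi) * \mathsf{O}(x, \_, t)\}\;
r := [e]_{\mathsf{acq}}\;
\{\exists t' \ge t.\ \mathsf{Acq}(x, \phi[r \mapsto \top]) * \mathsf{O}(x, r, t') * \phi(r)\}
\qquad (\text{R-ACQ})
\]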
We have to update $\phi$ to record the fact that we have obtained the resource
associated with reading that value, so that we do not erroneously obtain that
resource twice; $\phi[v' \mapsto P]$ stands for $\lambda v.\ \mathbf{if}\ v = v'\ \mathbf{then}\ P\ \mathbf{else}\ \phi(v)$.
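The predicate update is an ordinary function update. As a small illustration only, the sketch below models a predicate as a Python function from values to a set of resource names, with the empty set standing in for $\top$; this encoding and the names `update`, `TOP`, and `phi_y` are assumptions made for the example.

```python
# The trivial predicate: no resource is attached (stands in for "top").
TOP = frozenset()

def update(phi, v_prime, p):
    """Return phi[v_prime -> p]: behaves like phi except at v_prime."""
    return lambda v: p if v == v_prime else phi(v)

# Hypothetical predicate for a flag location: reading 1 yields a resource.
def phi_y(v):
    return frozenset({"W1(x)"}) if v == 1 else TOP

# After an acquire read of the value 1, the reader keeps Acq(y, phi_y[1 -> TOP]),
# so reading 1 a second time yields no further resource.
phi_after = update(phi_y, 1, TOP)
```

Here `phi_after(1)` is `TOP`, whereas every other value is still mapped as by `phi_y`.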

As for relaxed accesses, we can strengthen the read rule when the reader is 
also the exclusive writer to that location: 


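\[
\vdash \{e = x * \mathsf{Acq}(x, \phi) * \mathsf{W}^{1}(x, X)\}\;
r := [e]_{\mathsf{acq}}\;
\{\exists t.\ (r, t) = \max(X) * \mathsf{Acq}(x, \phi[r \mapsto \top]) * \mathsf{W}^{1}(x, X) * \mathsf{O}(x, r, t) * \phi(r)\}
\qquad (\text{R-ACQ*})
\]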


Additionally, we allow duplicating of release assertions and splitting of 
acquire assertions, as expressed by the following two rules. 

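\[
\mathsf{Rel}(x, \phi) \iff \mathsf{Rel}(x, \phi) * \mathsf{Rel}(x, \phi)
\qquad (\text{Release-Duplicate})
\]

\[
\mathsf{Acq}(x, \lambda v.\ \phi_1(v) * \phi_2(v)) \implies \mathsf{Acq}(x, \phi_1) * \mathsf{Acq}(x, \phi_2)
\qquad (\text{Acquire-Split})
\]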

Message Passing. With these rules, we can easily verify the message passing
example. Here, we want to transfer a resource from the writer to the reader,
namely the state of the data, x. By transferring the write permission for the
data to the reader over the “flag” location, y, we allow the reader to use it to
read the data precisely. We do that by picking the predicate

\[
\phi_y \triangleq \lambda v.\ (v = 1 \wedge \mathsf{W}^{1}(x, [1;0])) \vee v \ne 1
\]

for y. Since we do not transfer any resource using x, the predicate for x is $\lambda v.\ \top$.
The writer transfers the write permissions for x away on y using $\phi_y$:


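\[
\begin{array}{l}
\{\mathsf{W}^{1}(x, [0]) * \mathsf{Rel}(x, \lambda v.\ \top) * \mathsf{W}^{1}(y, [0]) * \mathsf{Rel}(y, \phi_y)\} \\
{}[x]_{\mathsf{rlx}} := 1; \\
\{\mathsf{W}^{1}(x, [1;0]) * \mathsf{W}^{1}(y, [0]) * \mathsf{Rel}(y, \phi_y)\} \\
\{\mathsf{W}^{1}(y, \{(0,0)\}) * \mathsf{Rel}(y, \phi_y) * \phi_y(1) * \mathsf{O}(y, 0, 0)\} \\
{}[y]_{\mathsf{rel}} := 1 \\
\{\exists t_1.\ \mathsf{W}^{1}(y, \{(1, t_1)\} \cup \{(0,0)\}) * 0 < t_1\} \\
\{\mathsf{W}^{1}(y, [1;0]) * \mathsf{Rel}(y, \phi_y)\}
\end{array}
\]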


The proof outline for the reader uses the acquire permission $\phi_y$ for y to obtain
$\mathsf{W}^{1}(x, [1;0])$, which it then uses to know that it reads 1 from x.


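\[
\begin{array}{l}
\{\mathsf{Acq}(y, \phi_y) * \mathsf{O}(y, 0, 0) * \mathsf{Acq}(x, \lambda v.\ \top)\} \\
r_1 := [y]_{\mathsf{acq}}; \\
\{\exists t_y \ge 0.\ \mathsf{Acq}(y, \phi_y[r_1 \mapsto \top]) * \mathsf{O}(y, r_1, t_y) * \phi_y(r_1) * \mathsf{Acq}(x, \lambda v.\ \top)\} \\
\{\phi_y(r_1) * \mathsf{Acq}(x, \lambda v.\ \top)\} \\
\mathbf{if}\ r_1 = 1\ \mathbf{then} \\
\quad \{\mathsf{W}^{1}(x, [1;0]) * \mathsf{Acq}(x, \lambda v.\ \top)\} \\
\quad r_2 := [x]_{\mathsf{rlx}} \\
\quad \{\mathsf{Acq}(x, \lambda v.\ \top) * \mathsf{W}^{1}(x, [1;0]) * (r_2 = 1)\} \\
\{r_1 = 1 \implies r_2 = 1\}
\end{array}
\]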
2.5 Plain Accesses


Our formal development (in the technical appendix) also features the usual
“partial ownership” $x \overset{\pi}{\mapsto} v$ assertion for “plain” (non-atomic) locations, and the usual
corresponding rules.


3 The Promising Semantics 


In this section, we provide an overview of the promising semantics [13], the model 
for which we prove SLR sound. Formal details can be found in [1,13]. 

The promising semantics is an operational semantics that interleaves execution
of the threads of a program. Relaxed behaviour is introduced in two ways:


— As in the “strong release/acquire” model [15], the memory is a pool of
timestamped messages, and each thread maintains a “view” thereof. A thread may
read any value that is not older than the latest value observed by the thread
for the given location; in particular, this may well not be the latest value
written to that particular location. Timestamps and views model
non-multi-copy-atomicity: writes performed by one thread do not become
simultaneously visible to all other threads.

— The operational semantics contains a non-standard step: at any point a thread 
can nondeterministically promise a write, provided that, at every point before 
the write is actually performed, the thread can certify the promise, that is, 
execute the write by running on its own from the current state. Promises are 
used to enable load-store reordering. 


The behaviour of promising steps can be illustrated on the LB+data+fakedep 
litmus test from the Introduction. The second thread can, at the very start of 
the execution, promise a write of 1 to x, because it can, by running on its own 
from the current state, read from y (it will read 0), then write 1 to x (because 
$0 + 1 - 0 = 1$), thereby fulfilling its promise. On the other hand, the first thread
cannot promise a write of 1 to y at the beginning of the execution, because, by 
running on its own, it can only read 0 from x, and therefore only write 0 to y. 
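The certification check in this example can be mimicked with a toy model. The sketch below is an illustration under strong simplifying assumptions, not the promising semantics itself: memory is a map from locations to the set of values a thread may read, each thread reads every location at most once, and timestamps and views are ignored. The thread bodies are reconstructed from the description in this paragraph, and all names are hypothetical.

```python
from itertools import product

def thread1(read):
    r1 = read("x")
    return ("y", r1)            # [y] := r1, a real data dependency

def thread2(read):
    r2 = read("y")
    return ("x", r2 + 1 - r2)   # [x] := r2 + 1 - r2, a fake dependency

def can_certify(thread, memory, promise):
    """Can `thread`, running on its own against `memory`, perform the
    promised write (loc, value) for some choice of the values it reads?"""
    locs = sorted(memory)
    for choice in product(*(sorted(memory[loc]) for loc in locs)):
        env = dict(zip(locs, choice))
        if thread(env.__getitem__) == promise:
            return True
    return False

# At the start of the execution, only the initial writes of 0 are in memory.
initial = {"x": {0}, "y": {0}}
# Once the second thread has promised x = 1, other threads may read that message.
after_promise = {"x": {0, 1}, "y": {0}}
```

In this toy model the second thread can certify a promise of `("x", 1)` from `initial`, whereas the first thread cannot certify `("y", 1)` there; once the promised message is readable, as in `after_promise`, the first thread can write 1 to y.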


3.1 Storage Subsystem 


Formally, the semantics keeps track of writes and promises in a global
configuration, $\mathit{gconf} = \langle M, P\rangle$, where $M$ is a memory and $P \subseteq M$ is the promise memory.
We denote by gconf.M and gconf.P the components of gconf. Both memories are
finite sets of messages, where a message is a tuple $\langle x :^{o}_{i} v, R@t]\rangle$, where $x \in \mathit{Loc}$
is the location of the message, $v \in \mathit{Val}$ its value, $i \in \mathit{Tid}$ its originating thread,
$t \in \mathit{Time}$ its timestamp, $R$ its message view, and $o \in \{\mathsf{rlx}, \mathsf{rel}\}$ its message
mode, where Time is an infinite set of timestamps, densely totally ordered by
$\le$, with a minimum element, 0. (We return to views later.) We denote by m.loc,
m.val, m.tid, m.time, m.view and m.mod the components of a message m. We use the
following notation to restrict memories:


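\[
\begin{array}{ll}
M(i) \triangleq \{m \in M \mid m.\mathsf{tid} = i\} & M(\mathsf{rel}) \triangleq \{m \in M \mid m.\mathsf{mod} = \mathsf{rel}\} \\
M(x) \triangleq \{m \in M \mid m.\mathsf{loc} = x\} & M(\mathsf{rlx}) \triangleq \{m \in M \mid m.\mathsf{mod} = \mathsf{rlx}\} \\
M(i, x) \triangleq M(i) \cap M(x) &
\end{array}
\]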
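These restrictions are plain set comprehensions. A minimal sketch follows, assuming a hypothetical `Message` record whose fields mirror m.loc, m.val, m.tid, m.time, m.mod and m.view; the encoding of the message view as a tuple of (location, timestamp) pairs is an assumption made only to keep the record hashable.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    loc: str
    val: int
    tid: int
    time: float
    mod: str          # "rlx" or "rel"
    view: tuple = ()  # message view as (location, timestamp) pairs (assumed encoding)

def by_tid(M, i):
    """M(i): messages originating from thread i."""
    return {m for m in M if m.tid == i}

def by_loc(M, x):
    """M(x): messages at location x."""
    return {m for m in M if m.loc == x}

def by_mode(M, o):
    """M(rel) or M(rlx): messages with access mode o."""
    return {m for m in M if m.mod == o}

def by_tid_loc(M, i, x):
    """M(i, x) = M(i) intersected with M(x)."""
    return by_tid(M, i) & by_loc(M, x)
```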


A global configuration gconf evolves in two ways. First, a message can be 
“promised” and be added both to gconf.M and gconf.P. Second, a message can 
be written, in which case it is either added to gconf.M, or removed from gconf.P 
(if it was promised before). 


3.2 Thread Subsystem 


A thread state is a pair $\mathit{TS} = \langle \sigma, V\rangle$, where $\sigma$ is the internal state of the thread
and $V$ is a view. We denote by $\mathit{TS}.\sigma$ and $\mathit{TS}.V$ the components of $\mathit{TS}$.


Thread Internal State. The internal state $\sigma$ consists of a thread store (denoted
$\sigma.\mu$) that assigns values to local registers and a statement to execute (denoted
$\sigma.s$). The transitions of the thread internal state are labeled with memory actions
and are given by an ordinary sequential semantics. As these are routine, we leave
their description to the technical appendix.


Views. Thread views are used to enforce coherence, that is, the existence of 
a per-location total order on writes that reads respect. A view is a function 
$V : \mathit{Loc} \to \mathit{Time}$, which records how far the thread has seen in the history of each
location. To ensure that a thread does not read stale messages, its view restricts 
the messages the thread may read, and is increased whenever a thread observes 
a new message. Messages themselves also carry a view (the thread’s view when 
the message comes from a release write, and the bottom view otherwise) which 
is incorporated in the thread view when the message is read by an acquire read. 


Additional Notations. The order on timestamps, $\le$, is extended pointwise to
views. $\bot$ and $\sqcup$ denote the natural bottom element and join operation for
views. $\{x@t\}$ denotes the view assigning $t$ to $x$ and 0 to other locations.
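As an illustration of these notations, views can be modelled as dictionaries in which a missing location implicitly has timestamp 0. This encoding and the names `BOTTOM`, `singleton`, `leq`, and `join` are assumptions of the sketch, not definitions from the formal development.

```python
# The bottom view: every location is implicitly mapped to timestamp 0.
BOTTOM = {}

def singleton(x, t):
    """The view {x@t}: t at location x and 0 everywhere else."""
    return {x: t}

def leq(v1, v2):
    """Pointwise order on views."""
    return all(t <= v2.get(x, 0) for x, t in v1.items())

def join(v1, v2):
    """Pointwise maximum of two views."""
    return {x: max(v1.get(x, 0), v2.get(x, 0)) for x in v1.keys() | v2.keys()}
```

With this encoding, `join` is the least upper bound for `leq`, and `BOTTOM` is below every view.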


3.3 Interaction Between a Thread and the Storage Subsystem 


The interaction between a thread and the storage subsystem is given in 
terms of transitions of thread configurations. Thread configurations are tuples
$\langle \mathit{TS}, \langle M, P\rangle\rangle$, where $\mathit{TS}$ is a thread state, and $\langle M, P\rangle$ is a global
configuration. These transitions are labelled with $\beta \in \{\mathsf{NP}, \mathsf{prom}\}$ in order to distinguish
whether they involve promises or not. A thread can:

— Make an internal transition with no effect on the storage subsystem.

— Read the value v from location x, when there is a matching message in memory
that is not outdated according to the thread’s view. It then updates its
view accordingly: it updates the timestamp for location x and, in addition, 
incorporates the message view if the read is an acquire read. 

— Write the value v to location x. Here, the thread picks a timestamp greater 
than the one of its current view for the message it adds to memory (or removes 
from the promise set). If the write is a release write, the message carries 
the view of the writing thread. Moreover, a release write to x can only be 
performed when the thread has already fulfilled all its promises to x. 

— Non-deterministically promise a relaxed write by adding a message to both 
M and P. 
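The read and write steps above can be sketched as a small view-based simulation of the message passing example from Sect. 2.4. This is a simplification under stated assumptions: promises are omitted, messages are tuples (location, value, timestamp, message view), views are dictionaries with missing entries read as 0, and timestamp uniqueness per location is not enforced. All function names are hypothetical.

```python
def join(v1, v2):
    """Pointwise maximum of two views (missing entries count as 0)."""
    return {x: max(v1.get(x, 0), v2.get(x, 0)) for x in v1.keys() | v2.keys()}

def write(memory, view, loc, val, time, release=False):
    """Add a message with a timestamp above the writer's view of loc."""
    if time <= view.get(loc, 0):
        raise ValueError("timestamp must exceed the thread's view")
    new_view = join(view, {loc: time})
    # A release write carries the writer's view; a relaxed write carries bottom.
    msg_view = dict(new_view) if release else {}
    memory.append((loc, val, time, msg_view))
    return new_view

def readable(memory, view, loc):
    """Messages at loc that are not outdated according to the view."""
    return [m for m in memory if m[0] == loc and m[2] >= view.get(loc, 0)]

def read(view, msg, acquire=False):
    """Read msg, updating the view; an acquire read also joins the message view."""
    loc, val, time, msg_view = msg
    new_view = join(view, {loc: time})
    if acquire:
        new_view = join(new_view, msg_view)
    return val, new_view

def message_passing(acquire):
    """Values the reader may see at x after reading 1 from the flag y."""
    memory = [("x", 0, 0, {}), ("y", 0, 0, {})]
    v1 = write(memory, {}, "x", 1, 1)
    write(memory, v1, "y", 1, 1, release=True)
    flag = next(m for m in memory if m[0] == "y" and m[1] == 1)
    _, v2 = read({}, flag, acquire=acquire)
    return sorted(m[1] for m in readable(memory, v2, "x"))
```

In this sketch an acquire read of the flag leaves only the value 1 readable at x, while a relaxed read of the flag leaves both 0 and 1 readable.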


3.4 Constraining Promises 


Now that we have described how threads and promises interact with memory,
we can present the certification condition for promises, which is essential
to avoid out-of-thin-air behaviours. Accordingly, we define another transition
system, $\Longrightarrow$, on top of the previous one, which enforces that the memory
remains “consistent”, that is, all the promises that have been made can be
certified. A thread configuration $\langle \mathit{TS}, \langle M, P\rangle\rangle$ is called consistent w.r.t. $i \in \mathit{Tid}$
if thread $i$ can fulfil its promises by executing on its own, or more formally if
$\langle \mathit{TS}, \langle M, P\rangle\rangle \xrightarrow{\mathsf{NP}}{}^{*}_{i} \langle \mathit{TS}', \langle M', P'\rangle\rangle$ for some $\mathit{TS}'$, $M'$, $P'$ such that $P'(i) = \emptyset$.
Certification is local, that is, only thread $i$ is executing during its certification;
this is crucial to avoid out-of-thin-air. Further, the certification itself cannot
make additional promises, as it is restricted to NP-steps. Here is a visual
representation of a promise machine run, together with certifications.


[Figure omitted: a promise machine run together with its certifications.]


The thread configuration $\Longrightarrow$-transitions allow a thread to (1) take any number
of non-promising steps, provided its thread configuration at the end of the
sequence of steps (intuitively speaking, when it gives control back to the
scheduler) is consistent, or (2) take a promising step, again provided that its thread
configuration after the step is consistent.


3.5 Full Machine 


Finally, the full machine transitions simply lift the thread configuration
$\Longrightarrow$-transitions to the machine level. A machine state is a tuple $\mathit{MS} = \langle \mathcal{TS}, \langle M, P\rangle\rangle$,
where $\mathcal{TS}$ is a function assigning a thread state $\mathit{TS}$ to every thread, and $\langle M, P\rangle$
is a global configuration. The initial state $\mathit{MS}^{0}$ (for a given program) consists
of the function $\mathcal{TS}^{0}$ mapping each thread $i$ to its initial state $\langle \sigma^{0}_{i}, \bot\rangle$, where $\sigma^{0}_{i}$
is the thread’s initial local state and $\bot$ is the zero view (all timestamps in views
are 0); the initial memory $M^{0}$ consisting of one message $\langle x :^{\mathsf{rlx}}_{0} 0, \bot@0]\rangle$ for each
location x; and the empty set of promises.


4 Semantics and Soundness 


In this section, we present the semantics of SLR, and give a short overview of 
the soundness proof. Our focus is not on the technical details of the proof, but 
on the two main challenges in defining the semantics and proving soundness: 


1. Reasoning about promises. This difficulty arises because promise steps can be 
nondeterministically performed by the promise machine at any time. 

2. Reasoning about release-acquire ownership transfer in the presence of 
promises. The problem is that writes may be promised before the thread 
has acquired enough resources to allow it to actually perform the write. 


4.1 The Intuition 


SLR assertions are interpreted by (sets of) resources, which represent permissions
to write to a certain location and/or to obtain further resources by reading
a certain message from memory. As is common in semantics of separation logics,
the resources form a partial commutative monoid, and SLR’s separating
conjunction is interpreted as the composition operation of the monoid.

When defining the meaning of a Hoare triple {P} s {Q}, we think of the 
promise machine as if it were manipulating resources: each thread owns some 
resources and operates using them. The intuitive description of the Hoare triple 
semantics is that every run of the program s starting from a state containing the 
resources described by the precondition, P, will be “correct” and, if it terminates, 
will finish in a state containing the resources described by the postcondition, Q. 
The notion of a program running correctly can be described in terms of threads 
“respecting” the resources they own; for example, if a thread is executing a write 
or fulfilling a promise, it should own a resource representing the write permission. 


4.2 A Closer Look at the Resources and the Assertion Semantics 


We now take a closer look at the structure of resources and the semantics of 
assertions, whose formal definitions can be found in Figs. 2 and 3. 

The idea is to interpret assertions as predicates over triples consisting of a
memory, a view, and a resource. We use the resource component to model assertions
involving ownership (i.e., write assertions and acquire assertions), and model
other assertions using the memory and view components. Once a resource is
no longer needed, SLR allows us to drop it from assertions: $P * Q \implies P$.
To model this, we interpret assertions as upwards-closed predicates, which may
own more than explicitly asserted. The ordering on memories and views is given
by the promising semantics, and the ordering on resources is induced by the
composition operation in the resource monoid. For now, we leave the resource
composition unspecified, and return to it later.


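\[
\begin{array}{rcll}
\iota \in \mathit{PredId} & \triangleq & \mathbb{N} & \text{(predicate identifiers)} \\
\mathit{Perm} & \triangleq & \{\pi \in \mathbb{Q} \mid 0 \le \pi \le 1\} & \text{(fractional permissions)} \\
\mathit{Write} & \triangleq & \mathcal{P}(\mathit{Val} \times \mathit{Time}) & \\
\mathit{WrPerm} & \triangleq & \mathit{Loc} \to \{(\pi, X) \in \mathit{Perm} \times \mathit{Write} \mid \pi = 0 \Rightarrow X = \emptyset\} & \\
\mathit{AcqPerm} & \triangleq & \mathit{Loc} \to \mathcal{P}(\mathit{PredId}) & \\
r = (r.\mathsf{wr}, r.\mathsf{acq}) \in \mathit{Res} & \triangleq & \mathit{WrPerm} \times \mathit{AcqPerm} & \text{(resources)} \\
W = (W.\mathsf{rel}, W.\mathsf{acq}) \in \mathit{World} & \triangleq & (\mathit{Loc} \to \mathit{Pred}) \times (\mathit{PredId} \rightharpoonup_{\mathrm{fin}} \mathit{Pred}) & \text{(worlds)} \\
\mathit{Prop} & \triangleq & \mathit{World} \to_{\mathrm{mon}} \mathcal{P}^{\uparrow}(\mathit{Mem} \times \mathit{View} \times \mathit{Res}) &
\end{array}
\]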


Fig. 2. Semantic domains used in this section. 


In addition, however, we have to deal with assertions that are parametrised by
predicates (in our case, $\mathsf{Rel}(x, \phi)$ and $\mathsf{Acq}(x, \phi)$). Doing so is not straightforward
because naive attempts at giving semantics to such assertions result in circular
definitions. A common technique for avoiding this circularity is to treat
predicates stored in assertions syntactically, and to interpret assertions relative to a
world, which is used to interpret those syntactic predicates. In our case, worlds
consist of two components: the $W.\mathsf{rel}$ component associates a syntactic SLR
predicate with every location (this component is used to interpret release
permissions), while the $W.\mathsf{acq}$ component associates a syntactic predicate with a
finite number of currently allocated predicate identifiers (this component is used
to interpret acquire permissions). The reason for the more complex structure
for acquire permissions is that they can be split (see (Acquire-Split)).
Therefore, we allow multiple predicate identifiers associated with a single location.
When acquire permissions are divided and split between threads, new predicate
identifiers are allocated and associated with predicates in the world. The world
ordering, $W_1 \le W_2$, expresses that world $W_2$ is an extension of $W_1$, in which
new predicate identifiers may have been allocated, but all existing predicate
identifiers are associated with the same predicates.

Let us now focus our attention on the assertion semantics. The semantics of 
assertions, $[\![P]\!]^{\eta}_{\mu}$, is relative to a thread store $\mu$ that assigns values to registers,
and an environment $\eta$ that assigns values to logical variables.

The standard logical connectives and quantifiers are interpreted following 
their usual intuitionistic semantics. The semantics of our novel assertions is given 
in Fig.3 and can be explained as follows: 


— The observed assertion $\mathsf{O}(x, v, t)$ says that the memory contains a message
at location x with value v and timestamp t, and the current thread knows
about it (i.e., the thread view contains it).

— The write assertion $\mathsf{W}^{\pi}(x, X)$ asserts ownership of a (partial, with fraction $\pi$)
write resource at location x, and requires that the largest timestamp recorded
in X does not exceed the view of the current thread.

— The acquire assertion, $\mathsf{Acq}(x, \phi)$, asserts that location x has some predicate
identifier $\iota$ associated with the predicate $\phi$ in the current world W.

— The release assertion, $\mathsf{Rel}(x, \phi)$, asserts that location x is associated with some
predicate $\phi'$ in the current world such that there exists a syntactic proof of
the entailment, $\vdash \forall v.\ \phi(v) \Rightarrow \phi'(v)$. The implication allows us to strengthen
the predicate in release assertions.

— Finally, $\nabla P$ states that P is satisfiable in the current world.


Note that $\mathsf{W}^{\pi}(x, X)$, $\mathsf{Acq}(x, \phi)$, and $\mathsf{Rel}(x, \phi)$ only talk about owning certain
resources, and do not constrain the memory itself at all. In the next subsection,
we explain how we relate the abstract resources with the concrete machine state.


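\[
\begin{array}{rcl}
[\![\mathsf{O}(e, v, t)]\!]^{\eta}_{\mu}(W) & \triangleq & \{(M, V, r) \mid \exists i, R, o.\ \langle [\![e]\!]^{\eta}_{\mu} :^{o}_{i} [\![v]\!]^{\eta}_{\mu}, R@[\![t]\!]^{\eta}_{\mu}]\rangle \in M \wedge [\![t]\!]^{\eta}_{\mu} \le V([\![e]\!]^{\eta}_{\mu})\} \\
[\![\mathsf{W}^{\pi}(e, X)]\!]^{\eta}_{\mu}(W) & \triangleq & \{(M, V, r) \mid \exists \pi' \ge [\![\pi]\!]^{\eta}_{\mu}.\ r.\mathsf{wr}([\![e]\!]^{\eta}_{\mu}) = (\pi', [\![X]\!]^{\eta}_{\mu}) \\
 & & \qquad \wedge\ \mathrm{snd}(\max([\![X]\!]^{\eta}_{\mu})) \le V([\![e]\!]^{\eta}_{\mu})\} \\
[\![\mathsf{Acq}(x, \phi)]\!]^{\eta}_{\mu}(W) & \triangleq & \{(M, V, r) \mid \exists \iota \in r.\mathsf{acq}([\![x]\!]^{\eta}_{\mu}).\ W.\mathsf{acq}(\iota) = \phi\} \\
[\![\mathsf{Rel}(x, \phi)]\!]^{\eta}_{\mu}(W) & \triangleq & \{(M, V, r) \mid\ \vdash \forall v.\ \phi(v) \Rightarrow W.\mathsf{rel}([\![x]\!]^{\eta}_{\mu})(v)\} \\
[\![\nabla P]\!]^{\eta}_{\mu}(W) & \triangleq & \{(M, V, r) \mid [\![P]\!]^{\eta}_{\mu}(W) \ne \emptyset\}
\end{array}
\]


Fig. 3. Interpretation of SLR assertions, $[\![-]\!]^{\eta}_{\mu} : \mathit{Assn} \to \mathit{Prop}$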


4.3 Relating Concrete State and Resources 


Before giving a formal description of the relationship between abstract resources 
and concrete machine states, we return to the intuition of threads manipulating 
resources presented in Sect. 4.1. 

Consider what happens when a thread executes a release write to a location
x. At that point, the thread has to own a release resource represented by
$\mathsf{Rel}(x, \phi)$, and to store the value v, it has to own the resources represented by
$\phi(v)$. As the write is executed, the thread gives up the ownership of the resources
corresponding to $\phi(v)$. Conversely, when a thread that owns the resource
represented by $\mathsf{Acq}(x, \phi)$ performs an acquire read of a value v from location x, it
will gain ownership of resources satisfying $\phi(v)$. However, this picture does not
account for what happens to the resources that are “in flight”, i.e., the resources
that have been released, but not yet acquired.

Our approach is to associate in-flight resources to messages in the memory. 
When a thread does a release write, it attaches the resources it released to 
the message it just added to the memory. That way, a thread performing an 
acquire read from that message can easily take ownership of the resources that 
are associated to the message. Formally, as the execution progresses, we update
the assignment of resources to messages,


\[
u : M(\mathsf{rel}) \to (\mathit{PredId} \rightharpoonup \mathit{Res}).
\]


For every release message in memory M, the message resource assignment u
gives us a mapping from predicate identifiers to resources. Here, we again use
predicate identifiers to be able to track which acquire predicate is being satisfied
by which resource. The intended reading of $u(m)(\iota) = r$ is that the resource r
attached to the message m satisfies the predicate with the identifier $\iota$.

We also require that the resources attached to a message (i.e., the resources 
released by the thread that wrote the message) suffice to satisfy all the acquire 
predicates associated with that particular location. Together, these two
properties of our message resource assignment, as formalised in Fig. 4, allow us to
describe the release/acquire ownership transfer. 


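\[
\begin{array}{ll}
M \models r, u, W \;\triangleq & \\
\quad \forall m \in M(\mathsf{rel}).\ r.\mathsf{acq}(m.\mathsf{loc}) = \mathrm{dom}(u(m)) & \text{(attached resources} \\
\qquad \wedge\ \forall \iota \in \mathrm{dom}(u(m)).\ (M, m.\mathsf{view}, u(m)(\iota)) \in [\![W.\mathsf{acq}(\iota)(m.\mathsf{val})]\!](W) & \text{satisfy predicates)} \\
\quad \wedge\ \forall x, v.\ W.\mathsf{rel}(x)(v) \Rightarrow \circledast_{\iota \in r.\mathsf{acq}(x)} W.\mathsf{acq}(\iota)(v) & \text{(released resources are} \\
\quad \wedge\ \forall m \in \mathrm{dom}(u).\ \mathrm{dom}(u(m)) \subseteq \mathrm{dom}(W.\mathsf{acq}) & \text{enough to satisfy acquires)} \\
\quad \wedge\ \forall m \in M(\mathsf{rlx}). & \text{(no ownership transfer} \\
\qquad ((\emptyset, \emptyset), \lambda x.\ 0, \varepsilon) \in [\![W.\mathsf{rel}(m.\mathsf{loc})(m.\mathsf{val})]\!](W.\mathsf{rel}, []) & \text{via relaxed accesses)}
\end{array}
\]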


Fig. 4. Message resource satisfaction. 


The last condition in the message resource satisfaction relation has to do 
with relaxed accesses. Since relaxed accesses do not provide synchronisation, 
we disallow ownership transfer through them. Therefore, we require that the 
release predicates connected with the relaxed messages are satisfiable with 
the empty resource. This condition, together with the requirement that the 
released resources satisfy acquire predicates, forbids ownership transfer via 
relaxed accesses. 

The resource missing from the discussion so far is the write resource (modelling
the $\mathsf{W}^{\pi}(x, X)$ assertion). Intuitively, we would like to have the following
property: whenever a thread adds a message to the memory, it has to own the 
corresponding write resource. Recall there are two ways a thread can produce a 
new message: 


1. A thread performs a write. This is the straightforward case: we simply require 
the thread to own the write resource and to update the set of value-timestamp 
pairs recorded in the resource accordingly. 

2. A thread promises a write. Here the situation is more subtle, because the 
thread might not own the write resource at the time it is issuing the promise, 
but will acquire the appropriate resource by the time it fulfils the promise. So,
in order to assert that the promise step respects the resources owned by the 
thread, we also need to be able to talk about the resources that the thread 
can acquire in the future. 


When dealing with the promises, the saving grace comes from the fact that 
all promises have to be certifiable, i.e., when issuing a promise a thread has to 
be able to fulfil it without help from other threads. 

Intuitively, the existence of a certification run tells us that even though at 
the moment a thread issues a promise, it might not have the resources necessary
to actually perform the corresponding write, the thread should, by running
uninterrupted, still be able to obtain the needed resources before it fulfils the 
promise. This, in turn, tells us that the needed resources have to be already 
released by the other threads by the time the promise is made: only resources 
attached to messages in the memory are available to be acquired, and only the 
thread that made the promise is allowed to run during the certification; therefore 
all the available resources have already been released. 

The above reasoning shows what it means for the promise steps to “respect 
resources”: when promises are issued, the resources currently owned by a thread, 
together with all the resources it is able to acquire according to the resources it 
owns and the current assignment of resources to messages, have to contain the 
appropriate write resource for the write being promised. The notion of “resources 
a thread is able to acquire” is expressed through the canAcq(r,u) predicate. 
canAcq(r, u) performs a fixpoint calculation: the resources we have (r) allow us 
to acquire some more resources from the messages in memory (assignment of 
resources to messages is given by u), which allows us to acquire some more, and 
so on. Its formal definition can be found in the technical appendix, and hinges 
on the fact that u precisely tracks which resources satisfy which predicates. 
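To give a feel for the fixpoint, here is a deliberately simplified sketch. It is not the definition from the technical appendix: resources are reduced to a pair of dictionaries (write permission per location, acquire identifiers per location), messages are keyed by (location, timestamp), and the sketch over-approximates by collecting the resource of every message whose predicate identifier the thread holds. The name `can_acq` and the example data are hypothetical.

```python
from fractions import Fraction

def can_acq(r, u):
    """Over-approximate the resources reachable from r through u.

    r: (wr, acq), with wr: loc -> permission and acq: loc -> set of ids.
    u: (loc, time) -> {pred_id: resource}, resources attached to messages.
    """
    taken = set()       # (message, pred_id) pairs already collected
    total_wr, total_acq = {}, {}
    have_acq = {x: set(ids) for x, ids in r[1].items()}
    changed = True
    while changed:      # iterate until no new resource becomes reachable
        changed = False
        for m, attached in u.items():
            loc = m[0]
            for iota, res in attached.items():
                if (m, iota) in taken or iota not in have_acq.get(loc, set()):
                    continue
                taken.add((m, iota))
                changed = True
                for x, p in res[0].items():
                    total_wr[x] = total_wr.get(x, 0) + p
                for x, ids in res[1].items():
                    # Newly obtained acquire ids may unlock further messages.
                    have_acq.setdefault(x, set()).update(ids)
                    total_acq.setdefault(x, set()).update(ids)
    return total_wr, total_acq

# Hypothetical example: holding id 1 at y unlocks a write permission on x and
# id 2 at z, which in turn unlocks half a write permission on w.
r_example = ({}, {"y": frozenset({1})})
u_example = {
    ("y", 1): {1: ({"x": Fraction(1)}, {"z": frozenset({2})})},
    ("z", 1): {2: ({"w": Fraction(1, 2)}, {})},
}
```

The second resource in `u_example` only becomes reachable after the first one has been collected, which is why a single pass over the messages is not enough in general.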




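\[
\begin{array}{rcl}
r_1 \bullet r_2 & \triangleq & (r_1.\mathsf{wr} \oplus_{\mathsf{wr}} r_2.\mathsf{wr},\ r_1.\mathsf{acq} \oplus_{\mathsf{acq}} r_2.\mathsf{acq}) \qquad\qquad \varepsilon \triangleq ([], \lambda\_.\ \emptyset) \\[4pt]
f_1 \oplus_{\mathsf{wr}} f_2 & \triangleq & \begin{cases}
\lambda x.\ (f_1(x).\mathsf{perm} + f_2(x).\mathsf{perm},\ f_1(x).\mathsf{msgs} \cup f_2(x).\mathsf{msgs}) & \text{if } f_1(x).\mathsf{perm} + f_2(x).\mathsf{perm} \le 1 \text{ for all locations } x \\
\text{undefined} & \text{otherwise}
\end{cases} \\[4pt]
g_1 \oplus_{\mathsf{acq}} g_2 & \triangleq & \mathbf{if}\ \forall x.\ g_1(x) \cap g_2(x) = \emptyset\ \mathbf{then}\ \lambda x.\ g_1(x) \cup g_2(x)\ \mathbf{else}\ \text{undefined}
\end{array}
\]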


Fig. 5. Resource composition. 


An important element that was omitted from the discussion so far is the definition
of the composition in the resource monoid Res. The resource composition,
defined in Fig. 5, follows the expected notion of per-component composition. The 
most important feature is in the composition of write resources: a full permission 
write resource is only composable with the empty write resource. 
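The per-component composition can be sketched directly. In the sketch below, a resource is a pair of dictionaries, a missing location stands for the empty permission, and `None` represents "undefined"; these encodings and all names are assumptions made for illustration.

```python
from fractions import Fraction

NO_WRITE = (Fraction(0), frozenset())   # permission 0 with no recorded writes

def compose_wr(f1, f2):
    """Add permissions and union the recorded writes; None if a sum exceeds 1."""
    out = {}
    for x in f1.keys() | f2.keys():
        p1, m1 = f1.get(x, NO_WRITE)
        p2, m2 = f2.get(x, NO_WRITE)
        if p1 + p2 > 1:
            return None
        out[x] = (p1 + p2, m1 | m2)
    return out

def compose_acq(g1, g2):
    """Union the predicate identifiers; None if the two sides overlap."""
    out = {}
    for x in g1.keys() | g2.keys():
        a, b = g1.get(x, frozenset()), g2.get(x, frozenset())
        if a & b:
            return None
        out[x] = a | b
    return out

def compose(r1, r2):
    """Compose two resources (wr, acq) component by component."""
    wr, acq = compose_wr(r1[0], r2[0]), compose_acq(r1[1], r2[1])
    return None if wr is None or acq is None else (wr, acq)

EMPTY = ({}, {})   # the unit of the composition

half = Fraction(1, 2)
left = ({"x": (half, frozenset({(1, 1), (0, 0)}))}, {})
right = ({"x": (half, frozenset({(2, 2), (0, 0)}))}, {})
full = compose(left, right)
```

Composing the two half permissions in `left` and `right` yields a full permission whose recorded writes are the union of both sets, mirroring Combine-Writes; composing `full` with either half again is undefined.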

\[
\begin{array}{l}
\lfloor r_F, u, W \rfloor_{T} \;\triangleq\; \{\langle M, P\rangle \mid \mathbf{let}\ r = \prod_{i \in \mathit{ThreadId}} r_F(i) \bullet \prod_{m \in \mathrm{dom}(u),\ \iota \in \mathrm{dom}(u(m))} u(m)(\iota)\ \mathbf{in} \\
\quad (1)\ M \models r, u, W\ \wedge \\
\quad (2)\ \forall x.\ \{(m.\mathsf{val}, m.\mathsf{time}) \mid m \in M(x) \setminus P\} = r.\mathsf{wr}(x).\mathsf{msgs}\ \wedge \\
\quad (3)\ \forall m \in P.\ m.\mathsf{tid} \notin T \Rightarrow (r_F(m.\mathsf{tid}) \bullet \mathrm{canAcq}(r_F(m.\mathsf{tid}), u)).\mathsf{wr}(m.\mathsf{loc}).\mathsf{perm} > 0\}
\end{array}
\]
$r_F : \mathit{ThreadId} \to \mathit{Res}$ maps threads to the resources they own.
r is the sum of all the resources distributed among the threads and messages.


Fig. 6. Erasure.


At this point, we are equipped with all the necessary ingredients to relate
abstract states represented by resources to concrete states $\langle M, P\rangle$ (where M is
memory, and P is the set of promised messages). We define a function, called
erasure, that given an assignment of resources to threads, $r_F : \mathit{ThreadId} \to \mathit{Res}$,
an assignment of resources to messages, u, and a world, W, gives us a set of
concrete states satisfying the following conditions:


1. Memory M is consistent with respect to the total resource r and the message 
resource assignment u at world W. 

2. The set of fulfilled writes to each location x in $\langle M, P\rangle$ must match the set of
writes of all write permissions owned by any thread or associated with any 
messages, when combined. 

3. For all unfulfilled promises to a location x by thread i, thread i must currently 
own or be able to acquire from u at least a shared write permission for x. 


Our formal notion of erasure, defined in Fig. 6, has an additional parameter,
a set of thread identifiers T. This set allows us to exclude promises of threads T
from the requirement of respecting the resources. As we will see in the following
subsection, this additional parameter plays a subtle, but key, role in the
soundness proof. (The notion of erasure described above corresponds to the case when
$T = \emptyset$.)

Note also that the arguments of erasure very precisely account for who owns 
which part of the total resource. This diverges from the usual approach in
separation logic, where we just give the total resource as the argument to the erasure.
Our approach is motivated by Lemma 1, which states that a reader that owns the
full write resource for location x knows which value it is going to read from x.
This is the key lemma in the soundness proof of the (R-RLX*) and (R-ACQ*) 
rules. 

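Lemma 1. If $(M, V, r_F(i)) \in [\![\mathsf{W}^{1}(x, X)]\!]^{\eta}_{\mu}(W)$, and $\langle M, P\rangle \in \lfloor r_F, u, W \rfloor_{\{i\}}$,
then for all messages $m \in M(x) \setminus P(i)$ such that $V(x) \le m.\mathsf{time}$, we have
$m.\mathsf{val} = \mathrm{fst}(\max(X))$.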


Lemma 1 takes the perspective of thread i, which owns the full write
resource for the location x. This is expressed by $(M, V, r_F(i)) \in [\![\mathsf{W}^{1}(x, X)]\!]^{\eta}_{\mu}(W)$
(recall that $r_F(i)$ are the resources owned by the thread i). Furthermore, the
lemma assumes that the concrete state respects the abstract resources, expressed
by $\langle M, P\rangle \in \lfloor r_F, u, W \rfloor_{\{i\}}$. Under these assumptions, the lemma intuitively tells
us that the current thread knows which value it will read from x. Formally, the
lemma says that all the messages thread i is allowed to read (i.e., messages in
the memory that are not outstanding promises of thread i and whose timestamp
is greater than or equal to the view of thread i) have the value that appears as the
maximal element in the set X.

To see why this lemma holds, consider a message $m \in M(x) \setminus P(i)$. If m
is an unfulfilled promise by a different thread j, then, by erasure, it follows
that j currently owns or can acquire at least a shared write permission for x.
However, this is a contradiction, since thread i currently owns the exclusive
write permission, and, by erasure, $r_F(i)$ is disjoint from the resources of all
other threads and all resources currently associated with messages by u. Hence,
m must be a fulfilled write. By erasure, it follows that the set of fulfilled writes
to x is given by the combination of all write permissions. Since $r_F(i)$ owns the
exclusive write permission, this is just $r_F(i).\mathsf{wr}$. Hence, the set of fulfilled writes
is X, and the value of the last fulfilled write is $\mathrm{fst}(\max(X))$.

Note that in the reasoning above, it is crucial to know which thread and which 
message owns which resource. Without precisely tracking this information, we 
would be unable to prove Lemma 1. 


4.4 Soundness 


Now that we have our notion of erasure, we can proceed to formalise the meaning 
of triples, and present the key points of the soundness proof. 

Recall our intuitive view of Hoare triples saying that the program only makes 
steps which respect the resources it owns. This notion is formalised using the 
safety predicate: safety (somewhat simplified; we give its formal definition in 
Fig. 7) states that it is always safe to perform zero steps, and performing n + 1 
steps is safe if the following two conditions hold: 


1. If no more steps can be taken, the current state and resources have to satisfy 
the postcondition B. 

2. If we can take a step which takes us from the state (M, P) (which respects our 
current resources r, the assignment of resources to messages u, and world W) 
to the state (M’, P’), then 


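\[
\begin{array}{l}
\mathsf{safe}_{0}(\sigma, B)(W_1) \stackrel{\text{def}}{=} \mathit{Mem} \times \mathit{View} \times \mathit{Res} \\[2pt]
\mathsf{safe}_{n+1}(\sigma, B)(W_1) \stackrel{\text{def}}{=} \{ (M_1, V_1, r_1) \mid \forall (M, V, r) \geq (M_1, V_1, r_1).\ \forall W \geq W_1. \\
\quad (\sigma.s = \mathsf{skip} \Rightarrow (M, V, r) \in \mathit{vs}(B(\sigma.\mu))(W)) \\
\quad \land\ (\forall P, r_F, \sigma', M', P', V', u, i.\ (M, P) \in \lfloor r_F[i \mapsto r], u, W \rfloor_{\emptyset} \\
\qquad \land\ ((\sigma, V), (M, P)) \Longrightarrow_{i} ((\sigma', V'), (M', P')) \\
\qquad \Rightarrow \exists r', u', W' \geq W.\ (M', P') \in \lfloor r_F[i \mapsto r'], u', W' \rfloor_{\emptyset} \\
\qquad\quad \land\ ((M', P'), V', r') \in \mathsf{safe}_{n}(\sigma', B)(W')) \}
\end{array}
\]


Fig. 7. Safety. 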
(a) there exist resources r’, an assignment of resources to messages u’, and a 
future world W’, such that (M’, P’) respects r’, u’, and W’, and 

(b) we are safe for n more steps starting in the state (M’, P’) with resources 
given by r’, u’ and W’. 


Note the following: 


— Upon termination, we are not required to satisfy exactly the postcondition B, 
but its view shift. A view shift is a standard notion in concurrent separation 
logics, which allows updates of the abstract resources which do not affect the 
concrete state. In our case, this means that resource r can be view-shifted 
into r’ satisfying B as long as the erasure is unchanged. The formal definition 
of view shifts is given in the appendix. 

— Again as is standard in separation logics, safety requires framed resources to 
be preserved. This is the role of $r_F$ in the safety definition. Frame preservation 
allows us to compose safety of threads that own compatible resources. 
However, departing from the standard notion of frame preservation, we 
precisely track who owns which resource in the frame, because this is important 
for erasure. 


The semantics of Hoare triples is simply defined in terms of the safety predi- 
cate. The triple {P} s {Q} holds if every logical state satisfying the precondition 
is safe for any number of steps: 


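\[
[\![ \vdash \{P\}\ s\ \{Q\} ]\!] \stackrel{\text{def}}{=} \forall n, \mu, \eta, W.\ [\![P]\!]^{\eta}_{\mu}(W) \subseteq \mathsf{safe}_{n}((\mu, s), \lambda \mu'.\ [\![Q]\!]^{\eta}_{\mu'})(W)
\]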


To establish soundness of the SLR proof rules, we have to prove that the 
safety predicate holds for an arbitrary number of steps, including promise steps. The 
trouble with reasoning about promise steps is that they can nondeterministically 
appear at any point of the execution. Therefore, we have to account for them in 
the soundness proof of every rule of our logic. To make this task manageable, 
we encapsulate reasoning about the promise steps in a theorem, thus enabling 
the proofs of soundness for proof rules to consider only the non-promise steps. 

To do so, once again certification runs for promises play a pivotal role. Recall 
that whenever a thread makes a step, it has to be able to fulfil its promises 
without help from other threads (Sect. 3.4). Since there will be no interference by 
other threads, performing promise steps during certification is of no use (because 
promises can only be used by other threads). Therefore, we can assume that the 
certification runs are always promise-free. 

Now that we have noted that certifications are promise-free, the key idea 
behind encapsulating the reasoning about promises is as follows. If we know 
that all executions of our program are safe for arbitrarily many non-promising 
steps, we can use this to conclude that they are safe for promising steps too. 
Here, we use the fact that certification runs are possible runs of the program, 
and the fact that certifications are promise-free. 

Let us now formalise our key idea. First, we need a way to state that execu- 
tions are safe for non-promising steps. This is expressed by the non-promising 
safety predicate defined in Fig. 8. What we want to conclude is that 
non-promising safety is enough to establish safety, as expressed by Theorem 1: 


Theorem 1 (Non-promising safety implies safety) 
\[
\forall n, \sigma, B, W.\ \mathsf{npsafe}_{(n+1,0)}(\sigma, B)(W) \subseteq \mathsf{safe}_{n}(\sigma, B)(W)
\]


We now discuss several important points in the definition of non-promising safety 
which enable us to prove this theorem. 


Non-promising Safety is Indexed by Pairs of Natural Numbers. When proving 
Theorem 1, we use promise-free certification runs to establish the safety of the 
promise steps. A problem we face here is that the length of certification runs is 
unbounded. Somehow, we have to know that whenever the thread makes a step, 
it is npsafe for arbitrarily many steps. Our solution is to have npsafe transfinitely 
indexed over pairs of natural numbers ordered lexicographically. That way, if we 
are npsafe at index (n+ 1,0) and we take a step, we know that we are npsafe 
at index (n,m) for every m. We are then free to choose a sufficiently large m 
depending on the length of the certification run we are considering. 
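
The index discipline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `next_index` and the parameter `certification_length` are hypothetical and do not appear in the paper. The sketch relies on the fact that Python compares tuples lexicographically, so every step strictly decreases the index even though the second component may jump to an arbitrarily large value.

```python
# Hypothetical illustration of lexicographically ordered pair indices.
# Python tuples compare lexicographically, matching the order on N x N.

def next_index(index, certification_length):
    """Return the index available after one step from `index`.

    At (n + 1, 0) we are npsafe at (n, m) for every m, so we are free to
    pick m as large as the certification run we want to replay.
    At (n, m + 1) a step simply moves to (n, m).
    """
    n, m = index
    if m > 0:
        return (n, m - 1)
    if n > 0:
        return (n - 1, certification_length)
    raise ValueError("no steps left at index (0, 0)")


start = (3, 0)
after = next_index(start, certification_length=10)
# The second component grew, yet the pair is strictly smaller.
print(after, after < start)
```

Because the lexicographic order on pairs of naturals is well founded, any sequence of such steps terminates, whatever values are chosen for `certification_length` along the way.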


Non-promising Safety Considers Configurations that May Contain Promises. It 
is important to note that the definition of non-promising safety does not require 
that there are no promises in the starting configuration. The only thing that is 
required is that no more promises are going to be issued. This is very impor- 
tant for Theorem 1, since safety considers all possible starting configurations 
(including the ones with existing promises), and if we want the theorem to hold, 
non-promising safety has to consider all possible starting configurations too. 


Erasure Used in the Non-promising Safety does not Constrain Promises of the 
Current Thread. Non-promising safety does not require promises by the thread 
being reduced (i.e., thread i) to respect resources. Thus, when reasoning about 
non-promising safety of thread i, we cannot assume that existing promises by 
thread i respect resources, but crucially we also do not have to worry about 
recertifying thread i’s promises. However, since the non-promising reduction 
$\xRightarrow{\mathsf{NP}}_{i}$ does not recertify promises, we explicitly require that the promises 
are well formed (via the wfprom predicate) in order to ensure that we still only 
consider executions where threads do not read from their own promises. 


Additional Constraints by the Non-promising Safety. Non-promising safety also 
imposes additional constraints on the reducing thread i. In particular, any write 
permissions owned or acquirable by i after the reduction were already owned or 
acquirable by i before the reduction step. Intuitively, this holds because thread i 
can only transfer away resources and take ownership of resources it was already 
allowed to acquire before reducing. Lastly, non-promising safety requires that if 
the reduction of i performs any new writes or fulfils any old promises, it must own 
the write permission for the location of the given message. Together, these two 
conditions ensure that if a promise is fulfilled during a thread-local certification 
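\[
\begin{array}{l}
\mathsf{npsafe}_{(0,m)}(\sigma, B)(W) \stackrel{\text{def}}{=} \mathit{Mem} \times \mathit{View} \times \mathit{Res} \\[2pt]
\mathsf{npsafe}_{(n+1,0)}(\sigma, B)(W) \stackrel{\text{def}}{=} \bigcap_{m \in \mathbb{N}} \mathsf{npsafe}_{(n,m)}(\sigma, B)(W) \\[2pt]
\mathsf{npsafe}_{(n+1,m+1)}(\sigma, B)(W_1) \stackrel{\text{def}}{=} \{ (M_1, V_1, r_1) \mid \forall (M, V, r) \geq (M_1, V_1, r_1).\ \forall W \geq W_1. \\
\quad (\sigma.s = \mathsf{skip} \Rightarrow (M, V, r) \in \mathit{vs}(B(\sigma.\mu))(W)) \\
\quad \land\ (\forall P, r_F, f, \sigma', M', P', V', u, i. \\
\qquad (M, P) \in \lfloor r_F[i \mapsto r \bullet f], u, W \rfloor_{\{i\}} \quad \text{(weak erasure)} \\
\qquad \land\ ((\sigma, V), (M, P)) \xRightarrow{\mathsf{NP}}_{i} ((\sigma', V'), (M', P')) \quad \text{(only non-promising steps allowed)} \\
\qquad \land\ \mathsf{wfprom}(P(i), V) \land \mathsf{wfprom}(P'(i), V') \quad \text{(promises well formed)} \\
\qquad \Rightarrow \exists r', u', W' \geq W.\ (M', P') \in \lfloor r_F[i \mapsto r' \bullet f], u', W' \rfloor_{\{i\}} \quad \text{(weak erasure)} \\
\qquad\quad \land\ (M', V', r') \in \mathsf{npsafe}_{(n+1,m)}(\sigma', B)(W') \\
\qquad\quad \land\ r' \bullet \mathit{canAcq}(r', u') \leq_{w} r \bullet \mathit{canAcq}(r, u) \quad \text{(no new res. acquirable after taking a step)} \\
\qquad\quad \land\ \forall m \in (M' \setminus P') \setminus (M \setminus P).\ r.\mathit{wr}(m.\mathit{loc}).\mathit{perm} > 0) \} \\
\qquad\qquad \text{(when performing a write or fulfilling a promise the thread has to own the appropriate write res.)} \\[4pt]
r_1 \leq_{w} r_2 \stackrel{\text{def}}{=} \forall x.\ r_1.\mathit{wr}(x).\mathit{perm} \leq r_2.\mathit{wr}(x).\mathit{perm} \\[2pt]
\mathsf{wfprom}(P, V) \stackrel{\text{def}}{=} \forall m \in P.\ V(m.\mathit{loc}) < m.\mathit{time}
\end{array}
\]


Fig. 8. Non-promising safety. 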


and the thread satisfies non-promising safety, then the thread already owned 
or could acquire the write permission for the location of the promise. This is 
expressed formally in Lemma 2. 


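Lemma 2. Assuming that $((M, P), V, r) \in \mathsf{npsafe}_{(n+1,k)}(\sigma, B)(W)$ and 
$(M, P) \in \lfloor r_F[i \mapsto r \bullet f], u, W \rfloor_{\emptyset}$ and 
$((\sigma, V), (M, P)) \xRightarrow{\mathsf{NP}}{}^{k}_{i} ((\sigma', V'), (M', P'))$ 
and $m \in (M' \setminus P') \setminus (M \setminus P)$, we have 
$(r \bullet \mathit{canAcq}(r, u)).\mathit{wr}(m.\mathit{loc}).\mathit{perm} > 0$. 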


The intuition for why Lemma 2 holds is that since only thread i executes, 
we know by the definition of non-promising safety that any write permission 
owned or acquirable by i when the promise is fulfilled was already owned or 
acquirable by i in the initial state. Furthermore, whenever a promise is fulfilled, 
the non-promising safety definition explicitly requires ownership of the 
corresponding write permission. It follows that the thread already owns or can 
acquire the write permission for the location of the given promise in the initial state. 

Lemma 2 gives us exactly the property that we need to reestablish erasure 
after the operational semantics introduces a new promise. This makes Lemma 2 
the key step in the proof of Theorem 1, which allows us to disentangle reasoning 
about promising steps and normal reduction steps. Theorem 1 tells us that, in 
order to prove a proof rule sound, it is enough to prove that the non-promising 
safety holds for arbitrary indices. This frees us from the cumbersome reasoning 
about promise steps and allows us to focus on non-promising reduction steps 
when proving the proof rules sound. 

We can now state our top-level correctness theorem, Theorem 2. Since our 
language only has top-level parallel composition, we need a way to distribute 
initial resources to the various threads, and to collect all the resources once all 
the threads have finished. The correctness theorem gives us precisely that: 


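Theorem 2 (Correctness). If A is a finite set of locations and 


1. $\vdash \forall x \in A.\ \phi_x(0)$ 

2. $\vdash \circledast_{x \in A}\, \mathsf{Rel}(x, \phi_x) * \mathsf{Acq}(x, \phi_x) * \mathsf{W}^{1}(x, \{(0, 0)\}) \Rightarrow \circledast_{i \in \mathit{Tid}}\, P_i$ 

3. $\vdash \{P_i\}\ s_i\ \{Q_i\}$ for all i 

4. $((\lambda i.\ ((\mu_i, s_i), \bot)), (M_0, \emptyset)) \Longrightarrow^{*} (\mathit{TS}, \mathit{gconf})$ and $\mathit{TS}(i).\sigma = \mathsf{skip}$ for all i 

5. $\vdash \circledast_{i \in \mathit{Tid}}\, Q_i \Rightarrow Q$ 

6. $\mathit{FRV}(Q_i) \cap \mathit{FRV}(Q_j) = \emptyset$ for all distinct $i, j \in \mathit{Tid}$ 


then there exist $\mu$, r, and W such that $(\mathit{gconf}.M, \bigsqcup_i \mathit{TS}(i).V, r) \in [\![Q]\!]_{\mu}(W)$ and 
$\forall i \in \mathit{Tid}.\ \forall a \in \mathit{FRV}(Q_i).\ \mu(a) = \mathit{TS}(i).\mu(a)$, where FRV(P) denotes the set of 
free register variables in P. 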


5 Related Work 


There are a number of techniques for reasoning under relaxed memory models, 
but besides the DRF theorems and some simple invariant logics [10,13], no other 
techniques have been proved sound for a model allowing the weak behaviour of 
LB+data+fakedep from the introduction. The “invariant-based program logics” 
are by design unable to reason about programs like the random number gen- 
erator, where having a bound on the set of values written to a location is not 
enough, let alone reasoning about functional correctness of a program. 


Relaxed Separation Logic (RSL). Among program logics for relaxed memory, 
the most closely related is RSL [27]. There are two versions of RSL: a weak 
one that is sound with respect to the C/C++11 memory model, which features 
out-of-thin-air reads, and a stronger one that is sound with respect to a variant 
of the C/C++11 memory that forbids load buffering. 

The weak version of RSL forbids relaxed writes completely, and does not con- 
strain the value returned by a relaxed read. The stronger version provides single- 
location invariants for relaxed accesses, but its soundness proof relies strongly on 
a strengthened version of C/C++11 without $\mathit{po} \cup \mathit{rf}$ cycles (where po is program 
order, and rf is the reads-from relation), which forbids load buffering. 

When it comes to reasoning about coherence properties, even the strong ver- 
sion of RSL is surprisingly weak: it cannot be used to verify any of the coherence 
examples in this paper. In fact, RSL can be shown sound with respect to much 
weaker coherence axioms than what C/C++11 relaxed accesses provide. 

One notable feature of RSL which we do not support is read-modify-write 
(RMW) instructions (such as compare-and-swap and fetch-and-add). However, 
the soundness proof of SLR makes no simplifying assumptions about the 
promising semantics which would affect the semantics of RMW instructions. Therefore, 
we are confident that enhancing SLR with rules for RMW instructions would not 
substantially affect the structure of the soundness proof presented in Sect. 4. 


Other Program Logics. FSL [8] extends (the strong version of) RSL with stronger 
rules for relaxed accesses in the presence of release/acquire fences. In FSL, a 
release fence can be used to package an assertion with a modality, which a relaxed 
write can then transfer. Conversely, the ownership obtained by a relaxed read is 
guarded by a symmetric modality that needs an acquire fence to be unpacked. 
The soundness proof of FSL also relies on $\mathit{po} \cup \mathit{rf}$ acyclicity. Moreover, it is known 
to be unsound in models where load buffering is allowed [9, Sect. 5.2]. 

A number of other logics—GPS [26], iGPS [12], OGRA [16], iCAP-TSO [24], 
the rely-guarantee proof system for TSO of Ridge [23], and the program logic 
for TSO of Wehrman and Berdine [28]—have been developed for even stronger 
memory models (release/acquire or TSO), and also rely quite strongly on—and 
try to expose—the stronger consistency guarantees provided by those models. 

The framework of Alglave and Cousot [2] for reasoning about relaxed con- 
current programs is parametric with respect to an axiomatic “per-execution” 
memory model. By construction, as argued by Batty et al. [3], such models 
cannot be used to define a language-level model allowing the weak behaviour 
of LB+data+fakedep and similar litmus tests while forbidding out-of-thin-air 
behaviours. Moreover, their framework does not provide the usual abstraction 
facilities of program logics. 

The lace logic of Bornat et al. [6] targets hardware memory models, in par- 
ticular Power. It relies on annotating the program with “per-execution” con- 
straints, and on syntactic features of the program. For example, it distinguishes 
LB+data+fakedep from LB+data+po, its variant where the write of the second 
thread is $[x]_{\mathsf{rlx}} := 1$, and is thus unsuitable to address out-of-thin-air behaviours. 


Other Approaches. Besides program logics, another way to reason about pro- 
grams under weak memory models is to reduce the act of reasoning under a 
memory model M to reasoning under a stronger model M’—typically, but not 
necessarily, sequential consistency [7,18]. One can often establish DRF theo- 
rems stating that a program without any races when executed under M’ has 
the same behaviours when executed under M as when executed under M’. For 
the promising semantics, Kang et al. [13, Sect. 5.4] have established such the- 
orems for M’ being release-acquire consistency, sequential consistency, and the 
promise-free promising semantics, for suitable notions of races. The last one, 
the “Promise-Free DRF” theorem, is applicable to the Disjoint-Lists program 
from the introduction, but none of these theorems can be applied to any of the 
other examples of this paper, as they are racy. Moreover, these theorems are not 
compositional, as they do not state anything about the Disjoint-Lists program 
when put inside a larger, racy program—for example, just an extra read of a 
from another thread. 
6 Conclusion 


In this paper, we have presented the first expressive logic that is sound under the 
promising semantics, and have demonstrated its expressiveness with a number of 
examples. Our logic can be seen both as a general proof technique for reasoning 
about concurrent programs, and also as a tool for proving the absence of 
out-of-thin-air behaviour for challenging examples, and reasoning about coherence. In 
the future, we would like to extend the logic to cover more of relaxed memory, 
more advanced reasoning principles, such as those available in GPS [26], and 
mechanise its soundness proof. 

Interesting aspects of relaxed memory we would like to also cover are 
read-modify-writes and fences. These would allow us to consider concurrent 
algorithms like circular buffers and the atomic reference counter verified in 
FSL++ [9]. This could be done by adapting the corresponding rules of RSL 
and GPS; moreover, we could adapt them with our new approach to reason 
about coherence. 

To mechanise the soundness proof, we intend to use the Iris framework [11], 
which has already been used to prove the soundness of iGPS [12], a variant of 
the GPS program logic. To do this, however, we have to overcome one technical 
limitation of Iris. Namely, the current version of Iris is step-indexed over $\mathbb{N}$, while 
our semantics uses transfinite step-indexing over $\mathbb{N} \times \mathbb{N}$ to define non-promising 
safety and allow us to reason about certifications of arbitrary length for each 
reduction step. Progress has been made towards transfinitely step-indexed log- 
ical relations that may be applicable to a transfinitely step-indexed version of 
Iris [25]. 
Abstract. Resource sharing is a fundamental phenomenon in concurrent 
programming where several threads have permissions to access a 
common resource. Logics for verification need to capture the notion of 
permission ownership and transfer. One typical practice is the use of 
rational numbers in (0, 1] as permissions in which 1 is the full permission 
and the rest are fractional permissions. Rational permissions are not a 
good fit for separation logic because they remove the essential “disjoint- 
ness” feature of the logic itself. We propose a general logic framework 
that supports permission reasoning in separation logic while preserving 
disjointness. Our framework is applicable to sophisticated verification 
tasks such as doing induction over the finiteness of the heap within the 
object logic or carrying out biabductive inference. We can also prove 
precision of recursive predicates within the object logic. We developed 
the ShareInfer tool to benchmark our techniques. We introduce “scaling 
separation algebras,” a compositional extension of separation algebras, 
to model our logic, and use them to construct a concrete model. 


1 Introduction 


The last 15 years have witnessed great strides in program verification [7,27,39, 
43,44,46]. One major area of focus has been concurrent programs following Con- 
current Separation Logic (CSL) [40]. The key rule of CSL is PARALLEL: 


\[
\frac{\{P_1\}\ c_1\ \{Q_1\} \qquad \{P_2\}\ c_2\ \{Q_2\}}
     {\{P_1 \star P_2\}\ c_1 \| c_2\ \{Q_1 \star Q_2\}}\ \text{PARALLEL}
\]


In this rule, we write $c_1 \| c_2$ to indicate the parallel execution of commands $c_1$ 
and $c_2$. The separating conjunction $\star$ indicates that the resources used by the 
threads are disjoint in some useful way, i.e. that there are no dangerous races. 
Many subsequent program logics [18, 20,30,31,45] have introduced increasingly 
sophisticated notions of “resource disjointness” for the PARALLEL rule. 
Fractional permissions (also called “shares” ) are a relatively simple enhance- 
ment to separation logic’s original notion of disjointness [4]. Rather than own- 
ing a resource (e.g. a memory cell) entirely, a thread is permitted to own a 
part/fraction of that resource. The more of a resource a thread owns, the more 
actions it is permitted to take, a mapping called a policy. In this paper we 
will use the original policy of Bornat [4] to keep the examples straightforward: 
non-zero ownership of a memory cell permits reading while full ownership also 
permits writing. More modern logics allow for a variety of more flexible share 
policies [13,28,42], but our techniques still apply. Fractional permissions are less 
expressive than the “protocol-based” notions of disjointness used in program log- 
ics such as FCSL [38,44], Iris [30], and TaDa [16], but are well-suited for common 
concurrent programming patterns such as read sharing and so have been incor- 
porated into many program logics and verification tools [19,26, 28,31, 36,41]. 
Since fractionals are simpler and more uniform than protocol-based logics, 
they are amenable to automation [26,33]. However, previous techniques had diffi- 
culty with the inductive predicates common in SL proofs. We introduce predicate 
multiplication, a concise method for specifying the fractional sharing of complex 
predicates, writing $\pi \cdot P$ to indicate that we own the $\pi$-share of the arbitrary 
predicate P, e.g. $0.5 \cdot \mathsf{tree}(x)$ indicates a tree rooted at x and we own half of 
each of the nodes in the tree. If set up properly, predicate multiplication handles 
inductive predicates smoothly and is well-suited for automation because: 


Section 3 it distributes with bientailments, e.g. $\pi \cdot (P \land Q) \dashv\vdash (\pi \cdot P) \land (\pi \cdot Q)$, 
enabling rewriting techniques and both forwards and backwards reasoning; 

Section 4 it works smoothly with the inference process of biabduction [10]; and 

Section 5 the side conditions required for bientailments and biabduction can be 
verified directly in the object logic, leveraging existing entailment checkers. 


There has been significant work in recent years on tool support for protocol- 
based approaches [15,19,29,30,48], but they require significant user input and 
provide essentially no inference. Fractional permissions and protocol-based 
approaches are thus complementary: fractionals can handle large amounts of rel- 
atively simple concurrent code with minimal user guidance, while protocol-based 
approaches are useful for reasoning about the implementations of fine-grained 
concurrent data structures whose correctness argument is more sophisticated. 

In addition to Sects. 3, 4 and 5, the rest of this paper is organized as follows. 


Section 2 We give the technical background necessary for our work. 

Section 6 We document ShareInfer [1], a tool that uses the logical tools developed 
in Sects. 3, 4 and 5 to infer frames and antiframes and check the necessary 
side conditions. We benchmark ShareInfer with 27 selected examples. 

Section 7 We introduce scaling separation algebras, which allow us to construct 
predicate multiplication on an abstract structure in a compositional way. We 
show that such a model can be constructed from Dockins et al.’s tree shares [21]. 
The key technical proofs in Sects. 5 and 7 have been verified in Coq [1]. 

Section 8 We prove that there are no useful share models that simultaneously 
satisfy disjointness and two distributivity axioms. Consequently, at least one 
axiom has to be removed, which we choose to be left distributivity. We 
also prove that the failure of two-sided distributivity forces a side condition 
on a key proof rule for predicate multiplication. 

Section 9 We discuss related work before delivering our conclusion. 
[Figure: a heap with four cells. The cell root holds (3, left, right), left holds 
(1, null, grand), right holds (4, grand, null), and grand holds (1, null, null); 
the cells root, left, and right are each annotated with share 0.3.] 


Fig. 1. This heap satisfies tree(root, 0.3) despite being a DAG 


2 Technical Preliminaries 


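Share Models. An (additive) share model $(S, \oplus)$ is a partial commutative monoid 
with a bottom/empty element $\mathcal{E}$ and top/full element $\mathcal{F}$. On the rationals in 
(0, 1], $\oplus$ is partial addition, $\mathcal{E}$ is 0, and $\mathcal{F}$ is 1. We also require the existence of 
complements $\overline{\pi}$ satisfying $\pi \oplus \overline{\pi} = \mathcal{F}$; in $\mathbb{Q}$, $\overline{\pi} \stackrel{\text{def}}{=} 1 - \pi$. 
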
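Separation Logic. Our base separation logic has the following connectives: 

\[
P, Q, \text{etc.} \stackrel{\text{def}}{=} \langle F \rangle \mid P \land Q \mid P \lor Q \mid \lnot P \mid P \star Q \mid \forall x.\, P \mid \exists x.\, P \mid \mu X.\, P \mid e_1 \xmapsto{\pi} e_2
\]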


Pure facts F are put in angle brackets, e.g. $\langle \mathit{even}(12) \rangle$. Pure facts force 
the empty heap, i.e. the usual separation logic emp predicate is just a macro 
for $\langle \top \rangle$. Our propositional fragment has (classical) conjunction $\land$, disjunction 
$\lor$, negation $\lnot$, and the separating conjunction $\star$. We have both universal $\forall$ and 
existential $\exists$ quantifiers, which can be impredicative if desired. To construct 
recursive predicates we have the usual Tarski least fixpoint $\mu$. The fractional 
points-to $e_1 \xmapsto{\pi} e_2$ means we own the $\pi$-fraction of the memory cell pointed to by 
$e_1$, whose contents is $e_2$, and nothing more. To distinguish points-to from emp 
we require that $\pi$ be non-$\mathcal{E}$. For notational convenience we sometimes elide the 
full share $\mathcal{F}$ over a fractional maps-to, writing just $e_1 \mapsto e_2$. The connection of 
$\oplus$ to the fractional maps-to predicate is given by the bi-entailment: 


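\[
e_1 \xmapsto{\pi_1} e_2 \star e_1 \xmapsto{\pi_2} e_3 \ \dashv\vdash\ e_1 \xmapsto{\pi_1 \oplus \pi_2} e_2 \land e_2 = e_3 \qquad \text{MAPSTOSPLIT}
\]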


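Disjointness. Although intuitive, the rationals are not a good model for shares 
in SL. Consider this definition for $\pi$-fractional trees rooted at x: 

\[
\mathsf{tree}(x, \pi) \stackrel{\text{def}}{=} \langle x = \mathsf{null} \rangle \lor \exists d, l, r.\ x \xmapsto{\pi} (d, l, r) \star \mathsf{tree}(l, \pi) \star \mathsf{tree}(r, \pi) \qquad (1)
\]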


This tree predicate is obtained directly from the standard recursive predicate for 
binary trees by asserting only $\pi$ ownership of the root and recursively doing the 
same for the left and right substructures, and so at first glance looks 
straightforward [footnote 1]. The problem is that when $\pi \in (0, 0.5]$, then tree can describe some 


Footnote 1: We write $x \xmapsto{\pi} (v_1, \ldots, v_n)$ for $x \xmapsto{\pi} v_1 \star (x+1) \xmapsto{\pi} v_2 \star \ldots \star (x+n-1) \xmapsto{\pi} v_n$. 
non-tree directed acyclic graphs as in Fig. 1. Fractional trees are a little too easy 
to introduce and thus unexpectedly painful to eliminate. 
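
The problem can be made concrete with a small Python sketch over rational shares. It is an illustration only: the integer addresses, the `NULL` encoding, and the helper name `tree_footprint` are assumptions introduced here, not part of the paper. The helper computes, for each cell, the total share that the fractional tree predicate of Eq. (1) demands, adding shares when a cell is reached along two paths, as partial addition on the rationals permits.

```python
from fractions import Fraction

NULL = 0  # hypothetical encoding of the null pointer


def tree_footprint(values, x, share):
    """Total share per address demanded by tree(x, share) on rationals.

    `values` maps an address to its (data, left, right) triple. A cell
    reached twice has its shares added; the sum must not exceed 1.
    Assumes the pointer structure is acyclic.
    """
    if x == NULL:
        return {}
    _, left, right = values[x]
    need = {x: share}
    for child in (left, right):
        for addr, s in tree_footprint(values, child, share).items():
            need[addr] = need.get(addr, Fraction(0)) + s
    if any(s > 1 for s in need.values()):
        raise ValueError("demanded share exceeds the full share")
    return need


# The DAG of Fig. 1 with hypothetical addresses 1..4.
ROOT, LEFT, RIGHT, GRAND = 1, 2, 3, 4
values = {
    ROOT: (3, LEFT, RIGHT),
    LEFT: (1, NULL, GRAND),
    RIGHT: (4, GRAND, NULL),
    GRAND: (1, NULL, NULL),
}
footprint = tree_footprint(values, ROOT, Fraction(3, 10))
print(footprint[GRAND])  # share demanded for the doubly reachable cell
```

With share 3/10 the doubly reachable cell is simply asked for 3/10 twice, which the rationals happily add up, so the DAG is accepted as a "tree". With any share above 1/2 the same call raises an error, matching the observation that the anomaly arises for shares in (0, 0.5].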

To prevent the deformation of recursive structures shown in Fig. 1, we want 
to recover the “disjointness” property of basic SL: $e_1 \mapsto e_2 \star e_1 \mapsto e_3 \dashv\vdash \bot$. 
Disjointness can be specified either as an inference rule in separation logic [41] 
or as an algebraic rule on the share model [21] as follows: 

\[
e_1 \xmapsto{\pi} e_2 \star e_1 \xmapsto{\pi} e_3 \ \dashv\vdash\ \bot \quad \text{MAPSTODISJOINT}
\qquad\qquad
\forall a, b.\ a \oplus a = b \Rightarrow a = \mathcal{E} \qquad (2)
\]

In other words, a nonempty share $\pi$ cannot join with itself. In Sect. 3 we 
will see how disjointness enables the distribution of predicate multiplication over 
$\star$ and in Sect. 4 we will see how disjointness enables antiframe inference during 
biabduction. 


Tree Shares. Dockins et al. [21] proposed “tree shares” as a share model 
satisfying disjointness. For this paper the details of the model are not critical so we 
provide only a brief overview. A tree share $\tau \in \mathbb{T}$ is a binary tree with Boolean 
leaves, i.e. $\tau \stackrel{\text{def}}{=} \circ \mid \bullet \mid \widehat{\tau_1\ \tau_2}$, where $\circ$ is the empty share $\mathcal{E}$, $\bullet$ is the full 
share $\mathcal{F}$, and $\widehat{\tau_1\ \tau_2}$ is a node with subtrees $\tau_1$ and $\tau_2$. There are two “half” 
shares, $\widehat{\circ\ \bullet}$ and $\widehat{\bullet\ \circ}$, and four “quarter” shares, e.g. $\widehat{\widehat{\bullet\ \circ}\ \circ}$. Trees must 
be in canonical form, i.e., the most compact representation under $\cong$; for 
example, $\widehat{\circ\ \circ} \cong \circ$ and $\widehat{\bullet\ \bullet} \cong \bullet$. 

Union $\sqcup$, intersection $\sqcap$, and complement $\overline{\,\cdot\,}$ are the basic operations on tree shares; 
they operate leafwise after unfolding the operands under $\cong$ into the same shape. 
The structure $(\mathbb{T}, \sqcup, \sqcap, \overline{\,\cdot\,}, \circ, \bullet)$ forms a countable atomless Boolean algebra and 
thus enjoys decidable existential and first-order theories with precisely known 
complexity bounds [34]. The join operator $\oplus$ on trees is defined as 
$\tau_1 \oplus \tau_2 = \tau_3 \stackrel{\text{def}}{=} \tau_1 \sqcup \tau_2 = \tau_3 \land \tau_1 \sqcap \tau_2 = \circ$. Due to their good metatheoretic and computational 
properties, a variety of program logics [24,25] and verification tools [3,26,33,47] 
have used tree shares (or other isomorphic structures [19]). 
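
A minimal Python sketch of tree shares may help fix intuitions. The encoding is an assumption made for illustration (leaves as Booleans, nodes as pairs) and is not taken from Dockins et al. or from any existing tool; the function names are likewise hypothetical. It implements canonicalisation, the leafwise Boolean operations, and the partial join defined above.

```python
EMPTY, FULL = False, True  # leaves: the empty share and the full share


def canon(t):
    """Fold a tree into canonical form: a node with equal leaves is that leaf."""
    if isinstance(t, bool):
        return t
    l, r = canon(t[0]), canon(t[1])
    return l if isinstance(l, bool) and l == r else (l, r)


def leafwise(op, a, b):
    """Apply a Boolean operation leafwise, unfolding leaves to match shapes."""
    if isinstance(a, bool) and isinstance(b, bool):
        return op(a, b)
    al, ar = (a, a) if isinstance(a, bool) else a
    bl, br = (b, b) if isinstance(b, bool) else b
    return canon((leafwise(op, al, bl), leafwise(op, ar, br)))


def union(a, b):
    return leafwise(lambda x, y: x or y, a, b)


def inter(a, b):
    return leafwise(lambda x, y: x and y, a, b)


def comp(a):
    """Complement: flip every leaf."""
    return (not a) if isinstance(a, bool) else (comp(a[0]), comp(a[1]))


def join(a, b):
    """Partial join: the union when a and b are disjoint, otherwise None."""
    return union(a, b) if inter(a, b) == EMPTY else None


L, R = (FULL, EMPTY), (EMPTY, FULL)  # the two half shares
print(join(L, R), join(L, L), comp(L) == R)
```

In this sketch the two halves join to the full share, while a nonempty share never joins with itself, which is exactly the disjointness axiom (2).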


3 Predicate Multiplication 


The additive structure of share models is relatively well-understood [21,33,34]. 
The focus for this paper is exploring the benefits and consequences of 
incorporating a multiplicative operator $\otimes$ into a share model. The simplest motivation for 
multiplication is computationally dividing some share $\pi$ of a resource “in half;” 
1  struct tree {int d; struct tree* l; struct tree* r;}; 
2  void processTree(struct tree* x) { 
3    if (x == 0) { return; } 
4    print(x -> d);          ||   7  print(x -> d); 
5    processTree(x -> l);    ||   8  processTree(x -> l); 
6    processTree(x -> r);    ||   9  processTree(x -> r); 
10 } 


Fig. 2. The parallel processTree function, written in a C-like language 


the two halves of the resource are then given to separate threads for parallel 
processing. When shares themselves are rationals, $\otimes$ is just ordinary multiplication, 
e.g. we can divide $0.6 = (0.5 \otimes 0.6) \oplus (0.5 \otimes 0.6)$. Defining a notion of multiplication 
on a share model that satisfies disjointness is somewhat trickier, but we can do 
so with tree shares $\mathbb{T}$ as follows. Define $\tau_1 \otimes \tau_2$ to be the operation that replaces 
each $\bullet$ in $\tau_2$ with a copy of $\tau_1$. The structure 
$(\mathbb{T}, \oplus, \otimes)$ is a kind of “near-semiring.” The $\otimes$ operator is associative, has identity 
$\mathcal{F}$ and null point $\mathcal{E}$, and is right distributive, i.e. $(a \oplus b) \otimes c = (a \otimes c) \oplus (b \otimes c)$. It 
is not commutative, does not distribute on the left, and does not have inverses. It is hard to 
do better: adding axioms like multiplicative inverses forces any model satisfying 
disjointness ($\forall a, b.\ a \oplus a = b \Rightarrow a = \mathcal{E}$) to have no more than two elements 
(Sect. 8). 
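
The algebraic claims above can be checked on small instances with a Python sketch. As before, the encoding (Boolean leaves, pairs as nodes) and all function names are assumptions for illustration rather than the paper's or any tool's implementation. The function `mult(t1, t2)` follows the definition in the text: it replaces each full leaf of `t2` with a copy of `t1`.

```python
EMPTY, FULL = False, True  # leaves: the empty share and the full share


def canon(t):
    """Fold a tree into canonical form."""
    if isinstance(t, bool):
        return t
    l, r = canon(t[0]), canon(t[1])
    return l if isinstance(l, bool) and l == r else (l, r)


def leafwise(op, a, b):
    """Apply a Boolean operation leafwise, unfolding leaves to match shapes."""
    if isinstance(a, bool) and isinstance(b, bool):
        return op(a, b)
    al, ar = (a, a) if isinstance(a, bool) else a
    bl, br = (b, b) if isinstance(b, bool) else b
    return canon((leafwise(op, al, bl), leafwise(op, ar, br)))


def join(a, b):
    """Partial join: defined only when the operands are disjoint."""
    if leafwise(lambda x, y: x and y, a, b) != EMPTY:
        return None
    return leafwise(lambda x, y: x or y, a, b)


def mult(t1, t2):
    """Replace each full leaf of t2 with a copy of t1."""
    if isinstance(t2, bool):
        return t1 if t2 else EMPTY
    return canon((mult(t1, t2[0]), mult(t1, t2[1])))


L, R = (FULL, EMPTY), (EMPTY, FULL)  # the two half shares

# Splitting a share into two halves and joining them gives it back.
halves_rejoin = join(mult(L, R), mult(R, R)) == R
# Right distributivity holds on this instance ...
right_dist = mult(join(L, R), L) == join(mult(L, L), mult(R, L))
# ... but left distributivity fails on the corresponding instance.
left_dist = mult(L, join(L, R)) == join(mult(L, L), mult(L, R))
print(halves_rejoin, right_dist, left_dist)
```

These are spot checks on particular shares, not proofs; the general statements are the ones established in the paper (Sect. 8 and the Coq development it cites).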

Now consider the toy program in Fig. 2. Starting from the tree rooted at x, 
the program itself is dead simple. First (line 3) we check if x is null, i.e. if 
we have reached a leaf; if so, we return. If not, we split into parallel threads 
(lines 4-6 and 7-9) that do some processing on the root data in both branches. 
In the toy example, the processing just prints out the root data (lines 4 and 7); 
the print command is unimportant: what is important is that we somehow access 
some of the data in the tree. After processing the root, both parallel branches 
call the processTree function recursively on the left x->l (lines 5 and 8) and 
right x->r (lines 6 and 9) branches, respectively. After both parallel processes 
have terminated, the function returns (line 10). The program is simple, so we 
would like its verification to be equally simple. 

Predicate multiplication is the tool that leads to a simple proof. Specifically, 
we would like to verify that processTree has the specification: 


\[
\forall \pi, x.\ \big( \{\pi \cdot \mathsf{tree}(x)\}\ \mathtt{processTree}(x)\ \{\pi \cdot \mathsf{tree}(x)\} \big)
\]


Here $\mathsf{tree}(x) \stackrel{\text{def}}{=} \langle x = \mathsf{null} \rangle \lor \exists d, l, r.\ x \mapsto (d, l, r) \star \mathsf{tree}(l) \star \mathsf{tree}(r)$ is exactly the 
usual definition of binary trees in separation logic. Predicate multiplication has 
allowed us to isolate the fractional ownership from the definition; compare with 
Eq. (1) above. Our precondition and postcondition both say that x is a pointer to 
a heap-represented $\pi$-owned tree. Critically, we want to ensure that our $\pi$-share 
at the end of the program is equal to the $\pi$-share at the beginning. This way if 
our initial caller had full $\mathcal{F}$ ownership before calling processTree, he will have 
full ownership afterwards (allowing him to e.g. deallocate the tree). 

The intuition behind the proof is simple. First in line 3, we check if x is null; 
if so we are in the base case of the tree definition and can simply return. If not 
we can eliminate the left disjunct and can proceed to split the $\star$-separated bits 
into disjoint subtrees l and r, and then divide the ownership of those bits 
into two “halves”. Let $\mathcal{L}$ be the left half share (the tree whose left leaf is $\bullet$ and 
whose right leaf is $\circ$) and let $\mathcal{R} = \overline{\mathcal{L}}$ be its complement. When we start the parallel 
computation on lines 4 and 7 we want to pass the left branch of the computation 
the $\mathcal{L} \otimes \pi$-share of the spatial resources, and the right branch of the computation 
the $\mathcal{R} \otimes \pi$-share. In both branches we then need to show that we can read from the 
data cell, which in the simple policy we use for this paper boils down to making 
sure that the product of two non-$\mathcal{E}$ shares cannot be $\mathcal{E}$. This is a basic property 
for reasonable share models with multiplication. In the remainder of the parallel 
code (lines 5-6 and 8-9) we need to make recursive calls, which is done by simply 
instantiating $\pi$ with $\mathcal{L} \otimes \pi$ and $\mathcal{R} \otimes \pi$ in the recursive specification (as well as l 
and r for x). The latter half of the proof, after the parallel call, is pleasantly symmetric 
to the first half: we fold back the original tree predicate by merging 
the two halves $\mathcal{L} \otimes \pi$ and $\mathcal{R} \otimes \pi$ back into $\pi$. Consequently, we arrive at the 
postcondition $\pi \cdot \mathsf{tree}(x)$, which is identical to the precondition. 


3.1 Proof Rules for Predicate Multiplication 


In Fig. 4 we put the formal verification for processTree, which follows the 
informal argument very closely. However, before we go through it, let us consider the 
reason for this alignment: the key rules for reasoning about predicate 
multiplication are bidirectional. These rules are given in Fig. 3. The non-spatial 
rules are all straightforward and follow the basic pattern that predicate 
multiplication both pushes into and pulls out of the operators of our logic without 
meaningful side conditions. The DOTPURE rule means that predicate 
multiplication ignores pure facts, too. Complicating the picture slightly, predicate 
multiplication pushes into implication $\Rightarrow$ but does not pull out of it. Combining 
DOTIMPL with DOTPURE we get a one-way rule for negation: $\pi \cdot (\lnot P) \vdash \lnot (\pi \cdot P)$. 
We will explain why we cannot get both directions in Sects. 5.1 and 8. 


Most of the spatial rules are also simple. Recall that $\mathsf{emp} \stackrel{\mathrm{def}}{=} \langle \top \rangle$, so DotPure
yields $\pi \cdot \mathsf{emp} \dashv\vdash \mathsf{emp}$. The DotFull rule says that $\mathcal{F}$ is the scalar identity
on predicates, just as it is the multiplicative identity on the share model itself.
The DotDot rule allows us to "collapse" repeated predicate multiplication
using share multiplication; we will shortly see how we use it to verify the
recursive calls to processTree. Similarly, the DotMapsTo rule shows how predicate
multiplication combines with basic maps-to by multiplying the associated shares
together. All three rules are bidirectional and require no side conditions.

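While the last two rules are both bidirectional, they both have side conditions.
The DotPlus rule shows how predicate multiplication distributes over $\oplus$.
The $\vdash$ direction does not require a side condition, but for the $\dashv$ direction we
require that P be precise in the usual separation logic sense. Precision will be discussed

  DotPos:     from $P \vdash Q$ infer $\pi \cdot P \vdash \pi \cdot Q$
  DotPure:    $\pi \cdot \langle P \rangle \dashv\vdash \langle P \rangle$
  DotImpl:    $\pi \cdot (P \Rightarrow Q) \vdash (\pi \cdot P) \Rightarrow (\pi \cdot Q)$
  DotConj:    $\pi \cdot (P \wedge Q) \dashv\vdash (\pi \cdot P) \wedge (\pi \cdot Q)$
  DotDisj:    $\pi \cdot (P \vee Q) \dashv\vdash (\pi \cdot P) \vee (\pi \cdot Q)$
  DotNeg:     $\pi \cdot (\neg P) \vdash \neg \pi \cdot P$
  DotUniv:    $\pi \cdot (\forall x{:}\tau.\ P(x)) \dashv\vdash \forall x{:}\tau.\ \pi \cdot P(x)$, provided $\tau \neq \emptyset$
  DotExis:    $\pi \cdot (\exists x{:}\tau.\ P(x)) \dashv\vdash \exists x{:}\tau.\ \pi \cdot P(x)$
  DotFull:    $\mathcal{F} \cdot P \dashv\vdash P$
  DotDot:     $\pi_1 \cdot (\pi_2 \cdot P) \dashv\vdash (\pi_1 \otimes \pi_2) \cdot P$
  DotMapsTo:  $\pi \cdot (x \stackrel{\pi'}{\mapsto} y) \dashv\vdash x \stackrel{\pi \otimes \pi'}{\mapsto} y$
  DotPlus:    $(\pi_1 \oplus \pi_2) \cdot P \dashv\vdash (\pi_1 \cdot P) \star (\pi_2 \cdot P)$, provided $\mathsf{precise}(P)$
  DotStar:    $\pi \cdot (P \star Q) \dashv\vdash (\pi \cdot P) \star (\pi \cdot Q)$, provided $P \vdash \mathsf{uniform}(\pi')$ and $Q \vdash \mathsf{uniform}(\pi')$

Fig. 3. Distributivity of the scaling operator over pure and spatial connectives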


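in Sect. 5.2; for now a simple counterexample shows why it is necessary:


$\mathcal{L} \cdot (x \mapsto a \vee (x+1) \mapsto b) \star \mathcal{R} \cdot (x \mapsto a \vee (x+1) \mapsto b) \nvdash \mathcal{F} \cdot (x \mapsto a \vee (x+1) \mapsto b)$


The premise is also consistent with $x \stackrel{\mathcal{L}}{\mapsto} a \star (x+1) \stackrel{\mathcal{R}}{\mapsto} b$.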

The DotStar rule shows how predicate multiplication distributes into and
out of the separating conjunction $\star$. It is also bidirectional. Crucially, the $\dashv$
direction fails on non-disjoint share models like $\mathbb{Q}$, which is the "deeper
reason" for the deformation of recursive structures illustrated in Fig. 1. On
disjoint share models like $\mathbb{T}$, we get equational reasoning $\dashv\vdash$ subject to the side
condition of uniformity. Informally, $P \vdash \mathsf{uniform}(\pi')$ asserts that any heap that
satisfies P has the permission $\pi'$ uniformly at each of its defined addresses.
In Sect. 8 we explain why we cannot admit this rule without a side condition.
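To make the informal reading of uniformity concrete, the following Python sketch checks it on small explicit heaps. It is an illustration only: heaps are modelled as dictionaries from addresses to (share, value) pairs, shares are rationals used purely as a stand-in (the paper's disjoint model is tree shares), and the names `is_uniform` and `scale` are hypothetical.

```python
from fractions import Fraction


def is_uniform(heap, share):
    """True when every defined address of `heap` is held with exactly `share`."""
    return all(s == share for (s, _value) in heap.values())


def scale(pi, heap):
    """Model of pi . heap: multiply the share of every cell by pi (rational stand-in)."""
    return {addr: (pi * s, value) for addr, (s, value) in heap.items()}


FULL = Fraction(1)

# A three-cell heap owned fully everywhere, as a non-fractional predicate would describe.
tree_cells = {100: (FULL, "d"), 101: (FULL, "l"), 102: (FULL, "r")}

# A heap whose cells carry different shares, in the spirit of a non-uniform predicate.
mixed = {100: (FULL, "d"), 101: (Fraction(1, 2), "n")}
```

Here `tree_cells` is uniform at the full share, `scale(Fraction(1, 3), tree_cells)` is uniform at one third, and `mixed` is not uniform at any single share. The empty heap `{}` is uniform at every share, since the condition is vacuous.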

In the meantime, let us argue that most predicates used in practice in
separation logic are uniform. First, every SL predicate defined in non-fractional
settings, such as $\mathsf{tree}(x)$, is $\mathcal{F}$-uniform. Second, P is a $\pi$-uniform predicate if
and only if $\pi' \cdot P$ is $(\pi' \otimes \pi)$-uniform. Third, the $\star$-conjunction of two $\pi$-uniform
predicates is also $\pi$-uniform. Since a significant motivation for predicate
multiplication is to allow standard SL predicates to be used in fractional settings,
these already cover many common cases in practice. It is useful to consider examples
of non-uniform predicates for contrast. Here are three (we elide the base cases):


$\mathsf{slist}(x) \dashv\vdash \exists d,n.\ \big( ((d = 17) \star x \stackrel{\mathcal{L}}{\mapsto} (d,n)) \vee ((d \neq 17) \star x \stackrel{\mathcal{R}}{\mapsto} (d,n)) \big) \star \mathsf{slist}(n)$
$\mathsf{dlist}(x) \dashv\vdash \exists d,n.\ x \mapsto (d,n) \star \mathcal{L} \cdot \mathsf{dlist}(n)$
$\mathsf{dtree}(x) \dashv\vdash \exists d,l,r.\ x \mapsto (d,l,r) \star \mathcal{L} \cdot \mathsf{dtree}(l) \star \mathcal{R} \cdot \mathsf{dtree}(r)$


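The slist(x) predicate owns different amounts of permissions at different memory
cells depending on the value of those cells. The dlist(x) predicate owns decreasing
amounts of the list, e.g. the first cell is owned more than the second, which is
owned more than the third. The dtree(x) predicate is even stranger, owning
different amounts of different branches of the tree, essentially depending on the

 1  void processTree(struct tree* x) {   // { \pi \cdot tree(x) }
 2    // { \pi \cdot ((x = null) \vee (\exists d,l,r. x \mapsto (d,l,r) \star tree(l) \star tree(r))) }
 3    // { (x = null) \vee (\exists d,l,r. x \mapsto^{\pi} (d,l,r) \star (\pi \cdot tree(l)) \star (\pi \cdot tree(r))) }
 4    if (x == null) {                   // { (x = null) }
 5      return; }                        // { \pi \cdot tree(x) }
 6    // { x \mapsto^{\pi} (d,l,r) \star (\pi \cdot tree(l)) \star (\pi \cdot tree(r)) }
 7    // { \mathcal{F} \cdot (x \mapsto^{\pi} (d,l,r) \star (\pi \cdot tree(l)) \star (\pi \cdot tree(r))) }
 8    // { (\mathcal{L} \oplus \mathcal{R}) \cdot (x \mapsto^{\pi} (d,l,r) \star (\pi \cdot tree(l)) \star (\pi \cdot tree(r))) }
 9    // { (\mathcal{L} \cdot (x \mapsto^{\pi} (d,l,r) \star (\pi \cdot tree(l)) \star (\pi \cdot tree(r)))) \star
      //   (\mathcal{R} \cdot (x \mapsto^{\pi} (d,l,r) \star (\pi \cdot tree(l)) \star (\pi \cdot tree(r)))) }
10    // { \mathcal{L} \cdot (x \mapsto^{\pi} (d,l,r) \star (\pi \cdot tree(l)) \star (\pi \cdot tree(r))) }
11    // { \mathcal{L} \cdot x \mapsto^{\pi} (d,l,r) \star \mathcal{L} \cdot \pi \cdot tree(l) \star \mathcal{L} \cdot \pi \cdot tree(r) }
12    // { x \mapsto^{\mathcal{L} \otimes \pi} (d,l,r) \star ((\mathcal{L} \otimes \pi) \cdot tree(l)) \star ((\mathcal{L} \otimes \pi) \cdot tree(r)) }
13    print(x -> d);
14    processTree(x -> l); processTree(x -> r);
15    // { x \mapsto^{\mathcal{L} \otimes \pi} (d,l,r) \star ((\mathcal{L} \otimes \pi) \cdot tree(l)) \star ((\mathcal{L} \otimes \pi) \cdot tree(r)) }
16    // { \mathcal{L} \cdot \pi \cdot x \mapsto (d,l,r) \star \mathcal{L} \cdot \pi \cdot tree(l) \star \mathcal{L} \cdot \pi \cdot tree(r) }
17    // { \mathcal{L} \cdot \pi \cdot (x \mapsto (d,l,r) \star tree(l) \star tree(r)) }
18    // { (\mathcal{L} \cdot \pi \cdot (x \mapsto (d,l,r) \star tree(l) \star tree(r))) \star
      //   (\mathcal{R} \cdot \pi \cdot (x \mapsto (d,l,r) \star tree(l) \star tree(r))) }
19    // { (\mathcal{L} \oplus \mathcal{R}) \cdot \pi \cdot (x \mapsto (d,l,r) \star tree(l) \star tree(r)) }
20  }  // { \pi \cdot tree(x) }


Fig. 4. Reasoning with the scaling operator $\pi \cdot P$.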


path to the root. None of these predicates mix well with DotStar, but perhaps
they are not useful to verify many programs in practice, either. In Sects. 5.1 and
5.2 we will discuss how to prove that predicates are precise and uniform. In Sect. 5.4
we demonstrate these techniques by applying them to two examples.


3.2 Verification of processTree using predicate multiplication 


We now explain how the proof of processTree is carried out in Fig. 4 using the
scaling rules in Fig. 3. In line 2, we unfold the definition of the predicate $\mathsf{tree}(x)$,
which consists of one base case and one inductive case. We reach line 3 by pushing $\pi$
inward using the rules DotPure, DotDisj, DotExis, DotMapsTo and
DotStar. To use DotStar we must prove that $\mathsf{tree}(x)$ is $\mathcal{F}$-uniform, which we
show how to do in Sect. 5.4. We prove this lemma once and use it many times.
The base case x = null is handled in lines 4-5 by applying rule DotPure,
i.e., $(x = \mathsf{null}) \vdash \pi \cdot (x = \mathsf{null})$, and then DotPos, $\pi \cdot (x = \mathsf{null}) \vdash \pi \cdot \mathsf{tree}(x)$.
For the inductive case, we first apply DotFull in line 7 and then replace $\mathcal{F}$
with $\mathcal{L} \oplus \mathcal{R}$ (recall that $\mathcal{R}$ is $\mathcal{L}$'s complement). On line 9 we use DotPlus to
translate the split on shares with $\oplus$ into a split on heaps with $\star$.

We show only one parallel process; the other is a mirror image. Line 10
gives the precondition from the Parallel rule, and then in lines 11 and 12 we
continue to "push in" the predicate multiplication. Verifying the code in lines
13-14 just requires Frame. Notice that we need the DotDot rule to "collapse" the
two uses of predicate multiplication into one so that we can apply the recursive
specification (with the new $\pi'$ in the recursive precondition equal to $\mathcal{L} \otimes \pi$).

Having taken the predicate completely apart, it is now necessary to put
Humpty Dumpty back together again. This is why it is vital that all of our
proof rules are bidirectional; otherwise we would not be able to reach the
final postcondition $\pi \cdot \mathsf{tree}(x)$. The final wrinkle is that for line 19 we must prove
the precision of the $\mathsf{tree}(x)$ predicate. We show how to do so with an example in
Sect. 5.4, but typically in a verification this is proved once per predicate as a
lemma.


4 Bi-abductive Inference with Fractional Permissions 


Biabduction is a separation logic inference process that helps to increase the
scalability of verification for sizable programs [22,49]; in recent years it has been
the focus of substantial research for (sequential) separation logic [8,10,11,32].
Biabduction aims to infer the missing information in an incomplete separation
logic entailment. More precisely, given an incomplete entailment $A \star [??] \vdash B \star [??]$,
we would like to find predicates for the two missing pieces [??] that complete
the entailment in a nontrivial manner. The first piece is called the antiframe
while the second is the inference frame. The standard approach consists of two
sequential subroutines, namely abductive inference and frame inference, which
construct the antiframe and the frame respectively. Our task in this section is to
show how to upgrade these routines to handle fractional permissions so that
biabduction can extend to concurrent programs. As we will see, disjointness
plays a crucial role in antiframe inference.


4.1 Fractional Residue Computation 


Consider the fractional points-to bi-abduction problem with rationals:

$a \stackrel{\pi_1}{\mapsto} b \star [??] \vdash a \stackrel{\pi_2}{\mapsto} b \star [??]$

There are three cases to consider, namely $\pi_1 = \pi_2$, $\pi_1 < \pi_2$ or $\pi_1 > \pi_2$.
In the first case, both the (minimal) antiframe $F_a$ and frame $F_f$ are emp. In
the second case the antecedent holds too little, so $F_a = a \stackrel{\pi_2 - \pi_1}{\mapsto} b$ and
$F_f = \mathsf{emp}$. In the last case the antecedent holds a surplus, so $F_a = \mathsf{emp}$ and
$F_f = a \stackrel{\pi_1 - \pi_2}{\mapsto} b$. Here we straightforwardly compute the residue
permission using rational subtraction. In general, one can attempt to define
subtraction $\ominus$ from a share model $(S, \oplus)$ as $a \ominus b = c \stackrel{\mathrm{def}}{=} b \oplus c = a$. However,
this definition is too coarse as we want subtraction to be a total function so that
the residue is always computable efficiently. A solution to this issue is to relax
the requirements for $\ominus$, asking only that it satisfies the following two properties:

$C_1:\ a \oplus (b \ominus a) = b \oplus (a \ominus b)$        $C_2:\ a \preceq b \oplus c \Rightarrow a \ominus b \preceq c$

where $a \preceq b \stackrel{\mathrm{def}}{=} \exists c.\ a \oplus c = b$. The condition $C_1$ provides a convenient way to
compute the fractional residue in both the frame and antiframe while $C_2$ asserts
that $a \ominus b$ is effectively the minimal element that when joined with b becomes
greater than a. In the rationals $\mathbb{Q}$, $a \ominus b \stackrel{\mathrm{def}}{=}$ if $(a > b)$ then $a - b$ else $0$. On
tree shares $\mathbb{T}$, $a \ominus b \stackrel{\mathrm{def}}{=} a \sqcap \overline{b}$. Recalling that the case when $\pi_1 = \pi_2$ is simple
(both the antiframe and frame are just emp), then if $\pi_1 \neq \pi_2$ we can compute
the fractional antiframe and inference frames uniquely using $\ominus$:


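  MSub:   $a \stackrel{\pi_1}{\mapsto} b \star a \stackrel{\pi_2 \ominus \pi_1}{\mapsto} b \vdash a \stackrel{\pi_2}{\mapsto} b \star a \stackrel{\pi_1 \ominus \pi_2}{\mapsto} b$


Generally, the following rule helps compute the residue of predicate P:


  PSub:   from $\mathsf{precise}(P)$ infer $\pi_1 \cdot P \star (\pi_2 \ominus \pi_1) \cdot P \vdash \pi_2 \cdot P \star (\pi_1 \ominus \pi_2) \cdot P$


Using $C_1$ and $C_2$ it is easy to prove that the residue is minimal w.r.t. $\preceq$, i.e.:


$\pi_1 \oplus a = \pi_2 \oplus b \Rightarrow \pi_2 \ominus \pi_1 \preceq a \wedge \pi_1 \ominus \pi_2 \preceq b$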

4.2 Extension of Predicate Axioms 


To support reasoning over recursive data structures such as lists or trees, the
assertion language is enriched with the corresponding inductive predicates. To
derive properties over inductive predicates, verification tools often contain a list
of predicate axioms/facts and use them to aid the verification process [9,32].
These facts are represented as entailment rules $A \vdash B$ that can be classified into
"folding" and "unfolding" rules to manipulate the representation of inductive
predicates. For example, some axioms for the tree predicate are:


$F_1:\ x = 0 \wedge \mathsf{emp} \vdash \mathsf{tree}(x)$        $F_2:\ x \mapsto (v, x_1, x_2) \star \mathsf{tree}(x_1) \star \mathsf{tree}(x_2) \vdash \mathsf{tree}(x)$
$U:\ \mathsf{tree}(x) \wedge x \neq 0 \vdash \exists v, x_1, x_2.\ x \mapsto (v, x_1, x_2) \star \mathsf{tree}(x_1) \star \mathsf{tree}(x_2)$


We want to transform these axioms into fractional forms. The key ingredient
is the DotPos rule from Fig. 3, which lifts the fractional portion of an entailment,
i.e. $(P \vdash Q) \Rightarrow (\pi \cdot P \vdash \pi \cdot Q)$. Using this and the other scaling rules from Fig. 3,
we can upgrade the folding/unfolding rules into corresponding fractional forms:


$F_1':\ x = 0 \wedge \mathsf{emp} \vdash \pi \cdot \mathsf{tree}(x)$        $F_2':\ x \stackrel{\pi}{\mapsto} (v, x_1, x_2) \star \pi \cdot \mathsf{tree}(x_1) \star \pi \cdot \mathsf{tree}(x_2) \vdash \pi \cdot \mathsf{tree}(x)$
$U':\ \pi \cdot \mathsf{tree}(x) \wedge x \neq 0 \vdash \exists v, x_1, x_2.\ x \stackrel{\pi}{\mapsto} (v, x_1, x_2) \star \pi \cdot \mathsf{tree}(x_1) \star \pi \cdot \mathsf{tree}(x_2)$

As our scaling rules are bidirectional, they can be applied both in the
antecedent and consequent to produce a smooth transformation to fractional
axioms. Also, recall that our DotStar rule $\pi \cdot (P \star Q) \dashv\vdash \pi \cdot P \star \pi \cdot Q$ has a
side condition that both P and Q are $\pi'$-uniform. This condition is trivial in
the transformation as standard predicates (i.e. those without permissions) are
automatically $\mathcal{F}$-uniform. Furthermore, the precision and uniformity properties
can be transferred directly to fractional forms by the following rules:


$\mathsf{precise}(\pi \cdot P) \Leftrightarrow \mathsf{precise}(P)$        $P \vdash \mathsf{uniform}(\pi) \Leftrightarrow \pi' \cdot P \vdash \mathsf{uniform}(\pi' \otimes \pi)$
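As a worked illustration of this transformation, the fractional folding rule for the inductive case can be derived from the standard one in three steps. The derivation below is a sketch rather than a quotation from the paper: it assumes that an unannotated maps-to abbreviates the $\mathcal{F}$-annotated one, and it uses the fact that $\mathcal{F}$ is the multiplicative identity of the share model (it needs the amsmath package for `align*`).

```latex
\begin{align*}
& x \mapsto (v,x_1,x_2) \star \mathsf{tree}(x_1) \star \mathsf{tree}(x_2) \vdash \mathsf{tree}(x)
  && \text{standard folding rule } F_2 \\
& \pi \cdot \big(x \mapsto (v,x_1,x_2) \star \mathsf{tree}(x_1) \star \mathsf{tree}(x_2)\big) \vdash \pi \cdot \mathsf{tree}(x)
  && \text{DotPos} \\
& \pi \cdot (x \mapsto (v,x_1,x_2)) \star \pi \cdot \mathsf{tree}(x_1) \star \pi \cdot \mathsf{tree}(x_2) \vdash \pi \cdot \mathsf{tree}(x)
  && \text{DotStar, all conjuncts } \mathcal{F}\text{-uniform} \\
& x \stackrel{\pi}{\mapsto} (v,x_1,x_2) \star \pi \cdot \mathsf{tree}(x_1) \star \pi \cdot \mathsf{tree}(x_2) \vdash \pi \cdot \mathsf{tree}(x)
  && \text{DotMapsTo, } \pi \otimes \mathcal{F} = \pi
\end{align*}
```

The DotStar step is applied right to left in the antecedent, which is sound only because the rule is bidirectional.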


4.3 Abductive Inference and Frame Inference 


To construct the antiframe, Calcagno et al. [10] presented a general framework
for antiframe inference which contains rules of the form:

  from $A' \star [M'] \triangleright H'$ and $\mathit{Cond}$ infer $A \star [M] \triangleright H$

where Cond is the side condition, together with consequents (H, H'), heap
formulas (A, A') and antiframes (M, M'). In principle, the abduction algorithm
gradually matches fragments of the consequent with the antecedent and derives sound
equalities among variables, while applying various folding and unfolding rules for
recursive predicates on both sides of the entailment. Ideally, the remaining
unmatched fragments of the consequent are returned to form the antiframe. During
the process, certain conditions need to be maintained, e.g., satisfiability of the
antecedent or minimal choice for the antiframe. After finding the antiframe, the
frame inference process is invoked to construct the inference frame. In principle,
the old antecedent is first combined with the antiframe to form a new antecedent
whose fragments are matched with the consequent. Eventually, the remaining
unmatched fragments of the antecedent are returned to construct the inference frame.

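The discussion of fractional residue computation in Sect. 4.1 and the extension of
recursive predicate rules in Sect. 4.2 ensure a smooth upgrade of the biabduction
algorithm to fractional form. We demonstrate this intuition using the example in
Fig. 5. The partial consequent is a fractional $\mathsf{tree}(x)$ predicate with permission $\pi_3$
while the partial antecedent is the star conjunction of a fractional maps-to predicate
of address x with permission $\pi_1$, a fractional $\mathsf{tree}(x_1)$ predicate with permission
$\pi_2$ and a null pointer $x_2$. Following the spirit of Calcagno et al. [10], the steps in
both subroutines include applying the folding and unfolding rules for the predicate
tree and then matching the corresponding pair of fragments from antecedent and
consequent. On the other hand, the upgraded part is reflected through the use of
the two new rules MSub and PSub to compute the fractional residues as well as
a more general system of folding and unfolding rules for the predicate tree. We are
then able to compute the antiframe $a = x_1 \wedge (\pi_3 \ominus \pi_2) \cdot \mathsf{tree}(x_1) \star x \stackrel{\pi_3 \ominus \pi_1}{\mapsto} (v, a, x_2)$
and the inference frame $x \stackrel{\pi_1 \ominus \pi_3}{\mapsto} (v, x_1, x_2) \star (\pi_2 \ominus \pi_3) \cdot \mathsf{tree}(x_1)$ respectively.

Problem:  $x \stackrel{\pi_1}{\mapsto} (v, a, x_2) \star \pi_2 \cdot \mathsf{tree}(x_1) \star (x_2 = 0 \wedge \mathsf{emp}) \star [??] \vdash \pi_3 \cdot \mathsf{tree}(x) \star [??]$

Abductive inference (each judgment is derived from the one above it by the named rule):

  Base:    $(x_2 = 0 \wedge \mathsf{emp}) \star [\mathsf{emp}] \triangleright \mathsf{emp}$
  $F_1'$:  $(x_2 = 0 \wedge \mathsf{emp}) \star [\mathsf{emp}] \triangleright \pi_3 \cdot \mathsf{tree}(x_2)$
  PSub:    $\pi_2 \cdot \mathsf{tree}(x_1) \star (x_2 = 0 \wedge \mathsf{emp}) \star [(\pi_3 \ominus \pi_2) \cdot \mathsf{tree}(x_1)] \triangleright \pi_3 \cdot \mathsf{tree}(x_1) \star \pi_3 \cdot \mathsf{tree}(x_2)$
  MSub:    $x \stackrel{\pi_1}{\mapsto} (v, a, x_2) \star \pi_2 \cdot \mathsf{tree}(x_1) \star (x_2 = 0 \wedge \mathsf{emp}) \star [(\pi_3 \ominus \pi_2) \cdot \mathsf{tree}(x_1) \star x \stackrel{\pi_3 \ominus \pi_1}{\mapsto} (v, a, x_2)]$
           $\triangleright\ x \stackrel{\pi_3}{\mapsto} (v, a, x_2) \star \pi_3 \cdot \mathsf{tree}(x_1) \star \pi_3 \cdot \mathsf{tree}(x_2)$
  Match:   $x \stackrel{\pi_1}{\mapsto} (v, a, x_2) \star \pi_2 \cdot \mathsf{tree}(x_1) \star (x_2 = 0 \wedge \mathsf{emp}) \star [a = x_1 \wedge (\pi_3 \ominus \pi_2) \cdot \mathsf{tree}(x_1) \star x \stackrel{\pi_3 \ominus \pi_1}{\mapsto} (v, a, x_2)]$
           $\triangleright\ x \stackrel{\pi_3}{\mapsto} (v, x_1, x_2) \star \pi_3 \cdot \mathsf{tree}(x_1) \star \pi_3 \cdot \mathsf{tree}(x_2)$
  $F_2'$:  $x \stackrel{\pi_1}{\mapsto} (v, a, x_2) \star \pi_2 \cdot \mathsf{tree}(x_1) \star (x_2 = 0 \wedge \mathsf{emp}) \star [a = x_1 \wedge (\pi_3 \ominus \pi_2) \cdot \mathsf{tree}(x_1) \star x \stackrel{\pi_3 \ominus \pi_1}{\mapsto} (v, a, x_2)]$
           $\triangleright\ \pi_3 \cdot \mathsf{tree}(x)$

Frame inference (each judgment is derived from the one above it by the named rule):

  Base:    $\mathsf{emp} \triangleright \mathsf{emp} \star [\mathsf{emp}]$
  MSub:    $x \stackrel{\pi_1 \oplus (\pi_3 \ominus \pi_1)}{\mapsto} (v, x_1, x_2) \triangleright x \stackrel{\pi_3}{\mapsto} (v, x_1, x_2) \star [x \stackrel{\pi_1 \ominus \pi_3}{\mapsto} (v, x_1, x_2)]$
  PSub:    $x \stackrel{\pi_1 \oplus (\pi_3 \ominus \pi_1)}{\mapsto} (v, x_1, x_2) \star (\pi_2 \oplus (\pi_3 \ominus \pi_2)) \cdot \mathsf{tree}(x_1)$
           $\triangleright\ x \stackrel{\pi_3}{\mapsto} (v, x_1, x_2) \star \pi_3 \cdot \mathsf{tree}(x_1) \star [x \stackrel{\pi_1 \ominus \pi_3}{\mapsto} (v, x_1, x_2) \star (\pi_2 \ominus \pi_3) \cdot \mathsf{tree}(x_1)]$
  $F_1'$:  $x \stackrel{\pi_1 \oplus (\pi_3 \ominus \pi_1)}{\mapsto} (v, x_1, x_2) \star (\pi_2 \oplus (\pi_3 \ominus \pi_2)) \cdot \mathsf{tree}(x_1) \star (x_2 = 0 \wedge \mathsf{emp})$
           $\triangleright\ x \stackrel{\pi_3}{\mapsto} (v, x_1, x_2) \star \pi_3 \cdot \mathsf{tree}(x_1) \star \pi_3 \cdot \mathsf{tree}(x_2) \star [x \stackrel{\pi_1 \ominus \pi_3}{\mapsto} (v, x_1, x_2) \star (\pi_2 \ominus \pi_3) \cdot \mathsf{tree}(x_1)]$


Fig. 5. An example of biabduction with fractional permissions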


Antiframe Inference and Disjointness. Consider the following abduction
problem:


$x \mapsto (v, x_1, x_2) \star \mathsf{tree}(x_1) \star [??] \vdash \mathsf{tree}(x)$


Using the folding rule $F_2$, we can identify the antiframe as $\mathsf{tree}(x_2)$. Now suppose
we have a rational permission $\pi \in \mathbb{Q}$ distributed everywhere, i.e.:


$x \stackrel{\pi}{\mapsto} (v, x_1, x_2) \star \pi \cdot \mathsf{tree}(x_1) \star [??] \vdash \pi \cdot \mathsf{tree}(x)$


A naive solution is to let the antiframe be $\pi \cdot \mathsf{tree}(x_2)$. However, in $\mathbb{Q}$ this choice
is unsound due to the deformation of recursive structures illustrated in
Fig. 1: if the antiframe is $\pi \cdot \mathsf{tree}(x_2)$, the left hand side can be a DAG, even
though the right hand side must be a tree. However, in disjoint share models
like $\mathbb{T}$, choosing $\pi \cdot \mathsf{tree}(x_2)$ for the antiframe is correct and the entailment holds.
As is often the case, things are straightforward once the definitions are correct.


5 A Proof Theory for Fractional Permissions 


Our main objective in this section is to show how to discharge the uniformity
and precision side conditions required by the DotStar and DotPlus rules.
To handle recursive predicates like $\mathsf{tree}(x)$ we develop a set of novel
modal-logic-based proof rules to carry out induction in the heap. To allow tools to
leverage existing entailment checkers, all of these techniques are carried out in the
object logic itself, rather than in the metalogic. Thus, in Sect. 5, we do not assume a
concrete model for our object logic (in Sect. 7 we will develop a model).

First we discuss new proof rules for predicate multiplication and fractional
maps-to (Sect. 5.1), precision (Sect. 5.2), and induction over fractional heaps
(Sect. 5.3). We then conclude (Sect. 5.4) with two examples of proving real
properties using our proof theory: that $\mathsf{tree}(x)$ is $\mathcal{F}$-uniform and that $\mathsf{list}(x)$ is precise.
Some of the theorems have delicate proofs, so all of them have been verified in
Coq [1].


5.1 Proof Theory for Predicate Multiplication and Fractional 
Maps-To 


In Sect. 3 we presented the key rules that someone who wants to verify programs
using predicate multiplication is likely to find convenient. In Figs. 6 to 9 we present
a series of additional rules, mostly used to establish the "uniform" and "precise"
side conditions necessary in our proofs.

Figure 6 is the simplest group, giving basic facts about the fractional points-to
predicate. Only $\mapsto$inversion is not immediate from the nonfractional case.
It says that it is impossible to have two fractional maps-tos of the same address
and with two different values. We need this fact to e.g. prove that predicates
with existentials such as tree are precise.


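  $\mapsto$inversion:  $(x \stackrel{\pi_1}{\mapsto} y_1 \star \top) \wedge (x \stackrel{\pi_2}{\mapsto} y_2 \star \top) \vdash |y_1 = y_2|$
  $\mapsto$emp:        $x \stackrel{\pi}{\mapsto} y \vdash \neg \mathsf{emp}$
  $\mapsto$null:       $x \stackrel{\pi}{\mapsto} y \vdash |x \neq \mathsf{null}|$


Fig. 6. Proof theory for fractional maps-to


  uniform/emp:          $\mathsf{emp} \vdash \mathsf{uniform}(\pi)$
  uniform$\star$:       $\mathsf{uniform}(\pi) \star \mathsf{uniform}(\pi) \dashv\vdash \mathsf{uniform}(\pi)$
  uniformDot:           from $P \vdash \mathsf{uniform}(\pi)$ infer $\pi' \cdot P \vdash \mathsf{uniform}(\pi' \otimes \pi)$
  $\mapsto$uniform:     $x \stackrel{\pi}{\mapsto} y \vdash \mathsf{uniform}(\pi)$
  $\mapsto$precise:     $\mathsf{precise}(x \stackrel{\pi}{\mapsto} y)$
  DotPrecise:           $\mathsf{precise}(P)$ if and only if $\mathsf{precise}(\pi \cdot P)$
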
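Fig. 7. Uniformity and precision for predicate multiplication


  precisely$\star$:     from $G \vdash \mathsf{precisely}(P)$ and $G \vdash \mathsf{precisely}(Q)$ infer $G \vdash \mathsf{precisely}(P \star Q)$
  preciselyPrecise:     $\top \vdash \mathsf{precisely}(P)$ if and only if $\mathsf{precise}(P)$
  precisely$\forall$:   from $\exists x.\ (G \vdash \mathsf{precisely}(P(x)))$ infer $G \vdash \mathsf{precisely}(\forall x.\ P(x))$
  preciselyLeft:        $\mathsf{precisely}(P) \vdash ((P \star Q) \wedge (P \star R)) \Rightarrow (P \star (Q \wedge R))$
  preciselyRight:       from $\forall Q, R.\ \big(G \vdash ((P \star Q) \wedge (P \star R)) \Rightarrow (P \star (Q \wedge R))\big)$ infer $G \vdash \mathsf{precisely}(P)$
  precisely$\wedge$:    from $G \vdash \mathsf{precisely}(P)$ infer $G \vdash \mathsf{precisely}(P \wedge Q)$
  precisely$\exists$:   from $\forall x.\ (G \vdash \mathsf{precisely}(P(x)))$ and
                        $\forall x, y.\ (G \wedge (P(x) \star \top) \wedge (P(y) \star \top) \vdash |x = y|)$ infer $G \vdash \mathsf{precisely}(\exists x.\ P(x))$
  precisely$\vee$:      from $G \vdash \mathsf{precisely}(P)$, $G \vdash \mathsf{precisely}(Q)$ and
                        $G \wedge (P \star \top) \wedge (Q \star \top) \vdash \bot$ infer $G \vdash \mathsf{precisely}(P \vee Q)$


Fig. 8. Proof theory for precision


  T:                    $\odot P \vdash P$
  $\odot\odot$:         $\odot P \vdash \odot\odot P$
  $\rhd\rhd$:           $\rhd_\pi P \vdash \rhd_\pi \rhd_\pi P$
  W:                    from $\rhd_\pi P \vdash P$ infer $\top \vdash P$
  $\rhd_\pi\odot$:      $\rhd_\pi P \dashv\vdash \rhd_\pi \odot P$
  $\odot\rhd_\pi$:      $\rhd_\pi P \dashv\vdash \odot \rhd_\pi P$
  $\odot\star$:         $(P \star Q) \wedge \odot R \vdash (P \wedge \odot R) \star (Q \wedge \odot R)$
  $\rhd_\pi\star$:      from $P \vdash \mathsf{uniform}(\pi) \wedge \neg \mathsf{emp}$ infer $(P \star Q) \wedge \rhd_\pi R \vdash (P \wedge \rhd_\pi R) \star (Q \wedge R)$


Fig. 9. Proof theory for substructural induction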


Proving the side conditions for DotPlus and DotStar. To use predicate
multiplication we need to prove two kinds of side conditions: uniformity and
precision. Figure 7 contains some rules for establishing that P is $\pi$-uniform
(i.e. $P \vdash \mathsf{uniform}(\pi)$) and that P is precise. Since uniformity is a simple property,
the rules are easy to state.

The uniform/emp rule tells us that emp is $\pi$-uniform for all $\pi$; the conclusion (all
defined heap locations are held with share $\pi$) is vacuously true. The uniformDot
rule tells us that if P is $\pi$-uniform then when we multiply P by a fraction $\pi'$ the
result is $(\pi' \otimes \pi)$-uniform. The $\mapsto$uniform rule tells us that points-to is uniform.
The uniform$\star$ rule possesses interesting characteristics. The $\dashv$ direction follows
from uniform/emp and the $\star$emp rule ($P \star \mathsf{emp} \dashv\vdash P$). The $\vdash$ direction is not
automatic but very useful. One consequence is that from $P \vdash \mathsf{uniform}(\pi)$ and
$Q \vdash \mathsf{uniform}(\pi)$ we can prove $P \star Q \vdash \mathsf{uniform}(\pi)$. The $\vdash$ direction follows from
disjointness but fails over non-disjoint models such as the rationals $\mathbb{Q}$.

The $\mapsto$precise rule tells us that points-tos are precise. The DotPrecise
rule is a partial solution to proving precision. It states that $\pi \cdot P$ is precise if and
only if P is precise. We will next show how to prove that P itself is precise.


5.2 Proof Theory for Proving that Predicates Are Precise


Proving that a predicate is $\pi$-uniform is relatively straightforward using the proof
rules presented so far. However, proving that a predicate is precise is not as
pleasant. Traditionally precision is defined (and checked for concrete predicates)
in the metalogic [40] using the following definition:


$\mathsf{precise}(P) \stackrel{\mathrm{def}}{=} \forall h, h_1, h_2.\ h_1 \subseteq h \Rightarrow h_2 \subseteq h \Rightarrow (h_1 \models P) \Rightarrow (h_2 \models P) \Rightarrow h_1 = h_2$    (3)


Here we write $h_1 \subseteq h_2$ to mean that $h_1$ is a subheap of $h_2$, i.e. $\exists h'.\ h_1 \oplus h' = h_2$,
where $\oplus$ is the joining operation on the underlying separation algebra [21].
Essentially precision is a kind of uniqueness property: if a predicate P is precise
then it can only be true on a single subheap.

Rather than checking precision in the metalogic, we wish to do so in the object
logic. We give a proof theory that lets us do so in Fig. 8. Among other advantages,
proving precision in the object logic lets tools build on existing separation logic
entailment checkers to prove the precision of recursive predicates. The core idea
is simple: we define a new object logic operator "precisely(P)" that captures the
notion of precision relativized to the current heap; essentially it is a partially
applied version of the definition of precise(P) in Eq. (3):


$h \models \mathsf{precisely}(P) \stackrel{\mathrm{def}}{=} \forall h_1, h_2.\ h_1 \subseteq h \Rightarrow h_2 \subseteq h \Rightarrow (h_1 \models P) \Rightarrow (h_2 \models P) \Rightarrow h_1 = h_2$    (4)


Although we have given precisely's model to aid intuition, we emphasize that in
Sect. 5 all of our proofs take place in the object logic; we never unfold precisely's
definition. Note that precisely is also generally weaker than the typical notion of
precision. For example, the predicate $x \mapsto 7 \vee y \mapsto 7$ is not precise; however the
entailment $z \mapsto 8 \vdash \mathsf{precisely}(x \mapsto 7 \vee y \mapsto 7)$ is provable from Fig. 8.
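The difference between the two notions can be checked by brute force on small heaps. The sketch below is an illustration of the model in Eq. (4), not of the object-logic proof theory: it uses non-fractional heaps (dictionaries from addresses to values) as a simplifying assumption, and the names `subheaps`, `precisely` and `x7_or_y7` are hypothetical.

```python
from itertools import combinations


def subheaps(h):
    """Enumerate every subheap of a (non-fractional) heap given as a dict."""
    items = list(h.items())
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield dict(combo)


def precisely(pred, h):
    """Eq. (4) on heap h: all subheaps of h satisfying pred coincide.
    The enumerated subheaps are pairwise distinct, so this means at most one witness."""
    witnesses = [s for s in subheaps(h) if pred(s)]
    return len(witnesses) <= 1


def x7_or_y7(h):
    """The predicate x |-> 7 \\/ y |-> 7: the heap is exactly one of the two cells."""
    return h == {"x": 7} or h == {"y": 7}
```

On the heap `{"x": 7, "y": 7}` both one-cell subheaps satisfy the predicate, so `precisely` fails there and the predicate is not precise. On the heap `{"z": 8}` no subheap satisfies it, so `precisely` holds vacuously, matching the entailment above.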

That said, the two notions are closely connected, as given in the preciselyPrecise
rule. We also give an introduction rule, preciselyRight, and an elimination rule,
preciselyLeft, that make a connection between precision and an "antidistribution"
of $\star$ over $\wedge$.

We also give a number of rules for showing how precisely combines with
the connectives of our logic. The rules for propositional $\wedge$ and separating $\star$
conjunction follow well-understood patterns, with the addition of an arbitrary
premise context G being the key feature. The rule for disjunction $\vee$ is a little
trickier, with an additional premise that forces the disjunction to be exclusive
rather than inclusive. An example of such an exclusive disjunction is in the
standard definition of the tree predicate, where the first disjunct (x = null)
is fundamentally incompatible with the second disjunct $\exists d,l,r.\ x \mapsto (d,l,r) \star \ldots$
since $\mapsto$ does not allow the address to be null (by rule $\mapsto$null from Fig. 6).
The rules for universal quantification $\forall$ and existential quantification $\exists$ are
essentially generalizations of the rules for the traditional conjunction $\wedge$ and
disjunction $\vee$.

It is now straightforward to prove the precision of simple predicates such as
$(x = \mathsf{null}) \vee (\exists y.\ x \mapsto y \star y \mapsto 0)$. Finding and proving the key lemmas that
enable the proof of the precision of recursive predicates remains a little subtle.


5.3 Proof Theory for Induction over the Finiteness of the Heap


Recursive predicates such as $\mathsf{list}(x)$ and $\mathsf{tree}(x)$ are common in SL. However,
proving properties of such predicates, such as proving that $\mathsf{list}(x)$ is precise,
is a little tricky since the $\mu$FoldUnfold rule provided by the Tarski fixed
point does not automatically provide an induction principle. Generally speaking
such properties follow by some kind of induction argument, either over auxiliary
parameters (e.g. if we augment trees to have the form $\mathsf{tree}(x, \tau)$, where $\tau$ is an
inductively-defined type in the metalogic) or over the finiteness of the heap itself.
Both arguments usually occur in the metalogic rather than the object logic.

We have two contributions to make for proving inductive properties. First,
we show how to do induction over the heap in a fractional setting. Intuitively this
is more complicated than in the non-fractional case because there are infinite
sequences of strictly smaller subheaps. That is, for a given initial heap $h_0$, there
are infinite sequences $h_1, h_2, \ldots$ such that $h_0 \supsetneq h_1 \supsetneq h_2 \supsetneq \ldots$. The disjointness
property does not fundamentally change this issue, so we illustrate with an
example with the shares in $\mathbb{Q}$. The heap $h_0$ satisfying $x \stackrel{1/2}{\mapsto} y$ is strictly larger
than the heap $h_1$ satisfying $x \stackrel{1/4}{\mapsto} y$, which is strictly larger than the heap $h_2$
satisfying $x \stackrel{1/8}{\mapsto} y$; in general $h_i$ satisfies $x \stackrel{1/2^{i+1}}{\mapsto} y$. Since our sequence is infinite, we
cannot use it as the basis for an induction argument. The solution is that we
require that the heaps decrease by at least some constant size c. If each
subsequent heap must shrink by at least, e.g., c = 0.25 of a memory cell, then
the sequence must be finite, just as in the non-fractional case (where $c = \mathcal{F}$). More
sophisticated approaches are conceivable (e.g. limits) but they are not easy to
automate and we did not find any practical examples that require such methods.
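The contrast between the two kinds of descent can be made concrete with rational arithmetic. The following Python sketch is only an illustration of the counting argument; the names `halving_chain` and `max_steps` are hypothetical.

```python
from fractions import Fraction


def halving_chain(n):
    """The first n shares 1/2, 1/4, 1/8, ...: strictly decreasing, never reaching 0."""
    return [Fraction(1, 2 ** (i + 1)) for i in range(n)]


def max_steps(total, c):
    """Upper bound on the number of shrinking steps from `total` ownership
    when every step must remove at least the constant amount c > 0."""
    return total // c
```

`halving_chain(n)` can be extended for any n, so it offers no well-founded measure. By contrast, `max_steps(Fraction(1), Fraction(1, 4))` bounds the number of steps on a single fully owned cell by 4, which is what makes induction possible.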

Our second contribution is the development of a proof theory in the object 
logic that can carry out these kinds of induction proofs in a relatively straightfor- 
ward way. The proof rules that let us do so are given in Fig. 9. Once good lemmas 
are identified, we find doing induction proofs over the finite heap formally in the 
object logic simpler than doing the same proofs in the metalogic. 

The key to our induction rules is two new operators: "within" $\odot$ and
"shrinking" $\rhd_\pi$. Essentially $\rhd_\pi P$ is used as an induction guard, preventing us from
applying our induction hypothesis P until we are on a $\pi$-smaller subheap. When
$\pi = \mathcal{F}$ we sometimes write just $\rhd P$. Semantically, if h satisfies $\rhd_\pi P$ then P is
true on all strict subheaps of h that are smaller by at least a $\pi$-piece.
Accordingly, the key elimination rule $\rhd_\pi\star$ may seem natural: it verifies that the
induction guard is satisfied and unlocks the underlying hypothesis. To start an
induction proof of an arbitrary goal $\top \vdash P$, we use the rule W to introduce
an induction hypothesis, resulting in the new entailment goal of $\rhd_\pi P \vdash P$.

Some definitions, such as $\mathsf{list}(x)$, have only one "recursive call"; others, such
as $\mathsf{tree}(x)$, have more than one. Moreover, sometimes we wish to apply our
inductive hypothesis immediately after satisfying the guard, whereas other times it is
convenient to satisfy the guard somewhat before we need the inductive hypothesis.
To handle both of these issues we use the "within" operator $\odot$ such that
$h \models \odot P$ means P is true on all subheaps of h, which is the intuition behind the
rule $\odot\star$. To apply our induction hypothesis somewhat after meeting its guard
(or if we wish to apply it more than once) we use the $\rhd_\pi\odot$ rule to add the $\odot$
modality before eliminating the guard. We will see an example of this shortly.


5.4 Using Our Proof Theory 


We now turn to two examples of using our proof theory from Figs. 6 to 9 to
demonstrate that the rule set is strong and flexible enough to prove real properties.


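Proving that $\mathsf{tree}(x)$ is $\mathcal{F}$-uniform. Our logical rules for induction and uniformity
are able to establish the uniformity of predicates in a fairly simple way. Here we
focus on the $\mathsf{tree}(x)$ predicate because it is a little harder due to the two recursive
"calls" in its unfolding. For convenience, we will write $u(\pi)$ instead of $\mathsf{uniform}(\pi)$.

Our initial proof goal is $\mathsf{tree}(x) \vdash u(\mathcal{F})$. Standard natural deduction
arguments then reach the goal $\top \vdash \forall x.\ \mathsf{tree}(x) \Rightarrow u(\mathcal{F})$, after which we apply the
W rule ($\pi = \mathcal{F}$ is convenient) to start the induction, adding the hypothesis
$\rhd \forall x.\ \mathsf{tree}(x) \Rightarrow u(\mathcal{F})$, which we strengthen with the $\rhd_\pi\odot$ rule to reach
$\rhd \odot \forall x.\ \mathsf{tree}(x) \Rightarrow u(\mathcal{F})$. Natural deduction from there reaches


$\big((x = \mathsf{null}) \vee \exists d,l,r.\ x \mapsto (d,l,r) \star \mathsf{tree}(l) \star \mathsf{tree}(r)\big) \wedge \big(\rhd \odot \forall x.\ \mathsf{tree}(x) \Rightarrow u(\mathcal{F})\big) \vdash u(\mathcal{F})$


The proof breaks into two cases. The first reduces to $(x = \mathsf{null}) \wedge (\rhd \cdots) \vdash u(\mathcal{F})$,
which follows from the uniform/emp rule. The second case reduces to
$(x \mapsto (d,l,r) \star \mathsf{tree}(l) \star \mathsf{tree}(r)) \wedge (\rhd \odot \forall x.\ \mathsf{tree}(x) \Rightarrow u(\mathcal{F})) \vdash u(\mathcal{F})$.
Then the uniform$\star$ rule gives


$\big(x \mapsto (d,l,r) \star (\mathsf{tree}(l) \star \mathsf{tree}(r))\big) \wedge \big(\rhd \odot \forall x.\ \mathsf{tree}(x) \Rightarrow u(\mathcal{F})\big) \vdash u(\mathcal{F}) \star u(\mathcal{F})$


We can now cut with the $\rhd_\pi\star$ rule to meet the inductive guard, since
$x \mapsto (d,l,r) \vdash \mathsf{uniform}(\mathcal{F}) \wedge \neg\mathsf{emp}$ by the rules $\mapsto$uniform and $\mapsto$emp. Our
remaining goal is thus


$\big(x \mapsto (d,l,r) \wedge \rhd \cdots\big) \star \big((\mathsf{tree}(l) \star \mathsf{tree}(r)) \wedge \odot \forall x.\ \mathsf{tree}(x) \Rightarrow u(\mathcal{F})\big) \vdash u(\mathcal{F}) \star u(\mathcal{F})$


We split over $\star$. The first goal is $x \mapsto (d,l,r) \wedge \rhd \cdots \vdash u(\mathcal{F})$, which follows from
$\mapsto$uniform. The second goal is $(\mathsf{tree}(l) \star \mathsf{tree}(r)) \wedge \odot \forall x.\ \mathsf{tree}(x) \Rightarrow u(\mathcal{F}) \vdash u(\mathcal{F})$. We
apply $\odot\star$ to distribute the inductive hypothesis into the $\star$, and uniform$\star$ to split
the right hand side, yielding


$\big(\mathsf{tree}(l) \wedge \odot \forall x.\ \mathsf{tree}(x) \Rightarrow u(\mathcal{F})\big) \star \big(\mathsf{tree}(r) \wedge \odot \forall x.\ \mathsf{tree}(x) \Rightarrow u(\mathcal{F})\big) \vdash u(\mathcal{F}) \star u(\mathcal{F})$


We again split over $\star$ to reach two essentially identical cases. We apply rule T
to remove the $\odot$ and then reach e.g. $\forall x.\ \mathsf{tree}(x) \Rightarrow u(\mathcal{F}) \vdash \mathsf{tree}(l) \Rightarrow u(\mathcal{F})$, which
is immediate. Further details on this proof can be found in the full paper [2].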


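Proving that $\mathsf{list}(x)$ is precise. Precision is more complex than $\pi$-uniformity, so it
is harder to prove. We will use the simpler $\mathsf{list}(x)$ as an example; the additional
tricks we need to prove that $\mathsf{tree}(x)$ is precise are applications of the $\rhd_\pi\odot$ and
$\odot\star$ rules in the same manner as in the proof that $\mathsf{tree}(x)$ is $\mathcal{F}$-uniform. We have
proved that both $\mathsf{list}(x)$ and $\mathsf{tree}(x)$ are precise using our proof rules in Coq [1].

  (A)  $\mathsf{precisely}(P) \dashv\vdash (P \star \top) \Rightarrow \mathsf{precisely}(P)$
  (B)  from $Q \wedge (R \star \top) \vdash \mathsf{precisely}(R)$, $Q \wedge (S \star \top) \vdash \mathsf{precisely}(S)$, and a premise
       stating that R and S are mutually incompatible,
       infer $Q \wedge ((R \vee S) \star \top) \vdash \mathsf{precisely}(R \vee S)$
  (C)  from $\forall x.\ \big(Q \wedge (P(x) \star \top) \vdash \mathsf{precisely}(P(x))\big)$, and a premise stating that
       the witness x is uniquely determined,
       infer $Q \wedge ((\exists x.\ P(x)) \star \top) \vdash \mathsf{precisely}(\exists x.\ P(x))$
  (D)  from $\mathsf{precise}(P)$ infer $P \star \mathsf{precisely}(Q) \vdash \mathsf{precisely}(P \star Q)$


Fig. 10. Key lemmas we use to prove recursive predicates precise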


In Fig. 10 we give four key lemmas used in our proof. All four are derived
(with a little cleverness) from the proof rules given in Fig. 8. We sketch the
proof as follows. To prove precise(list(x)) we first use the preciselyPRECISE rule
to transform the goal into $\top \vdash \mathsf{precisely}(\mathsf{list}(x))$. We cannot immediately apply
rule W, however, since without a concrete $\star$-separated conjunct outside the
precisely, we cannot dismiss the inductive guard with the $\triangleright_\pi$ rule. Accordingly,
we next use lemma (A) and standard natural deduction to reach the goal
$\top \vdash \forall x.\, (\mathsf{list}(x) \star \top) \Rightarrow \mathsf{precisely}(\mathsf{list}(x))$, after which we apply rule W with $\pi = \mathcal{F}$.

Afterwards we do some standard natural deduction steps yielding the goal

$$\big(\triangleright_{\mathcal{F}} \forall x.\, (\mathsf{list}(x) \star \top) \Rightarrow \mathsf{precisely}(\mathsf{list}(x))\big) \wedge \big(((x = \mathsf{null}) \vee \exists d, n.\, x \mapsto (d,n) \star \mathsf{list}(n)) \star \top\big) \vdash \mathsf{precisely}\big((x = \mathsf{null}) \vee \exists d, n.\, x \mapsto (d,n) \star \mathsf{list}(n)\big)$$


We are now in a position to apply lemma (B) to break up the conjunction. We
now have three goals. The first goal is that (x = null) is precise, which follows
from the fact that emp is precise, which in turn can be proved using the rule
preciselyRIGHT. The third goal is that the two branches of the disjunction are
mutually incompatible, which follows from (x = null) being incompatible with
maps-to using rule $\mapsto$null. The second (and last remaining) goal needs to use
lemma (C) twice to break up the existentials. Two of the three new goals are
to show that the two existentials are uniquely determined, which follow from
$\mapsto$INVERSION, leaving the goal

$$\big(\triangleright_{\mathcal{F}} \forall x.\, (\mathsf{list}(x) \star \top) \Rightarrow \mathsf{precisely}(\mathsf{list}(x))\big) \wedge \big(x \mapsto (d,n) \star (\mathsf{list}(n) \star \top)\big) \vdash \mathsf{precisely}\big(x \mapsto (d,n) \star \mathsf{list}(n)\big)$$

We now cut with lemma (D), using rule $\mapsto$PRECISE to prove its premise, yielding

$$\big(\triangleright_{\mathcal{F}} \forall x.\, (\mathsf{list}(x) \star \top) \Rightarrow \mathsf{precisely}(\mathsf{list}(x))\big) \wedge \big(x \mapsto (d,n) \star (\mathsf{list}(n) \star \top)\big) \vdash x \mapsto (d,n) \star \mathsf{precisely}(\mathsf{list}(n))$$

We now use the $\triangleright_\pi$ rule to defeat the inductive guard. The rest is straightforward.


Further details on this proof can be found in the full paper [2].

(Footnote to Fig. 10: We abuse notation by reusing the inference rule format to present derived lemmas.)


6 The ShareInfer fractional biabduction engine


Having described our logical machinery in Sects. 3, 4 and 5, we now demonstrate
that our techniques are well-suited to automation by documenting our ShareInfer
prototype [1]. Our tool is capable of checking whether a user-defined recursive
predicate such as list or tree is uniform and/or precise and then conducting
biabductive inference over a separation logic entailment containing said predicates.

Precision                  | Uniformity              | Bi-abduction
File name      Time (ms)   | File name   Time (ms)   | File name   Time (ms)
precise_map1     0.1       | uni_map1      0.2       | bi_map1       1.3
precise_map2     0.2       | uni_map2      0.8       | bi_map2       0.9
precise_map3     1.2       | uni_map3      0.3       | bi_map3       0.5
precise_list1    2.7       | uni_list1     1.2       | bi_list1      4.0
precise_list2    1.3       | uni_list2     2.1       | bi_list2      3.2
precise_list3    3.4       | uni_list3     0.7       | bi_list3      3.8
precise_tree1    1.4       | uni_tree1     1.9       | bi_tree1      5.1
precise_tree2    1.7       | uni_tree2     1.0       | bi_tree2      6.5
precise_tree3   12.2       | uni_tree3    10.3       | bi_tree3      7.9


Fig. 11. Evaluation of our proof systems using ShareInfer

To check uniformity, the tool first uses heuristics to guess a potential tree
share candidate $\pi$ and then applies proof rules in Figs. 7 and 6 to derive the
goal uniform($\pi$). To support more flexibility, our tool also allows users to specify
the candidate share $\pi$ manually. To check precision, the tool maneuvers over the
proof rules in Figs. 6 and 8 to achieve the desired goal. In both cases, recursive
predicates are handled with the rules in Fig. 9. ShareInfer returns either Yes, No
or Unknown together with a human-readable proof of its claim.

For bi-abduction, ShareInfer automatically checks precision and uniformity
whenever it encounters a new recursive predicate. If the check returns Yes,
the tool will unlock the corresponding rule, i.e., DOTPLUS for precision and
DOTSTAR for uniformity. ShareInfer then matches fragments between the
consequent and antecedent while applying folding and unfolding rules for
recursive predicates to construct the antiframe and inference frame respectively. For
instance, here is the biabduction problem contained in file bi_tree2 (see Fig. 11):

$$a \mapsto (b,c,d) \star \mathcal{L}\cdot\mathsf{tree}(c) \star \mathcal{R}\cdot\mathsf{tree}(d) \star [??] \vdash \mathcal{L}\cdot\mathsf{tree}(a) \star [??]$$

ShareInfer returns antiframe $\mathcal{L}\cdot\mathsf{tree}(d)$ and inference frame $a \overset{\mathcal{R}}{\mapsto} (b,c,d) \star \mathcal{R}\cdot\mathsf{tree}(d)$.
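
To see where these two answers come from, the matching can be sketched as follows. This is only an illustration under assumptions not shown in this section: that a non-null tree node a unfolds to a points-to on (b,c,d) together with the two subtrees tree(c) and tree(d), that the left and right half shares join to the full share, and that the left share distributes over the separating conjunction here because tree is uniform.

```latex
\begin{align*}
% Fold direction for the goal, after distributing L over the unfolded tree:
a \overset{\mathcal{L}}{\mapsto} (b,c,d) \star \mathcal{L}\cdot\mathsf{tree}(c) \star \mathcal{L}\cdot\mathsf{tree}(d)
  &\vdash \mathcal{L}\cdot\mathsf{tree}(a) \\
% Split the full points-to available in the antecedent:
a \mapsto (b,c,d)
  &\dashv\vdash a \overset{\mathcal{L}}{\mapsto} (b,c,d) \star a \overset{\mathcal{R}}{\mapsto} (b,c,d) \\
% Needed by the goal but absent from the antecedent:
\text{antiframe} &= \mathcal{L}\cdot\mathsf{tree}(d) \\
% Present in the antecedent but not consumed by the goal:
\text{inference frame} &= a \overset{\mathcal{R}}{\mapsto} (b,c,d) \star \mathcal{R}\cdot\mathsf{tree}(d)
\end{align*}
```

The left half of the points-to and the left fraction of tree(c) are matched directly, which leaves exactly the antiframe and inference frame reported by the tool.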

ShareInfer is around 2.5k LOC of Java. We benchmarked it with 27 selected
examples from three categories: precision, uniformity and bi-abduction. The
benchmark was conducted with a 3.4 GHz processor and 16 GB of memory. Our
results are given in Fig. 11. Despite the complexity of our proof rules our
performance is reasonable: ShareInfer only took 75.9 ms to run the entire example set, or
around 2.8 ms per example. Our benchmark is small, but this performance
indicates that more sophisticated separation logic verifiers such as HIP/SLEEK [14]
or Infer [9] may be able to use our techniques at scale.
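
The aggregate figures quoted above can be re-derived from the per-file timings in Fig. 11. The short script below is only a sanity check; the three lists are transcribed from the figure, with every entry read as a decimal number of milliseconds.

```python
# Timings in milliseconds, transcribed from Fig. 11 (one list per category).
precision = [0.1, 0.2, 1.2, 2.7, 1.3, 3.4, 1.4, 1.7, 12.2]
uniformity = [0.2, 0.8, 0.3, 1.2, 2.1, 0.7, 1.9, 1.0, 10.3]
biabduction = [1.3, 0.9, 0.5, 4.0, 3.2, 3.8, 5.1, 6.5, 7.9]

all_times = precision + uniformity + biabduction
total_ms = sum(all_times)
mean_ms = total_ms / len(all_times)

print(f"examples: {len(all_times)}")
print(f"total:    {total_ms:.1f} ms")
print(f"mean:     {mean_ms:.1f} ms")
```

The two tree3 uniformity and precision examples dominate their categories, so the mean is noticeably higher than the typical per-file time.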




7 Building a Model for Our Logic 


Our task now is to provide a model for our proof theories. We present our
models in several parts. In Sect. 7.1 we begin with a brief review of Cancellative
Separation Algebras (CSA). In Sect. 7.2 we explain what we need from our
fractional share models. In Sect. 7.3 we develop an extension to CSAs called “Scaling
Separation Algebras” (SSA). In Sect. 7.5 we develop the machinery necessary to
support our rules for object-level induction over the heap. We have verified in
Coq [1] that the models in Sect. 7.1 support the rules in Fig. 8, the models in
Sect. 7.3 support the rules in Figs. 3 and 7, and the models in Sect. 7.5 support the
rules in Fig. 9.


7.1 Cancellative Separation Algebras 


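A Separation Algebra (SA) is a set H with an associative, commutative partial
operation $\oplus$. Separation algebras can have a single unit or multiple units; we
use identity(x) to indicate that x is a unit. A Cancellative SA $(H, \oplus)$ further
requires that $a \oplus b_1 = c \Rightarrow a \oplus b_2 = c \Rightarrow b_1 = b_2$. We can define a partial order
on H using $\oplus$ by $h_1 \subseteq h_2 \stackrel{\text{def}}{=} \exists h'.\, h_1 \oplus h' = h_2$. Calcagno et al. [12] showed that
CSAs can model separation logic with the definitions

$$h \models P \star Q \stackrel{\text{def}}{=} \exists h_1, h_2.\ h_1 \oplus h_2 = h \wedge (h_1 \models P) \wedge (h_2 \models Q) \quad\text{and}\quad h \models \mathsf{emp} \stackrel{\text{def}}{=} \mathsf{identity}(h).$$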
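
As a concrete reading of these definitions, the following is a minimal sketch of one particular CSA: share-free heaps represented as Python dictionaries, joined by disjoint union. The representation and all function names (`join`, `subheap`, `star`, `points_to`) are assumptions made for illustration and are not taken from the paper or its Coq development.

```python
from itertools import combinations

def join(h1, h2):
    """Partial join: disjoint union of two heaps, or None when they overlap."""
    if h1.keys() & h2.keys():
        return None
    return {**h1, **h2}

def identity(h):
    """In this model the only unit is the empty heap."""
    return len(h) == 0

def subheap(h1, h2):
    """h1 is below h2 iff some h' satisfies join(h1, h') == h2."""
    return all(addr in h2 and h2[addr] == val for addr, val in h1.items())

def star(p, q):
    """h |= P * Q: some split of h into h1 and h2 has h1 |= P and h2 |= Q."""
    def holds(h):
        addrs = list(h)
        for k in range(len(addrs) + 1):
            for chosen in combinations(addrs, k):
                h1 = {a: h[a] for a in chosen}
                h2 = {a: h[a] for a in addrs if a not in chosen}
                if p(h1) and q(h2):
                    return True
        return False
    return holds

def points_to(addr, val):
    """Predicate satisfied exactly by the one-cell heap {addr: val}."""
    return lambda h: h == {addr: val}

emp = identity  # h |= emp iff h is a unit

heap = {1: "a", 2: "b"}
print(star(points_to(1, "a"), points_to(2, "b"))(heap))
```

Cancellativity is immediate in this model: if two joins with the same left operand agree, the right operands must contain the same cells.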


The standard definition of precise(P) was given as Eq. (3) in Sect. 5.2, together
with the definition for our new precisely(P) operator in Eq. (4). What is difficult
here is finding a set of axioms (Fig. 8) and derivable lemmas (e.g. Fig. 10) that are
strong enough to be useful in the object-level inductive proofs. Once the axioms
are found, proving them from the model given is straightforward. Cancellation
is not necessary to model basic separation logic [18], but we need it to prove
the introduction rule preciselyRIGHT and the elimination rule preciselyLEFT for our new
operator.


7.2 Fractional Share Algebras 


A fractional share algebra $(S, \oplus, \otimes, \mathcal{E}, \mathcal{F})$ (FSA) is a set S with two operations:
partial addition $\oplus$ and total multiplication $\otimes$. The substructure $(S, \oplus)$ is a CSA
with the single unit $\mathcal{E}$. For the reasons discussed in Sect. 2 we require that $\oplus$
satisfies the disjointness axiom $a \oplus a = b \Rightarrow a = \mathcal{E}$. Furthermore, we require
the existence of a top element $\mathcal{F}$, representing complete ownership, and assume
that each element $s \in S$ has a complement $\bar{s}$ such that $s \oplus \bar{s} = \mathcal{F}$.

Often (e.g. in the fractional $\mapsto$ operator) we wish to restrict ourselves to
the “positive shares” $S^+ \stackrel{\text{def}}{=} S \setminus \{\mathcal{E}\}$. To emphasize that a share is positive we
often use the metavariable $\pi$ rather than s. $\oplus$ is still associative, commutative,
and cancellative; every element other than $\mathcal{F}$ still has a complement. To enjoy
a partial order on $S^+$ and other SA- or CSA-like structures that lack identities
(sometimes called “permission algebras”) we define
$\pi_1 \subseteq \pi_2 \stackrel{\text{def}}{=} (\exists \pi'.\, \pi_1 \oplus \pi' = \pi_2) \vee (\pi_1 = \pi_2)$.

For the multiplicative structure we require that $(S, \otimes, \mathcal{F})$ be a monoid, i.e.
that $\otimes$ is associative and has identity $\mathcal{F}$. Since we restrict maps-tos and the
permission scaling operator to be positive, we want $(S^+, \otimes, \mathcal{F})$ to be a submonoid.
Accordingly, when $\{\pi_1, \pi_2\} \subseteq S^+$, we require that $\pi_1 \otimes \pi_2 \neq \mathcal{E}$. Finally, we require
that $\otimes$ distributes over $\oplus$ on the right, that is $(s_1 \oplus s_2) \otimes s_3 = (s_1 \otimes s_3) \oplus (s_2 \otimes s_3)$;
and that $\otimes$ is cancellative on the right given a positive left multiplicand, i.e.
$\pi \otimes s_1 = \pi \otimes s_2 \Rightarrow s_1 = s_2$.

The tree share model we present in Sect. 2 satisfies all of the above axioms, so
we have a nontrivial model. As we will see shortly, it would be very convenient if
we could assume that $\otimes$ also distributed on the left, or if we had multiplicative
inverses on the left rather than merely cancellation on the right. However, we
will see in Sect. 8.2 that both assumptions are untenable.
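
The axioms above can be exercised on a small executable model. The sketch below assumes Boolean binary trees in the spirit of the tree shares of Sect. 2, with `True` as the full leaf and `False` as the empty leaf. It further assumes that multiplication substitutes the left operand for every full leaf of the right operand; this convention is chosen here because it satisfies the right distributivity required above, and the definition in Sect. 2 remains authoritative. All names are illustrative.

```python
E, F = False, True                      # empty share and full share
L, R = (True, False), (False, True)     # left half and right half

def norm(t):
    """Canonical form: collapse a node whose two children are the same leaf."""
    if isinstance(t, bool):
        return t
    left, right = norm(t[0]), norm(t[1])
    if isinstance(left, bool) and isinstance(right, bool) and left == right:
        return left
    return (left, right)

def join(a, b):
    """Partial addition: defined only when the full regions do not overlap."""
    if isinstance(a, bool) and isinstance(b, bool):
        return None if (a and b) else (a or b)
    if isinstance(a, bool):
        a = (a, a)                      # unfold a leaf to match the other shape
    if isinstance(b, bool):
        b = (b, b)
    left, right = join(a[0], b[0]), join(a[1], b[1])
    if left is None or right is None:
        return None
    return norm((left, right))

def mul(p1, p2):
    """Total multiplication: replace each full leaf of p2 by a copy of p1."""
    if isinstance(p2, bool):
        return p1 if p2 else False
    return norm((mul(p1, p2[0]), mul(p1, p2[1])))

def complement(s):
    """Swap full and empty leaves, so that join(s, complement(s)) is F."""
    if isinstance(s, bool):
        return not s
    return (complement(s[0]), complement(s[1]))

print(join(L, R), join(L, L), mul(L, R), mul(R, L))
```

In this sketch right distributivity holds on the examples tried, while left distributivity already fails for `mul(L, join(L, R))`, which matches the asymmetry discussed in the text.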


7.3 Scaling Separation Algebra 


A scaling separation algebra (SSA) is $(H, S, \oplus_H, \oplus_S, \otimes_S, \mathcal{E}, \mathcal{F}, \mathsf{mul}, \mathsf{force})$, where
$(H, \oplus_H)$ is a CSA for heaps and $(S, \oplus_S, \otimes_S, \mathcal{E}, \mathcal{F})$ is an FSA for shares. Intuitively,
$\mathsf{mul}(\pi, h_1)$ multiplies every share inside $h_1$ by $\pi$ and returns the result $h_2$. The
multiplication is on the left, so for each original share $\pi'$ in $h_1$, the resulting
share in $h_2$ is $\pi \otimes_S \pi'$. Recall that the informal meaning of $\pi \cdot P$ is that we have
a $\pi$-fraction of predicate P. Formally this notion relies on a little trick:

$$h \models \pi \cdot P \stackrel{\text{def}}{=} \exists h'.\ \mathsf{mul}(\pi, h') = h \wedge h' \models P \qquad (5)$$

A heap h contains a $\pi$-fraction of P if there is a bigger heap h' satisfying P, and
multiplying that bigger heap h' by the scalar $\pi$ gets back to the smaller heap h.

The simpler $\mathsf{force}(\pi, h_1)$ overwrites all shares in $h_1$ with the constant share $\pi$
to reach the resulting heap $h_2$. We use force to define the uniform predicate as
$h \models \mathsf{uniform}(\pi) \stackrel{\text{def}}{=} \mathsf{force}(\pi, h) = h$. A heap h is $\pi$-uniform when setting all the
shares in h to $\pi$ gets you back to h, i.e., they must have been $\pi$ to begin with.


$S_1$. $\mathsf{force}(\pi, \mathsf{force}(\pi', a)) = \mathsf{force}(\pi, a)$
$S_2$. $\mathsf{force}(\pi, \mathsf{mul}(\pi', a)) = \mathsf{force}(\pi, a)$
$S_3$. $\mathsf{mul}(\pi, \mathsf{force}(\pi', a)) = \mathsf{force}(\pi \otimes_S \pi', a)$
$S_4$. $\mathsf{mul}(\pi, \mathsf{mul}(\pi', a)) = \mathsf{mul}(\pi \otimes_S \pi', a)$
$S_5$. $\mathsf{identity}(a) \Rightarrow \mathsf{force}(\pi, a) = a$
$S_6$. $a \subseteq_H \mathsf{force}(\mathcal{F}, a)$
$S_7$. $\pi_1 \subseteq_S \pi_2 \Rightarrow \mathsf{force}(\pi_1, a) \subseteq_H \mathsf{force}(\pi_2, a)$
$S_8$. $\mathsf{force}(\pi, a) \oplus_H \mathsf{force}(\pi, b) = c \Rightarrow \mathsf{force}(\pi, c) = c$
$S_9$. $\mathsf{identity}(a) \Rightarrow \mathsf{mul}(\pi, a) = a$
$S_{10}$. $\mathsf{mul}(\mathcal{F}, a) = a$
$S_{11}$. $\mathsf{mul}(\pi, a_1) = \mathsf{mul}(\pi, a_2) \Rightarrow a_1 = a_2$
$S_{12}$. $\mathsf{mul}(\pi, a) \subseteq_H a$
$S_{13}$. $\pi_1 \oplus_S \pi_2 = \pi_3 \Rightarrow \forall b, c.\ \big((\mathsf{mul}(\pi_1, b) \oplus_H \mathsf{mul}(\pi_2, b) = c) \Leftrightarrow (c = \mathsf{mul}(\pi_3, b))\big)$
$S_{14}$. $\mathsf{force}(\pi', a) \oplus_H \mathsf{force}(\pi', b) = \mathsf{force}(\pi', c) \Rightarrow \mathsf{mul}(\pi, \mathsf{force}(\pi', a)) \oplus_H \mathsf{mul}(\pi, \mathsf{force}(\pi', b)) = \mathsf{mul}(\pi, \mathsf{force}(\pi', c))$


Fig. 12. The 14 additional axioms for scaling separation algebras beyond those
inherited from cancellative separation algebras

We need to understand how all of the ingredients in an SSA relate to each
other to prove the core logical rules on page 13. We distill the various
relationships we need to model our logic in Fig. 12. Although there are a goodly number
of them, most are reasonably intuitive.

Axioms $S_1$ through $S_4$ describe how force and mul compose with each other.
Axioms $S_5$, $S_9$, and $S_{10}$ give conditions when force and mul are identity
functions: when either is applied to empty heaps, and when mul is applied to the
multiplicative identity on shares $\mathcal{F}$. Axioms $S_6$ and $S_{12}$ relate heap order with
forcing the full share $\mathcal{F}$ and multiplication by an arbitrary share $\pi$. Axiom $S_7$
says that force is order-preserving. Axiom $S_8$ is how the disjointness axiom on
shares is expressed on heaps: when two $\pi$-uniform heaps are joined, the result is
$\pi$-uniform. Axiom $S_{11}$ says that mul is injective on heaps. Axiom $S_{13}$ is delicate.
In the $\Rightarrow$ direction, it states that mul preserves the share model’s join structure
on heaps. In the $\Leftarrow$ direction, $S_{13}$ is similar to axiom $S_8$, saying that the share
model’s join structure must be preserved. Taking both directions together, $S_{13}$
translates the right distribution property of $\otimes_S$ over $\oplus_S$ into heaps. The final
axiom $S_{14}$ is a bit of a compromise. We wish we could satisfy

$$S_{14}'.\quad a \oplus_H b = c \Leftrightarrow \mathsf{mul}(\pi, a) \oplus_H \mathsf{mul}(\pi, b) = \mathsf{mul}(\pi, c)$$

$S_{14}'$ is a kind of dual for $S_{13}$, i.e. it would correspond to a left distributivity
property of $\otimes_S$ over $\oplus_S$ in the share model into heaps. Unfortunately, as we
will see in Sect. 8.2, the disjointness of $\oplus_S$ is incompatible with simultaneously
supporting both left and right distributivity. Accordingly, $S_{14}$ weakens $S_{14}'$ so
that it only holds when a and b are $\pi'$-uniform (which by $S_8$ forces c to be
$\pi'$-uniform). We also wish we could satisfy $S_{15}$: $\forall \pi, a.\ \exists b.\ \mathsf{mul}(\pi, b) = a$, which
corresponds to left multiplicative inverses, but again (Sect. 8.2) disjointness is
incompatible.


7.4 Compositionality of Scaling Separation Algebras 


Despite their complex axiomatization, we gain two advantages from developing
SSAs rather than directly proving our logical axioms on a concrete model. First,
they give us a precise understanding of exactly which operations and properties
($S_1$ to $S_{14}$) are used to prove the logical axioms. Second, following Dockins
et al. [21] we can build up large SSAs compositionally from smaller SSAs.

To do so cleanly it will be convenient to consider a slight variant of SSAs,
“Weak SSAs” (WSSAs), that allow, but do not require, the existence of identity elements
in the underlying CSA model. A WSSA satisfies exactly the same axioms as an
SSA, except that we use the weaker $\subseteq_H$ definition we defined for permission
algebras, i.e. $a_1 \subseteq_H a_2 \stackrel{\text{def}}{=} (\exists a'.\, a_1 \oplus_H a' = a_2) \vee (a_1 = a_2)$. Note that $S_5$ and
$S_9$ are vacuously true when the CSA does not have identity elements. We need
identity elements to prove the logical axioms from the model; we only use WSSAs
to gain compositionality as we construct a suitable final SSA. Keeping the share
components $(S, \oplus_S, \otimes_S, \mathcal{E}, \mathcal{F})$ constant, we give three SSA constructors to get a
flavor for what we can do with the remaining components $(H, \oplus_H, \mathsf{force}, \mathsf{mul})$.

Example 1 (Shares). The share model $(S, \oplus_S)$ is an SSA, and the positive
(non-$\mathcal{E}$) shares $(S^+, \oplus_S)$ are a WSSA, with $\mathsf{force}_S(\pi, \pi') \stackrel{\text{def}}{=} \pi$ and $\mathsf{mul}_S(\pi, \pi') \stackrel{\text{def}}{=} \pi \otimes \pi'$.

Example 2 (Semiproduct). Let $(A, \oplus_A, \mathsf{force}_A, \mathsf{mul}_A)$ be an SSA/WSSA and B
be a set. Define $(a_1, b_1) \oplus_{A \times B} (a_2, b_2) = (a_3, b_3) \stackrel{\text{def}}{=} a_1 \oplus_A a_2 = a_3 \wedge b_1 = b_2 = b_3$,
$\mathsf{force}_{A \times B}(\pi, (a, b)) \stackrel{\text{def}}{=} (\mathsf{force}_A(\pi, a), b)$, and $\mathsf{mul}_{A \times B}(\pi, (a, b)) \stackrel{\text{def}}{=} (\mathsf{mul}_A(\pi, a), b)$.
Then $(A \times B, \oplus_{A \times B}, \mathsf{force}_{A \times B}, \mathsf{mul}_{A \times B})$ is an SSA/WSSA.

Example 3 (Finite partial map). Let A be a set and $(B, \oplus_B, \mathsf{force}_B, \mathsf{mul}_B)$ be
an SSA/WSSA. Define $f \oplus_{A \rightharpoonup_{fin} B} g = h$ pointwise [21]. Define
$\mathsf{force}_{A \rightharpoonup_{fin} B}(\pi, f) \stackrel{\text{def}}{=} \lambda x.\, \mathsf{force}_B(\pi, f(x))$ and likewise define
$\mathsf{mul}_{A \rightharpoonup_{fin} B}(\pi, f) \stackrel{\text{def}}{=} \lambda x.\, \mathsf{mul}_B(\pi, f(x))$. The
structure $(A \rightharpoonup_{fin} B, \oplus_{A \rightharpoonup_{fin} B}, \mathsf{force}_{A \rightharpoonup_{fin} B}, \mathsf{mul}_{A \rightharpoonup_{fin} B})$ is an SSA.

Using these constructors, $A \rightharpoonup_{fin} (S^+, V)$, i.e. finite partial maps from addresses
to pairs of positive shares and values, is an SSA and thus can support a model
for our logic. We also support other standard constructions e.g. sum types +.
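
The three constructors compose mechanically, which the sketch below tries to make concrete for the force and mul components only (the joins are omitted to keep it short). The share multiplication reuses the assumed tree-share convention in which the left operand is substituted for each full leaf of the right operand; the dictionary-of-functions encoding and every name here are illustrative choices, not the paper's Coq definitions.

```python
def norm(t):
    """Canonical tree share: collapse a node whose children are the same leaf."""
    if isinstance(t, bool):
        return t
    left, right = norm(t[0]), norm(t[1])
    if isinstance(left, bool) and isinstance(right, bool) and left == right:
        return left
    return (left, right)

def share_mul(p1, p2):
    """Assumed share multiplication: put p1 at every full leaf of p2."""
    if isinstance(p2, bool):
        return p1 if p2 else False
    return norm((share_mul(p1, p2[0]), share_mul(p1, p2[1])))

# Example 1: shares, where force overwrites and mul multiplies on the left.
share = {"force": lambda pi, s: pi, "mul": share_mul}

def semiproduct(ssa):
    """Example 2: act on the first component of a pair, keep the second."""
    return {
        "force": lambda pi, ab: (ssa["force"](pi, ab[0]), ab[1]),
        "mul": lambda pi, ab: (ssa["mul"](pi, ab[0]), ab[1]),
    }

def finite_map(ssa):
    """Example 3: act pointwise on every cell of a finite partial map."""
    return {
        "force": lambda pi, f: {x: ssa["force"](pi, fx) for x, fx in f.items()},
        "mul": lambda pi, f: {x: ssa["mul"](pi, fx) for x, fx in f.items()},
    }

# Heaps: addresses mapped to (share, value) pairs.
heap_ssa = finite_map(semiproduct(share))
force, mul = heap_ssa["force"], heap_ssa["mul"]

def uniform(pi, h):
    """h |= uniform(pi) iff forcing pi leaves h unchanged."""
    return force(pi, h) == h

F, L, R = True, (True, False), (False, True)
h = {10: (L, "x"), 11: (R, "y")}
print(mul(L, h))
print(uniform(L, h), uniform(L, force(L, h)))
```

On the sample heap, axioms S1 through S4 and S10 of Fig. 12 can be checked by direct evaluation, which is what the composition is meant to preserve.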


7.5 Model for Inductive Logic 


What remains is to give the model that yields the inductive logic in Fig. 9. The
key induction guard modal $\triangleright_\pi$ operator is defined as follows:

$$h_1 \mathrel{S_\pi} h_2 \stackrel{\text{def}}{=} \exists h_1', h_3.\ h_1 \supseteq_H h_1' \wedge h_3 \oplus_H h_2 = h_1' \wedge (h_3 \models \mathsf{uniform}(\pi) \wedge \neg\mathsf{emp})$$

$$h \models \triangleright_\pi P \stackrel{\text{def}}{=} \forall h'.\ (h \mathrel{S_\pi} h') \Rightarrow (h' \models P)$$

In other words, $\triangleright_\pi$ is a (boxy) modal operator over the relation $S_\pi$, which relates
a heap $h_1$ with all heaps that are strict subheaps that are smaller by at least
a $\pi$-piece. The model is a little subtle to enable the rules $\triangleright_\pi\circledcirc$ and $\circledcirc\triangleright_\pi$ that
let us handle multiple recursive calls and simplify the engineering. The within
operator $\circledcirc$ is much simpler to model:

$$h_1 \mathrel{W} h_2 \stackrel{\text{def}}{=} h_1 \supseteq_H h_2 \qquad h \models \circledcirc P \stackrel{\text{def}}{=} \forall h'.\ (h \mathrel{W} h') \Rightarrow (h' \models P)$$

All of the rules in Fig. 9 follow from these definitions except for rule W. To
prove this rule, we require that the heap model have an additional operator. The
“$\pi$-quantum”, written $|h|_\pi$, gives the number of times a non-empty $\pi$-sized piece
can be taken out of h. For disjoint shares, the number of times is no more than
the number of defined memory locations in h. We require two facts for $|h|_\pi$. First,
that $h_1 \subseteq_H h_2 \Rightarrow |h_1|_\pi \le |h_2|_\pi$, i.e. that subheaps do not have larger $\pi$-quanta
than their parent. Second, that $h_1 \oplus_H h_2 = h_3 \Rightarrow (h_2 \models \mathsf{uniform}(\pi) \wedge \neg\mathsf{emp}) \Rightarrow |h_3|_\pi > |h_1|_\pi$,
i.e. that taking out a $\pi$-piece strictly decreases the number of
$\pi$-quanta. Given this setup, rule W follows immediately by induction on $|h|_\pi$.
The rules that require the longest proofs in the model are $\triangleright_\pi\circledcirc$ and $\circledcirc\triangleright_\pi$.


8 Lower Bounds on Predicate Multiplication


In Sect. 7 we gave a model for the logical axioms we presented in Fig. 3 and on
page 13. Our goal here is to show that it is difficult to do better, e.g. by having
a premise-free DOTSTAR rule or a bidirectional DOTIMPL rule. In Sect. 8.1 we
show that these logical rules force properties on the share model. In Sect. 8.2
we show that disjointness puts restrictions on the class of share models. There
are no non-trivial models that have left inverses or satisfy both left and right
distributivity.


8.1 Predicate Multiplication’s Axioms Force Share Model 
Properties 


The SSA structures we gave in Sect. 7.3 are good for building models that enable
the rules for predicate multiplication from Fig. 3. However, since they impose
intermediate algebraic and logical signatures between the concrete model and
rules for predicate multiplication, they are not good for showing that we cannot
do better. Accordingly here we disintermediate and focus on the concrete model
$A \rightharpoonup_{fin} (S^+, V)$, that is finite partial maps from addresses to pairs of positive
shares and values. The join operation on heaps operates pointwise [21], with
$(\pi_1, v_1) \oplus (\pi_2, v_2) = (\pi_3, v_3) \stackrel{\text{def}}{=} \pi_1 \oplus_S \pi_2 = \pi_3 \wedge v_1 = v_2 = v_3$, from which we
derive the usual SA model for $\star$ and emp (Sect. 7.1). We define
$h \models x \overset{\pi}{\mapsto} y \stackrel{\text{def}}{=} \mathrm{dom}(h) = \{x\} \wedge h(x) = (\pi, y)$. We define scalar multiplication over heaps $\otimes_H$
pointwise as well, with $\pi_1 \otimes (\pi_2, v) \stackrel{\text{def}}{=} (\pi_1 \otimes_S \pi_2, v)$, and then define predicate
multiplication by $h \models \pi \cdot P \stackrel{\text{def}}{=} \exists h'.\ \pi \otimes_H h' = h \wedge h' \models P$. All of the
above definitions are standard except for $\otimes_H$, which strikes us as the only choice
(up to commutativity), and predicate multiplication itself.

By Sect. 7 we already know that this model satisfies the rules for predicate
multiplication, given the assumptions on the share model from Sect. 7.2. What
is interesting is that we can prove the other direction: if we assume that the
key logical rules from Fig. 3 hold, they force axioms on the share model. The
key correspondences are: DOTFULL forces that $\mathcal{F}$ is the left identity of $\otimes_S$;
DOTMAPSTO forces that $\mathcal{F}$ is the right identity of $\otimes_S$; DOTMAPSTO forces the
associativity of $\otimes_S$; the $\dashv$ direction of DOTCONJ forces the right cancellativity
of $\otimes_S$ (as do DOTIMPL and the $\dashv$ direction of DOTUNIV); and DOTPLUS
forces right distributivity of $\otimes_S$ over $\oplus_S$.

The following rules force left distributivity of $\otimes_S$ over $\oplus_S$ and left $\otimes_S$
inverses:

$$\pi \cdot (P \star Q) \dashv\vdash (\pi \cdot P) \star (\pi \cdot Q)\ \ \text{DOTSTAR}' \qquad \pi \cdot (P \Rightarrow Q) \dashv\vdash (\pi \cdot P) \Rightarrow (\pi \cdot Q)\ \ \text{DOTIMPL}'$$

The $\vdash$ direction of DOTSTAR' also forces that $\oplus_S$ satisfies disjointness; this is the
key reason that we cannot use rationals $((0, 1], +, \times)$. Clearly the
side-condition-free DOTSTAR' rule is preferable to the DOTSTAR in Fig. 3, and it would also be
preferable to have bidirectionality for predicate multiplication over implication
and negation. Unfortunately, as we will see shortly, the disjointness of $\oplus_S$ places
strong multiplicative algebraic constraints on the share model. These constraints
are the reason we cannot support the DOTIMPL' rule and why we require the
$\pi'$-uniformity side condition in our DOTSTAR rule.
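
The remark about rationals can be checked directly. The sketch below models the rational shares in (0, 1] with bounded addition as the join and lists the shares that can be joined with themselves. Since this model has no empty share, any such self-join is a counterexample to disjointness. The helper names are illustrative.

```python
from fractions import Fraction

def rational_join(a, b):
    """Partial addition on (0, 1]: defined only while the sum stays at most 1."""
    total = a + b
    return total if total <= 1 else None

def self_joinable(join, shares):
    """Shares a for which join(a, a) is defined."""
    return [a for a in shares if join(a, a) is not None]

shares = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)]
counterexamples = self_joinable(rational_join, shares)
print(counterexamples)
```

Any positive rational of at most one half joins with itself, so the disjointness axiom fails for this model.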


8.2 Disjointness in a Multiplicative Setting 


Our goal now is to explore the algebraic consequences of the disjointness
property in a multiplicative setting. Suppose $(S, \oplus)$ is a CSA with a single unit $\mathcal{E}$,
top element $\mathcal{F}$, and $\oplus$ complements $\bar{s}$. Suppose further that shares satisfy the
disjointness property $a \oplus a = b \Rightarrow a = \mathcal{E}$. For the multiplicative structure, assume
$(S, \otimes, \mathcal{F})$ is a monoid (i.e. the axioms forced by the DOTDOT, DOTMAPSTO, and
DOTFULL rules). It is undesirable for a share model if multiplying two positive
shares (e.g. the ability to read a memory cell) results in the empty permission,
so we assume that when $\pi_1$ and $\pi_2$ are non-$\mathcal{E}$ then their product $\pi_1 \otimes \pi_2 \neq \mathcal{E}$.

Now add left or right distributivity. We choose right distributivity
$(s_1 \oplus s_2) \otimes s_3 = (s_1 \otimes s_3) \oplus (s_2 \otimes s_3)$; the situation is mirrored with left. Let us show that we
cannot have left inverses for $\pi \neq \mathcal{F}$. We prove by contradiction: suppose $\pi \neq \mathcal{F}$
and there exists $\pi^{-1}$ such that $\pi^{-1} \otimes \pi = \mathcal{F}$. Then

$$\pi = \mathcal{F} \otimes \pi = (\pi^{-1} \oplus \overline{\pi^{-1}}) \otimes \pi = (\pi^{-1} \otimes \pi) \oplus (\overline{\pi^{-1}} \otimes \pi) = \mathcal{F} \oplus (\overline{\pi^{-1}} \otimes \pi)$$

Let $e = \overline{\pi^{-1}} \otimes \pi$. Now $\pi = \mathcal{F} \oplus e = (\bar{e} \oplus e) \oplus e$, which by associativity and
disjointness forces $e = \mathcal{E}$, which in turn forces $\pi = \mathcal{F}$, a contradiction.
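
The absence of left inverses can also be observed on a bounded fragment of a concrete model. The sketch below enumerates all canonical Boolean tree shares of height at most two and searches for a left inverse of each share other than the full one. It assumes the same substitution-based multiplication as before (left operand placed at every full leaf of the right operand), which is an illustrative convention rather than a definition quoted from the paper, and a bounded search is of course no substitute for the proof.

```python
from itertools import product

def norm(t):
    """Canonical tree share: collapse a node whose children are the same leaf."""
    if isinstance(t, bool):
        return t
    left, right = norm(t[0]), norm(t[1])
    if isinstance(left, bool) and isinstance(right, bool) and left == right:
        return left
    return (left, right)

def mul(p1, p2):
    """Assumed multiplication: put p1 at every full leaf of p2."""
    if isinstance(p2, bool):
        return p1 if p2 else False
    return norm((mul(p1, p2[0]), mul(p1, p2[1])))

def trees(height):
    """All canonical tree shares of height at most `height`."""
    if height == 0:
        return {True, False}
    smaller = trees(height - 1)
    return smaller | {norm((l, r)) for l, r in product(smaller, repeat=2)}

FULL = True
candidates = trees(2)

# Shares other than FULL that have some inv with mul(inv, share) == FULL.
with_left_inverse = [
    share for share in candidates
    if share != FULL and any(mul(inv, share) == FULL for inv in candidates)
]
print(len(candidates), with_left_inverse)
```

The search finds no share other than the full one with a left inverse, in line with the argument above.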

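Now suppose that instead of adding multiplicative inverses we have both
left and right distributivity. First we prove (Lemma 1) that for arbitrary $s \in S$,
$s \otimes \bar{s} = \bar{s} \otimes s$. We calculate:

$$(s \otimes s) \oplus (s \otimes \bar{s}) = s \otimes (s \oplus \bar{s}) = s \otimes \mathcal{F} = s = \mathcal{F} \otimes s = (s \oplus \bar{s}) \otimes s = (s \otimes s) \oplus (\bar{s} \otimes s)$$

Lemma 1 follows by the cancellativity of $\oplus$ between the far left and the far right.
Now we show (Lemma 2) that $s \otimes \bar{s} = \mathcal{E}$. We calculate:

$$\mathcal{F} = \mathcal{F} \otimes \mathcal{F} = (s \oplus \bar{s}) \otimes (s \oplus \bar{s}) = (s \otimes s) \oplus (s \otimes \bar{s}) \oplus (\bar{s} \otimes s) \oplus (\bar{s} \otimes \bar{s}) = (s \otimes s) \oplus \underline{(s \otimes \bar{s}) \oplus (s \otimes \bar{s})} \oplus (\bar{s} \otimes \bar{s})$$

The final equality is by Lemma 1. The underlined portion implies $s \otimes \bar{s} = \mathcal{E}$
by disjointness. The upshot of Lemma 2, together with our requirement that the
product of two positive shares be positive, is that we can have no more than
the two elements $\mathcal{E}$ and $\mathcal{F}$ in our share model. Since the entire motivation for
fractional share models is to allow ownership between $\mathcal{E}$ and $\mathcal{F}$, we must choose
either left or right distributivity; we choose right since we are able to prove that
the $\pi'$-uniformity side condition enables the bidirectional DOTSTAR.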


9 Related Work 


Fractional permissions are essentially used to reason about resource ownership in
concurrent programming. The well-known rational model ((0,1],+) by Boyland
et al. [5] is used to reason about join-fork programs. This structure has the
disjointness problem mentioned in Sect. 2, first noticed by Bornat et al. [4], as
well as other problems discussed in Sects. 3, 4, and [2]. Boyland [6] extended
the framework to scale permissions uniformly over arbitrary predicates with
multiplication, e.g., he defined $\pi \cdot P$ as “multiply each permission $\pi'$ in P with
$\pi$”. However, his framework cannot fit into SL and his scaling rules are not
bidirectional. Jacobs and Piessens [28] also used rationals for scaling permissions
$\pi \cdot P$ in SL but only obtained one direction for DOTSTAR and DOTPLUS. A
different kind of scaling permission was used by Dinsdale-Young et al. [20] in
which they used rationals to define permission assertions $[A]^{\pi}_{r}$ to indicate a thread
with permission $\pi$ can execute the action A over the shared region r.

There are other flavors of permission besides rationals. Bornat et al. [4]
introduced integer counting permissions $(\mathbb{Z}, +, 0)$ to reason about semaphores and
combined rationals and integers into a hybrid permission model. Heule et al. [23]
flexibly allowed permissions to be either concretely rational or abstractly
read-only to lower the nuisance of detailed accounting. A more general form of read-only
permission was proposed by Charguéraud and Pottier [13] that transforms a
predicate P into read-only mode RO(P), which can be duplicated/merged with the
bi-entailment $RO(P) \dashv\vdash RO(P) \star RO(P)$. Their permissions distribute
pleasantly over disjunction and existential quantifier but only work one way for $\star$,
i.e., $RO(H_1 \star H_2) \vdash RO(H_1) \star RO(H_2)$. Parkinson [41] proposed subsets of the
natural numbers for shares $(\mathcal{P}(\mathbb{N}), \uplus)$ to fix the disjointness problem. Compared
to tree shares, Parkinson’s model is less practical computationally and does not
have an obvious multiplicative structure.

Protocol-based logics like FCSL [38] and Iris [30] have been very successful
in reasoning about fine-grained concurrent programs, but their high
expressivity results in a heavyweight logic. Automation (e.g. inference such as we do
in Sect. 4) has been hard to come by. We believe that fractional permissions
and protocol-based logics are in a meaningful sense complementary rather than
competitors.

Verification tools often implement rational permissions because of their
simplicity. For example, VeriFast [29] uses rationals to verify programs with locks
and semaphores. It also allows simple and restrictive forms of scaling
permissions which can be applied uniformly over standard predicates. On the other
hand, HIP/SLEEK [31] uses rationals to model “thread as resource” so that the
ownership of a thread and its resources can be transferred. Chalice [36] has
rational permissions to verify properties of multi-threaded, object-based programs
such as data races and deadlocks. Viper [37] has an expressive intermediate
language that supports both rational and abstract permissions. However, a number
of verification tools have chosen tree shares due to their better metatheoretical
properties. VST [3] is equipped with tree share permissions and an extensive tree
share library. HIP/SLEEK uses tree shares to verify the barrier structure [26]
and has its own complete share solver [33,35] that reduces tree formulae to
Boolean formulae handled by Z3 [17]. Lastly, tree share permissions are featured
in Heap-Hop [47] to reason over asynchronous communications.


10 Conclusion


We presented a separation logic proof framework to reason about resource
sharing using fractional permissions in concurrent verification. We support
sophisticated verification tasks such as inductive predicates, proving predicates precise,
and biabduction. We wrote ShareInfer to gauge how our theories could be
automated. We developed scaling separation algebras as compositional models for our
logic. We investigated why our logic cannot support certain desirable properties.


Abstract. Monitors constitute one of the common techniques to synchronize
threads in multithreaded programs, where calling a wait command
on a condition variable suspends the caller thread and notifying a
condition variable causes the threads waiting for that condition variable 
to resume their execution. One potential problem with these programs is 
that a waiting thread might be suspended forever leading to deadlock, a 
state where each thread of the program is waiting for a condition variable 
or a lock. In this paper, a modular verification approach for deadlock- 
freedom of such programs is presented, ensuring that in any state of the 
execution of the program if there are some threads suspended then there 
exists at least one thread running. The main idea behind this approach 
is to make sure that for any condition variable v for which a thread is 
waiting there exists a thread obliged to fulfil an obligation for v that 
only waits for a waitable object whose wait level, an arbitrary number 
associated with each waitable object, is less than the wait level of v. The 
relaxed precedence relation introduced in this paper, aiming to avoid 
cycles, can also benefit some other verification approaches, verifying 
deadlock-freedom of other synchronization constructs such as channels 
and semaphores, enabling them to accept a wider range of deadlock-free 
programs. We encoded the proposed proof rules in the VeriFast program 
verifier and by defining some appropriate invariants for the locks asso- 
ciated with some condition variables succeeded in verifying some popu- 
lar use cases of monitors including unbounded/bounded buffer, sleeping 
barber, barrier, and readers-writers locks. A soundness proof for the pre- 
sented approach is provided; some of the trickiest lemmas in this proof 
have been machine-checked with Coq. 


1 Introduction 


One of the popular mechanisms for synchronizing threads in multithreaded pro- 
grams is using monitors, a synchronization construct allowing threads to have 
mutual exclusion and also the ability to wait for a certain condition to become 
true. These constructs, consisting of a mutex/lock and some condition variables, 
provide some basic functions for their clients, namely wait(v, l), causing the calling
thread to wait for the condition variable v and release lock l while doing
so, and notify(v)/notifyAll(v), causing one/all thread(s) waiting for v to resume
their execution. Each condition variable is associated with a lock; a thread must
acquire the associated lock for waiting or notifying on a condition variable, and
when a thread is notified it must reacquire the associated lock.

© The Author(s) 2018

However, one potential problem with these synchronizers is deadlock, where 
all threads of the program are waiting for a condition variable or a lock. To clarify 
the problem consider the program in Fig. 1, where a channel consists of a queue 
q, a lock l and a condition variable v, protecting a thread from dequeuing q when 
it is empty. In this program the receiver thread first acquires lock l and while
there is no item in q it releases l, suspends itself and waits for a notification 
on v. If this thread is notified while q is not empty it dequeues an item and 
finally releases l. The sender thread also acquires the same lock, enqueues an 
item into q, notifies one of the threads waiting for v, if any, and lastly releases 
l. After creating a channel ch, the main thread of the program first forks a 
thread to receive a message from ch and then sends a message on ch. Although 
this program is deadlock-free, it is easy to construct some variations of it that 
lead to deadlock: if the main thread itself, before sending any messages, tries 
to receive a message from ch, or if the number of receives is greater than the 
number of sends, or if the receiver thread waits for v even if q is not empty. 


routine main()
{ q := newqueue;
  l := newlock;
  v := newcond;
  ch := channel(q,l,v);
  fork (receive(ch));
  send(ch, 12)}

routine send(channel ch, int d)
{ acquire(ch.l);
  enqueue(ch.q, d);
  notify(ch.v);
  release(ch.l)}

routine receive(channel ch)
{ acquire(ch.l);
  while(sizeof(ch.q) = 0)
    wait(ch.v, ch.l);
  d := dequeue(ch.q);
  release(ch.l);
  d}


Fig. 1. A message passing program synchronized using a monitor
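
For readers who want to run the program of Fig. 1, the following is a rough transliteration into Python using the standard `threading` module. It is only a sketch: the `Channel` class, the use of a `deque` for the queue, and the way `main` collects the received value are choices made here for illustration and are not part of the paper's language.

```python
import threading
from collections import deque

class Channel:
    """A queue q protected by lock l, with condition variable v tied to l."""
    def __init__(self):
        self.q = deque()
        self.l = threading.Lock()
        self.v = threading.Condition(self.l)

def send(ch, d):
    with ch.l:                  # acquire(ch.l) ... release(ch.l)
        ch.q.append(d)          # enqueue(ch.q, d)
        ch.v.notify()           # wake one waiter, if any

def receive(ch):
    with ch.l:
        while len(ch.q) == 0:   # re-check the condition after every wake-up
            ch.v.wait()         # releases ch.l while waiting, reacquires after
        return ch.q.popleft()   # dequeue(ch.q)

def main():
    ch = Channel()
    received = []
    receiver = threading.Thread(target=lambda: received.append(receive(ch)))
    receiver.start()            # fork(receive(ch))
    send(ch, 12)
    receiver.join()
    return received[0]

if __name__ == "__main__":
    print(main())
```

The loop around `wait` matters: if the receiver waited unconditionally, a notification sent before it started waiting would be lost, which is one of the deadlocking variations described in the text.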


Several approaches to verify termination, deadlock-freedom, liveness, and 
finite blocking of threads of programs have been presented. Some of these 
approaches only work with non-blocking algorithms [1-3], where the suspension 
of one thread cannot lead to the suspension of other threads. These approaches 
are not applicable for condition variables because suspension of a sender thread 
in Fig. 1, for example, might cause a receiver thread to be blocked forever. Some 
other approaches are also presented to verify termination of programs using some 
blocking constructs such as channels [4-6] and semaphores [7]. These approaches
are not general enough to cover condition variables because unlike the channels 
and semaphores a notification of a condition variable is lost when there is no 
thread waiting for that condition variable. There are also some studies [8-10] to 
verify correctness of programs that support condition variables. However, these 
approaches either only cover a very specific application of condition variables, 
such as a buffer program with only one producer and one consumer, or are not 
modular and suffer from a long verification time when the size of the state space, 
such as the number of threads, is increased.

In this paper we present a modular approach to verify deadlock-freedom
of programs in the presence of condition variables. More specifically, this app- 
roach makes sure that for any condition variable v for which a thread is wait- 
ing there exists a thread obliged to fulfil an obligation for v that only waits 
for a waitable object whose wait level, an arbitrary number associated with 
each waitable object, is less than the wait level of v. The presented approach 
is modular, meaning that different modules (functions) of a program can be 
verified individually. This approach is based on the approach of Leino et al. [4] for
verification of deadlock-freedom in the presence of channels and locks, which in
turn was based on Kobayashi’s [6] type system for verifying deadlock-freedom
of $\pi$-calculus processes, and extends the separation logic-based encoding [11] by
covering condition variables. We implemented the proposed proof rules in the 
VeriFast verifier [12-14] and succeeded in verifying some common applications 
of condition variables such as bounded/unbounded buffer, sleeping barber [15], 
barrier, and readers-writers locks (see the full version of this paper [16] reporting 
the verification time of these programs). 

This paper is structured as follows. Section 2 provides some background 
information on the existing approaches upon which we build our verification 
algorithm. Section 3 introduces a preliminary approach for verifying deadlock- 
freedom of some common applications of condition variables. In Sect. 4 the prece- 
dence relation, aiming to avoid cycles, is relaxed, making it possible to verify 
some trickier applications of condition variables. A soundness proof of the pre- 
sented approach is lastly given in Sect. 5. 


2 Background Information on the Underlying Approaches 


In this section we provide some background information on the existing 
approaches that verify absence of data races and deadlock in the presence of 
locks and channels that we build on. 


2.1 Verifying Absence of Data Races 


Locks/mutexes are mostly used to avoid data races, an undesired situation where 
a heap location is being written and accessed concurrently by two different 
threads. One common approach to verify absence of these undesired conditions 
is ownership: ownership of heap locations is assigned to threads and it is verified 
that a thread accesses only the heap locations that it owns. Transferring owner- 
ship of heap locations between threads is supported through locks by allowing 
locks, too, to own heap locations. While a lock is not held by a thread, it owns 
the heap locations described by its invariant. More specifically, when a lock is 
created the resources specified by its invariant are transferred from the creating 
thread to the lock, when that lock is acquired these resources are transferred 
from the lock to the acquiring thread, and when that lock is released these 
resources, that must be again in possession of the thread, are again transferred 
from the thread to the lock [17]. Figure 2 illustrates how a program increasing a 


x:=newint(0);
{x↦0}
l:=newlock;
{ulock(l) * x↦0}
ct:=counter(x:=x, l:=l);
{ulock(ct.l) * ct.x↦0}
{ulock(ct.l) * inv(ct)}
{lock(ct.l) ∧ I(l)=inv(ct)}
{lock(ct.l) * lock(ct.l)}
fork (inc(ct));
{lock(ct.l)}
inc(ct)

routine inc(counter ct){
  {lock(ct.l) ∧ I(l)=inv(ct)}
  acquire(ct.l);
  {locked(ct.l) * ∃z. ct.x↦z}
  ct.x:=ct.x+1;
  {locked(ct.l) * ∃z. ct.x↦z}
  release(ct.l)
  {lock(ct.l)}}


Fig. 2. Verification of data-race-freedom of a program, where inv = λct. ∃z. ct.x↦z 


counter, which consists of an integer variable x and a lock l protecting this 
variable, can be verified, where two threads try to write to the variable x. We use 
separation logic [18] to reason about the ownership of permissions. As indicated 
below each command, creating the integer variable x initialized to zero provides 
a read/write access permission to x, denoted by x↦0. This ownership, which is 
going to be protected by lock l, is transferred to the lock because it is asserted by 
the lock invariant inv, which is associated with the lock, as denoted by function I, 
at the point where the lock is initialized. The resulting lock permission, which can 
be duplicated, is used in the routine inc, where x is increased under protection 
of lock l. Acquiring this lock in this routine provides a full access permission to 
x and transforms the lock permission to a locked permission, implying that the 
related lock has been acquired. Releasing that lock again consumes this access 
permission and transforms the locked permission to a lock one. 
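
For readers who prefer executable code, the counter program can be sketched in Python as follows. This is only an illustration under the assumption that Python's threading.Lock plays the role of the lock l; the class and function names mirror the figure but are otherwise hypothetical, and the ownership transfer itself exists only in the comments, not in the runtime.

```python
import threading


class Counter:
    def __init__(self):
        self.x = 0                  # the protected heap location
        self.l = threading.Lock()   # the lock whose invariant owns x


def inc(ct):
    with ct.l:        # acquire: access to ct.x moves from the lock to this thread
        ct.x += 1
    # release (end of the with block): access to ct.x moves back to the lock


def run():
    ct = Counter()
    t = threading.Thread(target=inc, args=(ct,))
    t.start()         # fork (inc(ct))
    inc(ct)           # the main thread also increments the counter
    t.join()
    return ct.x
```

Because x is touched only while the lock is held, the two increments cannot race and run() returns 2.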


2.2 Verifying Absence of Deadlock 


One potential problem with programs using locks and other synchronization 
mechanisms is deadlock, an undesired situation where all threads of the program 
are waiting for some waitable objects. For example, a program can deadlock if a 
thread acquires a lock and forgets to release it, because any other thread waiting 
for that lock never succeeds in acquiring that lock. As another example, if in a 
message passing program the number of threads trying to receive a message 
from a channel is greater than the number of messages sent on that channel 
there will be some threads waiting for that channel forever. The approach 
presented by Leino et al. [4] verifies deadlock-freedom of programs that use 
channels and locks by ensuring that (1) for any obligee 
thread waiting for a waitable object, such as a channel or lock, there is an 
obligation for that object that must be fulfilled by an obligor thread, where a 
thread can fulfil an obligation for a channel/lock if it sends a message on that 
channel/releases that lock, and (2) each thread waits for an object only if the 
wait level of that object, an arbitrary number assigned to each waitable object, 
is lower than the wait levels of all obligations of that thread. The second rule is 
established by making sure that when a thread with some obligations O executes 
a command acquire(o)/receive(o) the precondition o≺O holds, i.e. the wait level 
of o is lower than the wait levels of obligations in O. To meet the first rule where 
the waitable object is a lock, as the example on the left side of Fig. 3 illustrates, 
after acquiring a lock, that lock is loaded onto the bag¹ (multiset) of obligations 
of the thread, denoted by obs(O). This ensures that if a thread tries to acquire 
a lock that has already been acquired then there is one thread obliged to fulfil 
an obligation for that lock. 
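
The wait-level check behind the second rule can be made concrete with a small sketch. The Python below is illustrative only: the function name precedes, the dictionary R of wait levels, and the object names "l" and "ch" are assumptions introduced here, and bags are modelled with collections.Counter.

```python
from collections import Counter


def precedes(o, obligations, R):
    """o ≺ O: the wait level of o is lower than that of every obligation in O."""
    return all(R[o] < R[p] for p in obligations)


# Hypothetical wait levels: a lock l at level 0 and a channel ch at level 1.
R = {"l": 0, "ch": 1}
```

With these levels, precedes("l", Counter(["ch"]), R) holds, while precedes("ch", Counter(["l"]), R) does not. The check holds vacuously for an empty bag, and no object precedes a bag that contains that object itself.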


(left) locks

{obs(O) * lock(l) ∧ l≺O}
acquire(l);
{obs(O⊎{l}) * locked(l) * I(l)}
...
{obs(O⊎{l}) * locked(l) * I(l)}
release(l)
{obs(O) * lock(l)}

(right) channels

{obs(O)}
{obs(O⊎{ch}) * credit(ch)}
fork (
  {obs({}) * credit(ch) ∧ ch≺{}}
  receive(ch)
  {obs({})}
);
{obs(O⊎{ch})}
send(ch, 12) {obs(O)}


Fig. 3. Verification of deadlock-freedom of locks (left side) and channels (right side) 


To establish the first rule where the waitable object is a channel any thread 
trying to receive a message from a channel ch must spend one credit for ch. This 
credit is normally obtained from the thread that has forked the receiver thread, 
where this credit is originally created by loading ch onto the bag of obligations 
of the forking thread. The forking thread can discharge the loaded obligation 
by either sending a message on the corresponding channel or delegating it to a 
child thread that can discharge it. The example on the right side of Fig. 3 shows 
the verification of deadlock-freedom of a program in which the main routine, after 
forking an obligee thread trying to receive a message from channel ch, sends a 
message on this channel. Before forking the receiver thread, a credit and an 
obligation for the channel ch are created in the main thread. The former is given 
to the forked thread, where this credit is spent by the receive(ch) command, 
and the latter is fulfilled by the main thread when it executes the command 
send(ch, 12). 

More formally, the mentioned verification approach satisfies the first rule by 
ensuring that for each channel ch in the program the number of obligations for ch 
is equal to/greater than the number of threads waiting for ch. This assurance is 
obtained by preserving the invariant Wt(ch) + Ct(ch) ≤ Ot(ch) + sizeof(ch), while 
the programming language itself ensures that sizeof(ch) > 0 ⇒ Wt(ch) = 0, 
where sizeof is a function mapping each channel to the size of its queue, Wt(ch) 
is the total number of threads currently waiting for channel ch, Ot(ch) is the 
total number of obligations for channel ch held by all threads, and Ct(ch) is the 
total number of credits for channel ch currently in the system. 


1 We treat bags of waitable objects as functions from waitable objects to natural 
numbers. 
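
As a worked illustration of this invariant, the sketch below evaluates it on the states that the program on the right side of Fig. 3 passes through. It is a hand-written model, not part of the proof system: the function name, the state labels, and the assumption that the receiver's credit is spent at the moment it starts waiting are all introduced here for illustration.

```python
def channel_invariant(wt, ct, ot, size):
    """Wt + Ct <= Ot + sizeof, together with sizeof > 0 implies Wt = 0."""
    return wt + ct <= ot + size and (size == 0 or wt == 0)


# (label, Wt, Ct, Ot, sizeof) for one channel ch
states = [
    ("initial state",                  0, 0, 0, 0),
    ("after loading ch",               0, 1, 1, 0),  # one obligation, one credit
    ("receiver waiting in receive",    1, 0, 1, 0),  # credit spent, thread waits
    ("after the message is delivered", 0, 0, 0, 0),  # obligation discharged
]

all_hold = all(channel_invariant(wt, ct, ot, size)
               for _, wt, ct, ot, size in states)
```

Every listed state satisfies the invariant, whereas a state with a waiting thread and no obligation, such as channel_invariant(1, 0, 0, 0), does not.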


2.3 Proof Rules 


The separation logic-based proof rules, introduced by Jacobs et al. [11], that 
avoid data races and deadlock in the presence of locks and channels are shown in 
Fig. 4, where R and I are functions mapping a waitable object/lock to its wait 
level/invariant, respectively, and g_initl and g_load are ghost commands used 
to initialize an uninitialized lock permission and load a channel onto the bag of 
obligations and credits of a thread, respectively. When a lock is created, as shown 
in NEWLOCK, an uninitialized lock permission ulock(l) is provided for that thread. 
Additionally, an arbitrary integer number z can be chosen as the wait level of that 
lock, which is stored in R. Note that variable z in this rule is universally quantified 
over the rule, and different applications of the NEWLOCK rule can use different 
values for this variable. The uninitialized lock permission, as shown in INITLOCK, 
can be converted to a normal lock permission lock(l) provided that the resources 
described by the invariant of that lock, stored in I, which must be in possession of the 
thread, are transferred from the thread to the lock. By the rule ACQUIRE, having 
a lock permission, a thread can acquire that lock if the wait levels of obligations of 
that thread are all greater than the wait level of that lock. After acquiring the lock, 
the resources represented by the invariant of that lock are provided for the 
acquiring thread and the permission lock is converted to a locked permission. When a 


NEWLOCK     {true} newlock {λl. ulock(l) ∧ R(l)=z}

INITLOCK    {ulock(l) * i} g_initl(l) {λ_. lock(l) ∧ I(l)=i}

ACQUIRE     {lock(l) * obs(O) ∧ l≺O} acquire(l) {λ_. obs(O⊎{l}) * locked(l) * I(l)}

RELEASE     {obs(O) * locked(l) * I(l)} release(l) {λ_. obs(O−{l}) * lock(l)}

NEWCHANNEL  {true} newchannel {λch. R(ch)=z}

SEND        {obs(O)} send(ch, v) {λ_. obs(O−{ch})}

RECEIVE     {obs(O) * credit(ch) ∧ ch≺O} receive(ch) {λ_. obs(O)}

FORK        premise:    {a * obs(O)} c {λ_. obs({})}
            conclusion: {a * obs(O⊎O′)} fork(c) {λ_. obs(O′)}

DUPLOCK     lock(l) ⇔ lock(l) * lock(l)

LOADOB      {obs(O)} g_load(ch) {λ_. obs(O⊎{ch}) * credit(ch)}


Fig. 4. Proof rules ensuring deadlock-freedom of channels and locks, where o≺O ⇔ 
∀o′ ∈ O. R(o) < R(o′) 




thread releases a lock, as shown in the rule RELEASE, the resources indicated by 
the invariant of that lock, that must be in possession of the releasing thread, are 
transferred from the thread to the lock and the permission locked is again con- 
verted to a lock permission. By the rule RECEIVE a thread with obligations O can 
try to receive a message from a channel ch only if the wait level of ch is lower than 
the wait levels of all obligations in O. This thread must also spend one credit for 
ch, ensuring that there is another thread obliged to fulfil an obligation for ch. As 
shown in the rule SEND, an obligation for this channel can be discharged by send- 
ing a message on that channel. Alternatively, by the rule FORK, a thread can dis- 
charge an obligation for a channel if it delegates that obligation to a child thread, 
provided that the child thread discharges the delegated obligation. In this setting 
the verification of a program starts with an empty bag of obligations and must 
also end with such a bag, implying that there is no remaining obligation to fulfil. 
However, this verification approach is not straightforwardly applicable to 
condition variables. A command notify cannot be treated like a command send 
because a notification on a condition variable is lost when there is no thread 
waiting for that variable. Accordingly, it does not make sense to discharge an 
obligation for a condition variable whenever it is notified. Similarly, a command 
wait cannot be treated like a command receive. A command wait is normally 
executed in a while loop, checking the waiting condition of the related condition 
variable. Accordingly, it is impossible to build a loop invariant for such a loop if 
we force the wait command to spend a credit for the related condition variable. 


3 Deadlock-Free Monitors 


3.1 High-Level Idea 


In this section we introduce an approach to verify deadlock-freedom of pro- 
grams in the presence of condition variables. This approach ensures that the 
verified program never deadlocks, i.e. there is always a running thread, that is 
not blocked, until the program terminates. The main idea behind this approach 
is to make sure that for any condition variable v for which a thread is waiting 
there exists a thread obliged to fulfil an obligation for v that only waits for a 
waitable object whose wait level is less than the wait level of v. As a consequence, 
if the program has some threads suspended, waiting for some obligations, there is 
always a thread obliged to fulfil the obligation o_min that is not suspended, where 
o_min has a minimal wait level among all waitable objects for which a thread is 
waiting. Accordingly, the proposed proof rules make sure that (1) when a 
command wait(v, l) is executed Ot(v) > 0, where Ot maps each condition variable v 
to the total number of obligations for v held by all threads (note that having a 
thread with permission obs(O) implies O(v) ≤ Ot(v)), (2) a thread discharges 
an obligation for a condition variable only if after this discharge the invariant 
one_ob(v, Wt, Ot), defined as Wt(v) > 0 ⇒ Ot(v) > 0, still holds, where Wt(v) 
denotes the number of threads waiting for condition variable v, and (3) a thread 
with obligations O executes a command wait(v, l) only if v≺O. 
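
Rules (1) and (2) can be read as simple checks on the two bags. The following Python sketch is an illustration under stated assumptions: bags are modelled as collections.Counter objects, and the helper names can_wait and can_discharge are hypothetical and do not appear in the proof rules; rule (3) is not modelled.

```python
from collections import Counter


def one_ob(v, wt, ot):
    """Wt(v) > 0 implies Ot(v) > 0."""
    return wt[v] == 0 or ot[v] > 0


def can_wait(v, wt, ot):
    """Rule (1): one_ob must hold once the caller is counted as a waiter."""
    return one_ob(v, wt + Counter([v]), ot)


def can_discharge(v, wt, ot):
    """Rule (2): one_ob must still hold after one obligation for v is removed."""
    return ot[v] > 0 and one_ob(v, wt, ot - Counter([v]))
```

For example, with no obligation for v in the system can_wait is false, and the last obligation for v cannot be discharged while some thread is still waiting for v.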


3.2 Tracking Numbers of Waiting Threads and Obligations 


For all condition variables associated with a lock l the value of functions Wt and 
Ot can only be changed by a thread that has locked l; Wt(v) is changed only 
when one of the commands wait(v, l)/notify(v)/notifyAll(v) is executed, requiring 
holding lock l, and we allow Ot(v) to be changed only when a permission locked 
for l is available. Accordingly, when a thread acquires a lock these two bags 
are stored in the related locked permission and are used to establish the rules 
number 1 and 2, when a thread executes a wait command or discharges one 
of its obligations. Note that the domain of these functions is the set of the 
condition variables associated with the related lock. The thread executing the 
critical section can change these two bags under some circumstances. If that 
thread loads/discharges a condition variable onto/from the bag of its obligations 
this condition variable must also be loaded/discharged onto/from the bag Ot 
stored in the related locked permission. Note that unlike the approach presented 
by Leino et al. [4], an obligation for a condition variable can arbitrarily be 
loaded or discharged by a thread, provided that the rule number 2 is respected. 
At the start of the execution of a wait(v, l) command, Wt(v) is incremented and 
after execution of commands notify(v) /notifyAll(v) one/all instance(s) of v is/are 
removed from the bag Wt stored in the related locked permission, since these 
commands change the number of threads waiting for v. 
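
The effect of the three commands on the bag Wt can be summarized by a small model. The Python below is a sketch only; the function names are hypothetical, and the bag is represented by a collections.Counter that is returned as a fresh value rather than updated in place.

```python
from collections import Counter


def on_wait(wt, v):
    """wait(v, l): Wt(v) is incremented."""
    return wt + Counter([v])


def on_notify(wt, v):
    """notify(v): one instance of v is removed from Wt, if there is any."""
    return wt - Counter([v])


def on_notify_all(wt, v):
    """notifyAll(v): all instances of v are removed from Wt."""
    rest = Counter(wt)
    rest[v] = 0
    return +rest  # unary plus drops entries whose count is zero
```

Note that on_notify leaves an empty bag unchanged, which matches the fact that a notification without a waiting thread has no effect.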

A program can be successfully verified according to the mentioned rules, 
formally indicated in Fig. 5, if each lock associated with any condition 
variable v has an appropriate invariant such that it implies the desired invariant 
one_ob(v, Wt, Ot). Accordingly, the proof rules allow locks to have invariants 
parametrized over the bags Wt and Ot. When a thread acquires a lock the result 
of applying the invariant of that lock to these two bags, stored in the related 
locked permission, is provided for the thread and when that lock is released it is 
expected that the result of applying the lock invariant to those bags, stored in 
the related locked permission, again holds. However, before execution of a 
command wait(v, l), when lock l with bags Wt and Ot stored in its locked permission 
is going to be released, it is expected that the invariant of l holds with bags 
Wt⊎{v} and Ot because the running thread is going to wait for v and this 
condition variable is going to be added to Wt. As this thread resumes its execution, 
when it has some bags Wt’ and Ot’ stored in the related locked permission, the 
result of applying the invariant of l to these bags is provided for that thread. Note 
that the total number of threads waiting for v, Wt(v), is already decreased when 
a command notify(v) or notifyAll(v) is executed, causing the waiting thread(s) 
to wake up and try to acquire the lock associated with v. 


3.3 Resource Transfer on Notification 


In general, as we will see when looking at examples, it is sometimes necessary 
to transfer resources from a notifying thread to the threads being notified². 
To this end, these resources, specified by a function M, are associated 
with each condition variable v when v is created, such that the commands 
notify(v)/notifyAll(v) consume one/Wt(v) instance(s) of these resources, 
respectively, and the command wait(v, l) produces one instance of such resources (see 
the rules WAIT, NOTIFY, and NOTIFYALL in Fig. 5). 


2 This transfer is only sound in the absence of spurious wake-ups, where a thread 
is awoken from its waiting state even though no thread has signaled the related 
condition variable. 


NEWLOCK    {true} newlock {λl. ulock(l, {}, {}) ∧ R(l)=z}

NEWCV      {true} newcond {λv. R(v)=z ∧ L(v)=l ∧ M(v)=m}

ACQUIRE    {lock(l) * obs(O) ∧ l≺O} acquire(l)
           {λ_. ∃Wt, Ot. locked(l, Wt, Ot) * I(l)(Wt, Ot) * obs(O⊎{l})}

RELEASE    {locked(l, Wt, Ot) * I(l)(Wt, Ot) * obs(O⊎{l})} release(l)
           {λ_. lock(l) * obs(O)}

WAIT       {locked(l, Wt, Ot) * I(l)(Wt⊎{v}, Ot) * obs(O⊎{l})
            ∧ l=L(v) ∧ v≺O ∧ l≺O ∧ safe_obs(v, Wt⊎{v}, Ot)} wait(v, l)
           {λ_. obs(O⊎{l}) * ∃Wt′, Ot′. locked(l, Wt′, Ot′) * I(l)(Wt′, Ot′) * M(v)}

NOTIFY     {locked(L(v), Wt, Ot) * (Wt(v) = 0 ∨ M(v))} notify(v)
           {λ_. locked(L(v), Wt−{v}, Ot)}

NOTIFYALL  {locked(L(v), Wt, Ot) * (Wt(v) instances of M(v), joined by *)} notifyAll(v)
           {λ_. locked(L(v), Wt[v:=0], Ot)}

INITLOCK   {ulock(l, Wt, Ot) * inv(Wt, Ot) * obs(O)} g_initl(l)
           {λ_. lock(l) * obs(O) ∧ I(l)=inv}

CHARGEOB   {obs(O) * ulock/locked(L(v), Wt, Ot)} g_chrg(v)
           {λ_. obs(O⊎{v}) * ulock/locked(L(v), Wt, Ot⊎{v})}

DISOB      {obs(O) * ulock/locked(L(v), Wt, Ot) ∧ safe_obs(v, Wt, Ot−{v})} g_disch(v)
           {λ_. obs(O−{v}) * ulock/locked(L(v), Wt, Ot−{v})}


Fig. 5. Proof rules to verify deadlock-freedom of condition variables, where Wt(v) 
and Ot(v) denote the total number of threads waiting for v and the total number 
of obligations for v, respectively, and safe_obs(v, Wt, Ot) ⇔ one_ob(v, Wt, Ot) and 
one_ob(v, Wt, Ot) = (Wt(v) > 0 ⇒ Ot(v) > 0) 


3.4 Proof Rules 


Figure 5 shows the proposed proof rules used to verify deadlock-freedom of 
condition variables, where L and M are functions mapping each condition variable 
to its associated lock and to the resources that are moved from the notifying 
thread to the notified one when that condition variable is notified, respectively. 
Creating a lock, as shown in the rule NEWLOCK, produces a permission ulock 
storing the bags Wt and Ot, where these bags are initially empty. The bag 
Ot in this permission, similar to a locked one, can be changed provided that the 
obligations of the running thread are also updated by one of the ghost commands 
g_chrg(v) or g_disch(v) (see rules CHARGEOB and DISOB). The lock related to 
this permission can be initialized by transferring the resources described by 
the invariant of this lock, which is now parametrized over the bags Wt and Ot, 
applied to the bags stored in this permission from the thread to the lock (see 
rule INITLOCK). When this lock is acquired, as shown in the rule ACQUIRE, the 
resources indicated by its invariant are provided for the thread, and when it is 
released, as shown in the rule RELEASE, the resources described by its invariant, 
which must hold with appropriate bags, are again transferred from the thread 
to the lock. The rules WAIT and DISOB ensure that for any condition variable 
v when the number of waiting threads is increased, by executing a command 
wait(v, l), or the number of the obligations is decreased, by (logically) executing 
a command g_disch(v), the desired invariant one_ob still holds. Additionally, the 
rules ACQUIRE and WAIT make sure that a thread only waits for a waitable 
object whose wait level is lower than the wait levels of obligations of that thread. 
Note that in the rule WAIT in the precondition of the command wait(v, l) it is 
not necessary that the wait level of v is lower than the wait level of l, since lock 
l is going to be released by this command. However, in this precondition the 
wait level of l must be lower than the wait levels of the obligations of the thread 
because when this thread is notified it tries to reacquire l, at which point l≺O 
must hold. The commands notify(v)/notifyAll(v), as shown in the rules NOTIFY 
and NOTIFYALL, remove one/all instance(s) of v, if any, from the bag Wt stored 
in the related locked permission. Additionally, notify(v) consumes the moving 
resources, indicated by M(v), that appear in the postcondition of the notified 
thread. Note that notifyAll(v) consumes Wt(v) instances of these resources, since 
they are transferred to Wt(v) threads waiting for v. 


3.5 Verifying Channels 


Ghost Counters. We will now use our proof system to prove deadlock-freedom 
of the program in Fig. 1. To do so, however, we will introduce a ghost resource 
that plays the role of credits, in such a way that we can prove the invariant 
Wt(ch) + Ct(ch) ≤ Ot(ch) + sizeof(ch). In particular, we want this property 
to follow from the lock invariant. This means we need to be able to talk, in 
the lock invariant, about the total number of credits in the system. To achieve 
this, we introduce a notion of ghost counters and corresponding ghost counter 
tickets, both of which are a particular kind of ghost resources. Specifically, we 
introduce three ghost commands: g_newctr, g_inc, and g_dec. g_newctr allocates 
a new ghost counter whose value is zero and returns a ghost counter identifier 
c for it. g_inc(c) increments the value of the ghost counter with identifier c and 
produces a ticket for the counter. g_dec(c), finally, consumes a ticket for ghost 


NEWCOUNTER  {true} g_newctr {λc. ctr(c, 0)}

INCCOUNTER  {ctr(c, n)} g_inc(c) {λ_. ctr(c, n+1) * tic(c)}

DECCOUNTER  {ctr(c, n) * tic(c)} g_dec(c) {λ_. ctr(c, n−1) ∧ 0<n}


Fig. 6. Ghost counters 


counter c and decrements the ghost counter’s value. Since these are the only 
operations that manipulate ghost counters or ghost counter tickets, it follows 
that the value of a ghost counter c is always equal to the number of tickets for 
c in the system. Proof rules for these ghost commands are shown in Fig. 6³. 
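
The bookkeeping performed by the three ghost commands can be mimicked by an ordinary object. The class below is a hypothetical runtime model written for illustration; ghost state has no runtime representation in a verifier, and the attribute and method names are assumptions, as is the choice of raising ValueError when no ticket is available.

```python
class GhostCounter:
    """Model of ctr(c, n) together with the number of tic(c) tickets in the system."""

    def __init__(self):       # g_newctr: a new counter with value zero
        self.value = 0
        self.tickets = 0

    def inc(self):            # g_inc(c): increment the value and produce a ticket
        self.value += 1
        self.tickets += 1

    def dec(self):            # g_dec(c): consume a ticket and decrement the value
        if self.tickets == 0:
            raise ValueError("g_dec(c) requires a ticket tic(c)")
        self.value -= 1
        self.tickets -= 1
```

Since inc and dec are the only operations, value and tickets always agree, which is the property stated in the text.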


The Channels Proof. Figure 7 illustrates how the program in Fig. 1 can be 
verified using our proof system. The invariant of lock ch.l in this program, denoted 
by inv(ch), is parametrized over bags Wt, Ot and implies the desired invariant 
one_ob(ch.v, Wt, Ot). The permission ctr(ch.c, Ctv) in this invariant indicates 
that the total number of credits (tickets) for ch.v is Ctv, where ch.c is a ghost field 
added to the channel data structure, aiming to store a ghost counter identifier 
for the ghost counter of ch.v. Generally, a lock invariant can imply the invariant 
one_ob(v, Wt, Ot) if it asserts Wt(v) + Ct(v) ≤ Ot(v) + S(v) and Wt(v) ≤ Ot(v), 
where Ct(v) is the total number of credits for v and S(v) is an integer value such 
that the command wait(v, l) is executed only if S(v) ≤ 0. After initializing l in 
the main routine, there exists a credit for ch.v (denoted by tic(ch.c)) that is 
consumed by the thread executing the receive routine, and also an obligation for 
ch.v that is fulfilled by the main thread after executing the send routine. The credit 
tic(ch.c) in the precondition of the routine receive ensures that before execution 
of the command wait(ch.v, ch.l), Ot(ch.v) > 0. This inequality follows from the 
invariant of lock l, which holds for Wt⊎{ch.v} and Ot when Ctv is decreased 
by g_dec(ch.c). This credit (or the one specified by M(ch.v) that is moved from 
a notifier thread when the receiver thread wakes up) must be consumed after 
execution of the command dequeue(ch.q) and before releasing ch.l to make sure 
that the invariant still holds after decreasing the number of items in ch.q. The 
obligation for ch.v in the precondition of the routine send is discharged by this 
routine, which is safe, since after the execution of the commands enqueue and 
notify the invariant one_ob(ch.v, Wt, Ot−{ch.v}), which follows from the lock 
invariant, holds. 
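
For comparison with the proof outline, the executable part of the channel program can be sketched in Python. This is an illustration under assumptions: threading.Condition built on an explicit threading.Lock stands in for the pair of ch.v and ch.l, a deque stands in for the queue, and the ghost commands have no counterpart because they do not exist at run time.

```python
import threading
from collections import deque


class Channel:
    def __init__(self):
        self.q = deque()
        self.l = threading.Lock()
        self.v = threading.Condition(self.l)  # condition variable associated with l


def receive(ch):
    with ch.l:                     # acquire(ch.l)
        while len(ch.q) == 0:      # re-check the waiting condition after each wake-up
            ch.v.wait()            # releases ch.l while waiting, reacquires it afterwards
        return ch.q.popleft()      # dequeue(ch.q)


def send(ch, d):
    with ch.l:                     # acquire(ch.l)
        ch.q.append(d)             # enqueue(ch.q, d)
        ch.v.notify()              # lost if no receiver is waiting; the queue keeps d


def main():
    ch = Channel()
    received = []
    t = threading.Thread(target=lambda: received.append(receive(ch)))
    t.start()                      # fork (receive(ch))
    send(ch, 12)
    t.join()
    return received[0]
```

The receiver terminates whether it runs before or after the sender: in the first case it is woken by notify, and in the second it finds the queue non-empty and never waits.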


3 Some logics for program verification, such as Iris [19], include general support for 
defining ghost resources such as our ghost counters. In particular, our ghost counters 
can be obtained in Iris as an instance of the authoritative monoid [19, p. 5]. 


inv(channel ch) ::= λWt. λOt. ∃Ctv. ctr(ch.c, Ctv) * ∃s. queue(ch.q, s) ∧
    L(ch.v)=ch.l ∧ M(ch.v)=tic(ch.c) ∧
    Wt(ch.v) + Ctv ≤ Ot(ch.v) + s ∧
    Wt(ch.v) ≤ Ot(ch.v)


routine main(){ {obs({})}
  q:=newqueue; l:=newlock; v:=newcond; c:=g_newctr; g_inc(c);
  {obs({}) * ulock(l, {}, {}) * queue(q, 0) * ctr(c, 1) * tic(c)
   ∧ L(v)=l ∧ M(v)=tic(c) ∧ R(l)=0 ∧ R(v)=1}
  ch:=channel(q, l, v); ch.c:=c;
  {obs({}) * ulock(l, {}, {}) * inv(ch)({}, {v}) * tic(c)} g_chrg(v);
  {obs({v}) * ulock(l, {}, {v}) * inv(ch)({}, {v}) * tic(c)} g_initl(l);
  {obs({v}) * lock(l) * tic(c) ∧ I(l)=inv(ch)}
  fork (receive(ch));
  {obs({v}) * lock(l)}
  send(ch, 12) {obs({})} }


routine receive(channel ch){
  {obs(O) * tic(ch.c) * lock(ch.l) ∧ ch.l≺O ∧ ch.v≺O ∧ I(ch.l)=inv(ch)}
  acquire(ch.l);
  {obs(O⊎{ch.l}) * tic(ch.c) * ∃Wt, Ot. locked(ch.l, Wt, Ot) * inv(ch)(Wt, Ot)}
  while(sizeof(ch.q) = 0){ g_dec(ch.c);
    {obs(O⊎{ch.l}) * ∃Wt, Ot. locked(ch.l, Wt, Ot) * inv(ch)(Wt⊎{ch.v}, Ot)}
    wait(ch.v, ch.l)
    {obs(O⊎{ch.l}) * M(ch.v) * ∃Wt, Ot. locked(ch.l, Wt, Ot) * inv(ch)(Wt, Ot)} };
  dequeue(ch.q); g_dec(ch.c);
  {obs(O⊎{ch.l}) * ∃Wt, Ot. locked(ch.l, Wt, Ot) * inv(ch)(Wt, Ot)}
  release(ch.l) {obs(O) * lock(ch.l)} }


routine send(channel ch, int d){
  {obs(O⊎{ch.v}) * lock(ch.l) ∧ ch.l≺O⊎{ch.v} ∧ I(ch.l)=inv(ch)}
  acquire(ch.l);
  {obs(O⊎{ch.v, ch.l}) * ∃Wt, Ot. locked(ch.l, Wt, Ot) * inv(ch)(Wt, Ot)}
  enqueue(ch.q, d);
  if (Wt(ch.v)>0) g_inc(ch.c);
  notify(ch.v);
  {obs(O⊎{ch.v, ch.l}) * ∃Wt, Ot. locked(ch.l, Wt, Ot) * inv(ch)(Wt, Ot−{ch.v})}
  g_disch(ch.v);
  {obs(O⊎{ch.l}) * ∃Wt, Ot. locked(ch.l, Wt, Ot) * inv(ch)(Wt, Ot)}
  release(ch.l) {obs(O) * lock(ch.l)} }


Fig. 7. Verification of the program in Fig. 1 


3.6 Other Examples 


Using the proof system of this section we prove two other deadlock-free programs, 
namely sleeping barber [16], and barrier. In the barrier program shown in Fig. 8, a 
barrier b consists of an integer variable r indicating the number of the remaining 


routine main(){
  r:=newint(3);
  l:=newlock;
  v:=newcond;
  b:=barrier(r, l, v);
  fork (task1(); wait_for_rest(b); task2());
  fork (task1(); wait_for_rest(b); task2());
  task1(); wait_for_rest(b); task2()}

routine wait_for_rest(barrier b){
  acquire(b.l);
  b.r:=b.r−1;
  if(b.r=0)
    notifyAll(b.v);
  else
    while(b.r>0)
      wait(b.v, b.l);
  release(b.l) }


inv(barrier b) ::= λWt. λOt. ∃r≥0. b.r↦r ∧ L(b.v)=b.l ∧ M(b.v)=true ∧
    (Wt(b.v) = 0 ∨ 0<r) ∧ (r ≤ Ot(b.v))


routine main(){ {obs({})}
  r:=newint(3); l:=newlock; v:=newcond;
  {obs({}) * r↦3 * ulock(l, {}, {}) ∧ L(v)=l ∧ M(v)=true ∧ R(l)=0 ∧ R(v)=1}
  b:=barrier(r, l, v);
  {obs({}) * inv(b)({}, {3·v}) * ulock(l, {}, {})}
  g_chrg(v); g_chrg(v); g_chrg(v); g_initl(l);
  {obs({3·v}) * lock(l) ∧ I(l)=inv(b)}
  fork (wait_for_rest(b));
  {obs({2·v}) * lock(l)}
  fork (wait_for_rest(b));
  {obs({v}) * lock(l)}
  wait_for_rest(b) {obs({})} }


routine wait_for_rest(barrier b){
  {obs(O⊎{b.v}) * lock(b.l) ∧ b.l≺O⊎{b.v} ∧ b.v≺O ∧ I(b.l)=inv(b)}
  acquire(b.l);
  {obs(O⊎{b.v, b.l}) * ∃Wt, Ot. locked(b.l, Wt, Ot) * inv(b)(Wt, Ot)}
  b.r:=b.r−1;
  if(b.r=0){
    notifyAll(b.v);
    {obs(O⊎{b.v, b.l}) * ∃Wt, Ot. locked(b.l, Wt[b.v:=0], Ot)
     * inv(b)(Wt[b.v:=0], Ot−{b.v})} g_disch(b.v)
    {obs(O⊎{b.l}) * ∃Wt, Ot. locked(b.l, Wt, Ot) * inv(b)(Wt, Ot)} }
  else{
    {obs(O⊎{b.v, b.l}) * ∃Wt, Ot. locked(b.l, Wt, Ot)
     * inv(b)(Wt, Ot−{b.v})} g_disch(b.v);
    {obs(O⊎{b.l}) * ∃Wt, Ot. locked(b.l, Wt, Ot) * inv(b)(Wt, Ot)}
    while(b.r>0)
      {obs(O⊎{b.l}) * ∃Wt, Ot. locked(b.l, Wt, Ot) * inv(b)(Wt⊎{b.v}, Ot)}
      wait(b.v, b.l)
      {obs(O⊎{b.l}) * ∃Wt, Ot. locked(b.l, Wt, Ot) * inv(b)(Wt, Ot)} };
  release(b.l) {obs(O) * lock(b.l)} }


Fig. 8. Verification of a barrier synchronized using a monitor 
threads that must call the routine wait_for_rest, a lock l protecting r against data 
races, and a condition variable v. Each thread executing the routine wait_for_rest 
first decreases the variable r, and if the resulting value is still positive waits for 
v, otherwise it notifies all threads waiting for v. In this program the barrier is 
initialized to 3, implying that no thread may start task2 unless all three 
threads in this program have finished task1. This program is deadlock-free because the 
routine wait_for_rest is executed by three different threads. Figure 8 illustrates 
how this program can be verified by the presented proof rules. Note that before 
executing g_disch in the else branch, safe_obs holds because at this point we have 
0 < b.r, which implies 1 < b.r before the execution of b.r := b.r − 1, and by the 
invariant we have 1 < Ot(b.v), implying 0 < (Ot−{b.v})(b.v). The interesting 
point about the verification of this program is that since all the threads waiting 
for condition variable v in this program are notified by the command notifyAll, 
the invariant of the related lock, implying one_ob(b.v, Wt, Ot), is significantly 
different from the ones defined in the channel and sleeping barber examples. 
Generally, for a condition variable v on which only notifyAll is executed (and 
not notify) a lock invariant can imply the invariant one_ob(v, Wt, Ot) if it asserts 
Wt(v) = 0 ∨ S(v) ≤ Ct(v) and Ct(v) < Ot(v) + S(v), where Ct(v) is the total 
number of credits for v and S(v) is an integer value such that the command 
wait(v, l) is executed only if S(v) ≤ 0. For this particular example S(b.v) = 1 − b.r 
and Ct(b.v) = 0, since this program can be verified without incorporating the 
notion of credits. 
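
The barrier itself can also be exercised at run time. The Python sketch below is an illustration under assumptions: it starts n worker threads rather than having the main thread take part, the log and its lock are hypothetical additions used only to observe the ordering, and task1 and task2 are reduced to log entries.

```python
import threading


class Barrier:
    def __init__(self, n):
        self.r = n                            # remaining threads that must arrive
        self.l = threading.Lock()
        self.v = threading.Condition(self.l)


def wait_for_rest(b):
    with b.l:                                 # acquire(b.l)
        b.r -= 1
        if b.r == 0:
            b.v.notify_all()                  # the last thread wakes all the others
        else:
            while b.r > 0:
                b.v.wait()


def run(n=3):
    b = Barrier(n)
    log, log_lock = [], threading.Lock()

    def worker(i):
        with log_lock:
            log.append(("task1", i))
        wait_for_rest(b)
        with log_lock:
            log.append(("task2", i))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In every run all task1 entries precede all task2 entries, because a thread leaves wait_for_rest only after the counter has reached zero.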


4 Relaxing the Precedence Relation 


The precedence relation introduced in [4], denoted in this paper by ≺, makes
sure that all threads wait for the waitable objects in strict ascending order (with
respect to the wait level associated with each waitable object), or, in this
paper, in descending order, ensuring that in any state of the execution there is no
cycle in the corresponding wait-for graph. However, this relation is too restrictive
and prevents verifying some programs that are actually deadlock-free, such as
the one shown on the left side of Fig. 9. In this program a value is increased by
two threads communicating through a channel. Each thread receives a value from
the channel, increases that value, and then sends it back on the channel. Since an
initial value is sent on the channel, this program is deadlock-free. The first
attempt to verify this program is illustrated in the middle part of Fig. 9, where
the credit required to verify the receive command in the routine inc is
provided by the send command executed immediately after it,
and not by the precondition of this routine. In other words, the idea is to load
a credit and an obligation for ch in the routine inc itself, and then spend the
loaded credit to verify the receive(ch) command and fulfil the loaded obligation
by the send(ch) command. However, this idea fails because the receive command
in the routine inc cannot be verified, since one of its preconditions, ch ≺ {ch}, never
holds. Kobayashi [6,20] has addressed this problem in his type system by using
the notion of usages and assigning levels to each obligation/capability, instead of
Left: the program

    routine main(){
      ch:=channel;
      send(ch, 12);
      fork (inc(ch));
      fork (inc(ch))}

    routine inc(channel ch) {
      d:=receive(ch);
      send(ch, d+1)}

Middle: first verification attempt (original precedence relation)

    routine main(){
      {obs({})}
      ch:=newchannel;
      send(ch, 12);
      fork (inc(ch));
      fork (inc(ch)) {obs({})}}

    routine inc(channel ch) {
      {obs({})}
      {obs({ch}) * credit(ch) ∧ ch ≺ {ch}}
      d:=receive(ch);
      {obs({ch})}
      send(ch, d+1) {obs({})}}

Right: verification with the relaxed precedence relation

    routine main(){
      {obs({})}
      ch:=newchannel;
      {obs({ch}) ∧ P(ch)=true}
      send(ch, 12);
      {obs({})}
      fork (inc(ch));
      fork (inc(ch)) {obs({})}}

    routine inc(channel ch) {
      {obs({}) ∧ ch ≼ {ch}}
      {obs({ch}) * credit(ch) ∧ ch ≼ {ch}}
      d:=receive(ch);
      {obs({ch})}
      send(ch, d+1) {obs({})}}


Fig. 9. A deadlock-free program verified by exploiting the relaxed precedence relation


waitable objects. However, in the next section we provide a novel idea to address 
this problem by just relaxing the precedence relation used in the presented proof 
rules. 
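
As a rough illustration of why the program on the left side of Fig. 9 is deadlock-free at run time, the following Python sketch mirrors it with an unbounded queue standing in for the channel. The use of queue.Queue and the names run, initial, and workers are assumptions made for this sketch; they are not part of the paper's language.

```python
import queue
import threading


def inc(ch: queue.Queue) -> None:
    """Mirror of routine inc: receive a value, increase it, send it back."""
    d = ch.get()       # blocks until some thread has sent a value
    ch.put(d + 1)      # sending back re-enables the next receiver


def run(initial: int = 12, workers: int = 2) -> int:
    """Mirror of routine main: send an initial value, then fork the workers."""
    ch: queue.Queue = queue.Queue()   # unbounded channel (an assumption)
    ch.put(initial)                   # the initial send that prevents deadlock
    threads = [threading.Thread(target=inc, args=(ch,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ch.get()                   # value left on the channel at the end


if __name__ == "__main__":
    print(run())
```

Every receive is eventually matched because each worker sends exactly once after receiving, which is the informal counterpart of loading a credit and an obligation for ch inside inc.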


4.1 A Relaxed Precedence Relation 


To tackle the problem mentioned in the previous section we relax the precedence
relation, enforced by ≺, by replacing ≺ by ≼ satisfying the following property:
o ≼ O holds if either o ≺ O or (1) o ≺ O − {o}, and (2) o satisfies the property that
in any execution state, if a thread waits for o then there exists a thread that can
discharge an obligation for o and is not waiting for any object whose wait level
is equal to or greater than the wait level of o. This property still guarantees that in
any state of the execution, if the program has some threads suspended, waiting for
some obligations, there is always a thread obliged to fulfil the obligation o_min
that is not blocked, where o_min has a minimal wait level among all waitable
objects for which a thread is waiting.

Condition 2 is met if it is an invariant that, for a condition variable
o for which a thread is waiting, the total number of obligations is greater than the
total number of waiting threads. Since each thread waiting for o has at most one
instance of o in the bag of its obligations, according to the pigeonhole principle,
if the number of obligations for o is higher than the number of threads waiting for
o then there exists a thread that holds an obligation for o and is not waiting for
o, implying condition 2 because this thread only waits for objects whose
wait levels are lower than the wait level of o. Accordingly, we first introduce a
new function P in the proof rules mapping each waitable object to a boolean
value, and then make sure that for any object o for which a thread is waiting, if
P(o) = true then Wt(o) < Ot(o). With the help of this function we define the
relaxed precedence relation as shown in Definition 1.


Definition 1 (Relaxed precedence relation). The relaxed precedence relation
indexed over functions R and P holds for a waitable object v and a bag of
obligations O, denoted by v ≼ O, if and only if:

\[
v \prec O \;\lor\; (v \prec O - \{v\} \land P(v) = \mathit{true}), \quad \text{where } v \prec O \iff \forall o \in O.\ R(v) < R(o)
\]
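
To make Definition 1 concrete, here is a minimal Python sketch that evaluates the relation on finite bags. Representing bags as collections.Counter and the functions R and P as dictionaries is an assumption of this sketch, as are the names prec and relaxed_prec.

```python
from collections import Counter


def prec(v, obligations: Counter, R) -> bool:
    """v ≺ O: every obligation in O has a strictly higher wait level than v."""
    return all(R[v] < R[o] for o in obligations.elements())


def relaxed_prec(v, obligations: Counter, R, P) -> bool:
    """v ≼ O as in Definition 1: v ≺ O, or v ≺ O − {v} and P(v) = true."""
    if prec(v, obligations, R):
        return True
    rest = obligations - Counter([v])   # bag difference removes one occurrence of v
    return bool(P[v]) and prec(v, rest, R)
```

For the routine inc of Fig. 9, where the thread holds one obligation for ch while receiving on ch, prec fails but relaxed_prec succeeds once P(ch) is true. Holding two obligations for ch is still rejected, because one occurrence of ch remains after the bag difference.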


Using this relaxed precedence relation the approach presented by Leino et al. [4]
can also support more complex programs, such as the one on the left side of Fig. 9.
This approach can exploit this relation by (1) replacing the original precedence
relation ≺ by the relaxed one ≼, and (2) replacing the rule associated with creating
a channel by the one shown below. According to this proof rule, for each channel
ch the function P in the definition of the relaxed precedence relation is initialized
when ch is created, such that if P(ch) is decided to be true then one obligation for
ch is loaded onto the bag of obligations of the creating thread. The approach is
still sound because for any channel ch for which P is true the invariant Wt(ch) +
Ct(ch) < Ot(ch) + sizeof(ch) holds. Combined with the fact that in this language,
where channels are primitive constructs, Wt(ch) > 0 ⇒ sizeof(ch) = 0, we have
Wt(ch) > 0 ⇒ Wt(ch) < Ot(ch). Now consider a deadlocked state, where each
thread is waiting for a waitable object. Among all of these waitable objects take
the one having a minimal wait level, namely o_m. If o_m is a lock or a channel for which
P(o_m) = false, then at least one thread has an obligation for o_m and is waiting for
an object o whose wait level is lower than the wait level of o_m, which contradicts
minimality of the wait level of o_m. Otherwise, since Wt(o_m) > 0 we have Wt(o_m) <
Ot(o_m). Additionally, we know that each thread waiting for o_m has at most one
obligation for o_m. Accordingly, there must be a thread holding an obligation for o_m
that is not waiting for o_m. Consequently, this thread must be waiting for an object
o whose wait level is lower than the wait level of o_m, which contradicts minimality
of the wait level of o_m.


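\[
\{\mathit{obs}(O)\}\ \mathbf{newchannel}\ \{\lambda ch.\ \mathit{obs}(O') \land R(ch) = z \land P(ch) = b \land ((b = \mathit{false} \land O' = O) \lor (b = \mathit{true} \land O' = O \uplus \{ch\}))\}
\]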


To exploit the relaxed definition in the approach presented in this paper we 
only need to make sure that for any condition variable v for which a thread is 
waiting if P(v) is true then Ot(v) is greater than Wt(v). To achieve this goal 
we include this invariant in the definition of the invariant safe_obs, shown in 
Definition 2, an invariant that must hold when a command wait or a ghost 
command g_disch is executed. 


Definition 2 (Safe Obligations). The relation safe_obs(v, Wt, Ot), indexed
over function P, holds if and only if:

\[
\begin{aligned}
&\mathit{one\_ob}(v, Wt, Ot) \land (P(v) = \mathit{true} \Rightarrow \mathit{spare\_ob}(v, Wt, Ot)), \quad \text{where}\\
&\mathit{one\_ob}(v, Wt, Ot) = (Wt(v) > 0 \Rightarrow Ot(v) > 0)\\
&\mathit{spare\_ob}(v, Wt, Ot) = (Wt(v) > 0 \Rightarrow Wt(v) < Ot(v))
\end{aligned}
\]
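
The invariant of Definition 2 can be read as a simple predicate over the bags Wt and Ot. The sketch below is one way to write it in Python; treating Wt, Ot, and P as dictionaries with a default of 0 or False for missing keys is an assumption of this sketch.

```python
def one_ob(v, Wt, Ot) -> bool:
    """Wt(v) > 0 implies Ot(v) > 0."""
    return not (Wt.get(v, 0) > 0) or Ot.get(v, 0) > 0


def spare_ob(v, Wt, Ot) -> bool:
    """Wt(v) > 0 implies Wt(v) < Ot(v)."""
    return not (Wt.get(v, 0) > 0) or Wt.get(v, 0) < Ot.get(v, 0)


def safe_obs(v, Wt, Ot, P) -> bool:
    """one_ob holds, and spare_ob holds as well whenever P(v) is true."""
    return one_ob(v, Wt, Ot) and (not P.get(v, False) or spare_ob(v, Wt, Ot))
```

With one waiting thread and one obligation, safe_obs holds only if P(v) is false. When P(v) is true, a second obligation is required, which is the spare obligation that a non-waiting thread must hold.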


routine main(){
  aw:=newint(0);
  ww:=newint(0);
  ar:=newint(0);
  l:=newlock;
  vw:=newcond;
  vr:=newcond;
  b := rdwr(aw, ww, ar, l, vw, vr);
  fork(
    while (true)
      fork(reader(b))
  );
  while (true)
    fork(writer(b))
}

routine reader(rdwr b){
  acquire(b.l);
  while(b.aw+b.ww>0)
    wait(b.vr, b.l);
  b.ar:=b.ar+1;
  release(b.l);
  // Perform reading ...
  acquire(b.l);
  if(b.ar<1)
    abort;
  b.ar:=b.ar−1;
  notify(b.vw);
  release(b.l) }

routine writer(rdwr b){
  acquire(b.l);
  while(b.aw+b.ar>0){
    b.ww:=b.ww+1;
    wait(b.vw, b.l);
    if(b.ww<1)
      abort();
    b.ww:=b.ww−1
  };
  b.aw:=b.aw+1;
  release(b.l);
  // Perform writing ...
  acquire(b.l);
  if(b.aw≠1)
    abort;
  b.aw:=b.aw−1;
  notify(b.vw);
  if (b.ww=0)
    notifyAll(b.vr);
  release(b.l) }


Fig. 10. A readers-writers program with variables aw, holding the number of threads
writing, ww, holding the number of threads waiting to write, and ar, holding the number
of threads reading, that is synchronized using a monitor consisting of condition variables
vw, preventing writers from writing while other threads are reading or writing, and vr,
preventing readers from reading while there is another thread writing or waiting to
write.


Readers-Writers Locks. As another application of this relaxed definition, consider
a readers-writers program, shown in Fig. 10⁴, where the condition variable
vw prevents writers from writing on a shared memory when that memory is being
accessed by other threads. After reading the shared memory, a reader thread notifies
this condition variable if there is no other thread reading that memory. This
condition variable is also notified by a writer thread when it finishes its writing.
Consequently, a writer thread first might wait for vw and then fulfil an obligation
for this condition variable. This program is verified if the writer thread itself
produces a credit and an obligation for vw, and then uses the former for the command
wait(vw, l) and fulfils the latter at the end of its execution. Accordingly, since
when the command wait(vw, l) is executed vw is in the bag of obligations of the


4 The abort commands in this program can be eliminated using the ghost counters from 
Fig. 6. However, we leave them in for simplicity. 
inv(rdwr b) := λWt. λOt. ∃Ctw. ctr(b.cw, Ctw) *
  ∃aw≥0, ww≥0, ar≥0. b.aw↦aw * b.ww↦ww * b.ar↦ar ∧
  L(b.vw)=L(b.vr)=b.l ∧ M(b.vw)=tic(b.cw) ∧ M(b.vr)=true ∧ P(vw)=true ∧ P(vr)=false ∧
  (Wt(b.vr) = 0 ∨ 0 < aw + ww) ∧
  aw + ww ≤ Ot(b.vr) ∧
  Wt(b.vw) + Ctw + aw + ar ≤ Ot(b.vw) ∧
  (Wt(b.vw) = 0 ∨ Wt(b.vw) < Ot(b.vw))

routine main(){
  aw:=newint(0); ww:=newint(0);
  ar:=newint(0); l:=newlock;
  vw:=newcond; vr:=newcond;
  b := rdwr(aw, ww, ar, l, vw, vr);
  b.cw:=g_newctr;
  {obs({}) * inv(b)({}, {}) * ulock(l, {}, {}) *
   L(vw)=L(vr)=l ∧ M(vw)=tic(b.cw) ∧
   M(vr)=true ∧ R(l)=0 ∧ R(vw)=1 ∧
   R(vr)=2 ∧ L(vw)=l ∧ L(vr)=l
   ∧ P(vw)=true ∧ P(vr)=false} g_initl(l);
  {obs({}) * lock(l) ∧ I(l)=inv(b)}
  fork( {obs({}) * lock(l)}
    while (true) fork(reader(b)));
  {obs({}) * lock(l)}
  while (true) fork(writer(b))
  {obs({}) * lock(l)}}

routine reader(rdwr b){
  {obs(O) * lock(b.l) ∧ b.l ≼ O⊎{b.vw}
   ∧ b.vr ≼ O ∧ I(b.l)=inv(b)}
  acquire(b.l);
  while(b.aw+b.ww>0)
    wait(b.vr, b.l);
  b.ar:=b.ar+1;
  g_chrg(b.vw);
  release(b.l);
  // Perform reading ...
  acquire(b.l);
  if(b.ar<1)
    abort;
  b.ar:=b.ar−1;
  if (Wt(b.vw) > 0) g_inc(b.cw);
  notify(b.vw);
  g_disch(b.vw);
  release(b.l) {obs({}) * lock(b.l)}}

routine writer(rdwr b){
  {obs(O) * lock(b.l) ∧ b.l ≼ O⊎{b.vw, b.vr}
   ∧ b.vw ≼ O⊎{b.vw, b.vr} ∧ I(b.l)=inv(b)}
  acquire(b.l);
  g_chrg(b.vw); g_inc(b.cw);
  g_chrg(b.vr);
  while(b.aw+b.ar>0){
    g_dec(b.cw);
    b.ww:=b.ww+1;
    wait(b.vw, b.l);
    if(b.ww<1)
      abort();
    b.ww:=b.ww−1
  };
  b.aw:=b.aw+1;
  g_dec(b.cw);
  release(b.l);
  // Perform writing ...
  acquire(b.l);
  if(b.aw≠1)
    abort;
  b.aw:=b.aw−1;
  if (Wt(b.vw) > 0) g_inc(b.cw);
  notify(b.vw);
  if (b.ww=0)
    notifyAll(b.vr);
  g_disch(b.vw); g_disch(b.vr);
  release(b.l) {obs({}) * lock(b.l)}}


Fig. 11. Verification of the program in Fig. 10 
writer thread, this command can be verified if vw ≼ {vw}, where P(vw) must be true.
The verification of this program is illustrated in Fig. 11. Generally, for a condition
variable v for which P(v) = true a lock invariant can imply the invariant
one_ob(v, Wt, Ot) if it asserts Wt(v) + Ct(v) < Ot(v) + S(v) and Wt(v) =
0 ∨ Wt(v) < Ot(v), where Ct(v) is the total number of credits for v and S(v) is
an integer value such that wait(v, l) is executed only if S(v) < 0.


4.2 A Further Relaxation 


The relation ≼ allows one to verify some deadlock-free programs where a thread
waits for a condition variable while that thread is also obliged to fulfil an obligation
for that variable. However, it is still possible to have a more general, more
relaxed definition for this relation. Under this definition a thread with obligations
O is allowed to wait for a condition variable v if either v ≺ O, or there exists
an obligation o such that (1) v ≺ O − {o}, and (2) o satisfies the property that in
any execution state, if a thread is waiting for o then there exists a thread that
is not waiting for any waitable object whose wait level is equal to or greater than
the wait levels of v and o. This new definition still guarantees that in any state
of the execution, if the program has some threads suspended, waiting for some
obligations, there is always a thread obliged to fulfil the obligation o_min that is
not suspended, where o_min has a minimal wait level among all waitable objects
for which a thread is waiting. To satisfy condition 2 we introduce a
new definition for ≼, shown in Definition 3, that uses a new function X mapping
each lock to a set of wait levels. This definition is sound only if the proof
rules ensure that for any condition variable v whose wait level is in X(L(v)) the
number of obligations is equal to or greater than the number of waiting
threads.

This definition is still sound because of Lemma 1, which has been machine-
checked in Coq⁵, where G is a bag of pairs of a waitable object and a bag of obligations
such that each element t of G is associated with a thread in a state of the
execution, where the first element of t is the object for which t is waiting and
the second element is the bag of obligations of t. This lemma implies that if
all the mentioned rules, denoted by H1 to H4, are respected in any state of
the execution then it is impossible that all threads in that state are waiting
for a waitable object. This lemma can be proved by induction on the number
of elements of G, considering the element waiting for an object whose wait
level is minimal (see [16] for the detailed proof).


Definition 3 (Relaxed precedence relation). The new precedence relation
indexed over functions R, L, P, X holds for a waitable object v and a bag of
obligations O, denoted by v ≼ O, if and only if:

\[
\begin{aligned}
&(v \prec O \lor v \prec' O) \land (\lnot\mathit{exc}(v) \lor v \perp O), \quad \text{where}\\
&v \prec O \iff \forall o \in O.\ R(v) < R(o)\\
&v \prec' O = P(v) = \mathit{true} \land \mathit{exc}(v) \land \exists o.\ v \prec O - \{o\} \land R(v) \le R(o) + 1 \land L(v) = L(o) \land \mathit{exc}(o)\\
&\mathit{exc}(v) = R(v) \in X(L(v))\\
&v \perp O = \mathbf{let}\ O_x = \lambda v'.\ \begin{cases} O(v') & \text{if } R(v') \in X(L(v'))\\ 0 & \text{otherwise} \end{cases}\ \mathbf{in}\ |O_x| \le 1 \land \forall v'.\ O_x(v') > 0 \Rightarrow L(v') = L(v)
\end{aligned}
\]


5 The machine-checked proof can be found at https://github.com/jafarhamin/deadlock-free-monitors-soundness.


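Lemma 1 (A Valid Graph Is Not Deadlocked)

\[
\begin{aligned}
&\forall\, G : \mathit{Bags}(\mathit{WaitObjs} \times \mathit{Bags}(\mathit{WaitObjs})),\ R : \mathit{WaitObjs} \to \mathit{WaitLevels},\\
&\quad L : \mathit{WaitObjs} \to \mathit{Locks},\ P : \mathit{WaitObjs} \to \mathit{Bools},\ X : \mathit{Locks} \to \mathit{Sets}(\mathit{WaitLevels}).\\
&\quad H_1 \land H_2 \land H_3 \land H_4 \Rightarrow G = \{\}, \quad \text{where}\\
&H_1 : \forall (o, O) \in G.\ 0 < Ot(o)\\
&H_2 : \forall (o, O) \in G.\ P(o) = \mathit{true} \Rightarrow Wt(o) < Ot(o)\\
&H_3 : \forall (o, O) \in G.\ R(o) \in X(L(o)) \Rightarrow Wt(o) \le Ot(o)\\
&H_4 : \forall (o, O) \in G.\ o \preccurlyeq_{R,L,P,X} O\\
&\text{where } Wt = \biguplus_{(o,O) \in G} \{o\} \text{ and } Ot = \biguplus_{(o,O) \in G} O
\end{aligned}
\]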
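
The shape of Lemma 1 can be explored with a small bounded search. The Python sketch below is only an illustration and is no substitute for the machine-checked proof: it uses the simpler relation of Definition 1 (so H3 and the functions L and X are left out), and the names is_valid_graph and search_counterexample as well as the search bounds are assumptions of this sketch.

```python
from collections import Counter
from itertools import combinations_with_replacement, product


def prec(v, O, R):
    """v ≺ O: every obligation in O has a strictly higher wait level than v."""
    return all(R[v] < R[o] for o in O.elements())


def relaxed_prec(v, O, R, P):
    """v ≼ O in the sense of Definition 1."""
    return prec(v, O, R) or (bool(P[v]) and prec(v, O - Counter([v]), R))


def is_valid_graph(G, R, P):
    """G holds one (waited_object, obligations) pair per waiting thread."""
    Wt = Counter(o for o, _ in G)
    Ot = Counter()
    for _, O in G:
        Ot.update(O)
    return all(
        Ot[o] > 0                          # H1: someone is obliged to notify o
        and (not P[o] or Wt[o] < Ot[o])    # H2: a spare obligation when P(o)
        and relaxed_prec(o, O, R, P)       # H4, with the Definition 1 relation
        for o, O in G
    )


def search_counterexample(objects=("a", "b"), levels=(0, 1),
                          max_threads=3, max_obs=2):
    """Look for a non-empty graph of waiting threads satisfying H1, H2, H4."""
    bags = [Counter(c) for n in range(max_obs + 1)
            for c in combinations_with_replacement(objects, n)]
    nodes = [(o, O) for o in objects for O in bags]
    for lv in product(levels, repeat=len(objects)):
        R = dict(zip(objects, lv))
        for ps in product((False, True), repeat=len(objects)):
            P = dict(zip(objects, ps))
            for n in range(1, max_threads + 1):
                for G in combinations_with_replacement(nodes, n):
                    if is_valid_graph(G, R, P):
                        return G, R, P
    return None
```

Within these small bounds the search finds no non-empty graph in which every thread is waiting and all hypotheses hold, which matches the minimal-wait-level argument given for the relation of Definition 1.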


NEWLOCK  {true} newlock {λl. ulock(l, {}, {}) ∧ R(l) = z ∧ X(l) = X}


NEWCV    {true} newcond {λv. R(v) = z ∧ L(v) = l ∧ M(v) = m ∧ P(v) = b}


Fig. 12. New proof rules initializing functions X and P used in safe_obs and ≼


To extend the proof rules with the new precedence relation it suffices to include
a new invariant own_ob in the definition of safe_obs, as shown in Definition 4, an
invariant that must hold when a command wait or a ghost command g_disch is
executed, to make sure that for any condition variable for which exc holds, the
number of obligations is equal to or greater than the number of waiting threads.
Additionally, the functions X and P, as indicated in Fig. 12, are initialized when
a lock and a condition variable are created, respectively. The rest of the proof rules
are the same as those defined in Fig. 5 except that the old precedence relation
(≺) is replaced by the new one (≼).


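Definition 4 (Safe Obligations). The relation safe_obs(v, Wt, Ot), indexed
over functions R, L, P, X, holds if and only if:

\[
\begin{aligned}
&\mathit{one\_ob}(v, Wt, Ot) \land (P(v) = \mathit{true} \Rightarrow \mathit{spare\_ob}(v, Wt, Ot)) \land (\mathit{exc}(v) = \mathit{true} \Rightarrow \mathit{own\_ob}(v, Wt, Ot)), \quad \text{where}\\
&\mathit{one\_ob}(v, Wt, Ot) = (Wt(v) > 0 \Rightarrow Ot(v) > 0)\\
&\mathit{spare\_ob}(v, Wt, Ot) = (Wt(v) > 0 \Rightarrow Wt(v) < Ot(v))\\
&\mathit{own\_ob}(v, Wt, Ot) = (Wt(v) \le Ot(v))
\end{aligned}
\]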
Bounded Channels. One application of the new definition is a bounded channel
program, shown in Fig. 13, where a sender thread waits for a receiver thread
if the channel is full, synchronized by vf, and a receiver thread waits for a sender
thread if the channel is empty, synchronized by ve. More precisely, the sender
thread with an obligation for ve might execute the command wait(vf, l), and the
receiver thread with an obligation for vf might execute the command wait(ve, l).


routine main(){
  q := newqueue;
  l := newlock;
  vf := newcvar;
  ve := newcvar;
  ch:=channel(q, l, vf, ve);
  fork (receive(ch));
  send(ch, 12)}

routine send(channel ch, int d){
  acquire(ch.l);
  while(sizeof(ch.q) = max)
    wait(ch.vf, ch.l);
  enqueue(ch.q, d);
  notify(ch.ve);
  release(ch.l)}

routine receive(channel ch){
  acquire(ch.l);
  while(sizeof(ch.q) = 0)
    wait(ch.ve, ch.l);
  dequeue(ch.q);
  notify(ch.vf);
  release(ch.l)}


inv(channel ch) := λWt. λOt. ∃Cte, Ctf. ctr(ch.ce, Cte) * ctr(ch.cf, Ctf) *
  ∃s. queue(ch.q, s) ∧ P(ve)=false ∧ M(ve)=tic(ch.ce) ∧ M(vf)=tic(ch.cf) ∧
  L(ch.ve)=L(ch.vf)=ch.l ∧
  Wt(ch.ve) + Cte ≤ Ot(ch.ve) + s ∧ Wt(ch.ve) ≤ Ot(ch.ve) ∧
  Wt(ch.vf) + Ctf + s < Ot(ch.vf) + max ∧ (Wt(vf) = 0 ∨ Wt(ch.vf) < Ot(ch.vf))


routine main(){
  q := newqueue;
  l := newlock;
  vf := newcvar;
  ve := newcvar;
  ch:=channel(q, l, vf, ve);
  ch.ce:=g_newctr;
  ch.cf:=g_newctr;
  g_inc(ch.ce);
  g_inc(ch.cf);
  g_chrg(ve); g_chrg(vf);
  g_initl(l);
  {obs({ve, vf}) * lock(l) *
   tic(ch.ce) * tic(ch.cf) *
   L(vf)=l ∧ L(ve)=l ∧
   M(ve)=tic(ch.ce) ∧
   M(vf)=tic(ch.cf) ∧
   P(vf)=true ∧
   P(ve)=false ∧
   R(l)=0 ∧
   R(ve)=1 ∧ R(vf)=2 ∧
   X(l)={1, 2} ∧ I(l)=inv}
  fork (receive(ch));
  send(ch, 12) {obs({})}}

routine send(channel ch, int d){
  {obs(O⊎{ch.ve}) * tic(ch.cf) *
   lock(ch.l) ∧ ch.l ≼ O⊎{ch.ve} ∧
   ch.vf ≼ O⊎{ch.ve} ∧ I(ch.l)=inv}
  acquire(ch.l);
  while(sizeof(ch.q) = max){
    g_dec(ch.cf);
    wait(ch.vf, ch.l)};
  enqueue(ch.q, d);
  if (Wt(ch.ve) > 0)
    g_inc(ch.ce);
  notify(ch.ve);
  g_disch(ch.ve);
  g_dec(ch.cf);
  release(ch.l)
  {obs(O) * lock(ch.l)}}

routine receive(channel ch){
  {obs(O⊎{ch.vf}) * tic(ch.ce) *
   lock(ch.l) ∧ ch.l ≼ O⊎{ch.vf} ∧
   ch.ve ≼ O⊎{ch.vf} ∧ I(ch.l)=inv}
  acquire(ch.l);
  while(sizeof(ch.q) = 0){
    g_dec(ch.ce);
    wait(ch.ve, ch.l)};
  dequeue(ch.q);
  if (Wt(ch.vf) > 0)
    g_inc(ch.cf);
  notify(ch.vf);
  g_disch(ch.vf);
  g_dec(ch.ce);
  release(ch.l)
  {obs(O) * lock(ch.l)}}


Fig. 13. Verification of a bounded channel synchronized using a monitor consisting of
condition variables vf, preventing sending on a full channel, and ve, preventing taking
messages from an empty channel
Since ve and vf are not equal, it is impossible to verify this program by the old
definition of ≼ because the wait levels of ve and vf cannot both be lower than
each other. Thanks to the new definition of ≼, this program can be verified, as
shown in Fig. 13, by initializing P(vf) with true and X(l) with {1,2}, where the two
consecutive numbers 1 and 2 are the wait levels of ve and vf, respectively.
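
For readers who want to run the unannotated program of Fig. 13, the following Python sketch mirrors it with the standard threading module. The class name BoundedChannel and the parameter max_size are assumptions of this sketch; the ghost commands and proof annotations of the figure have no run-time counterpart and are omitted.

```python
import threading
from collections import deque


class BoundedChannel:
    """Monitor with one lock l and two condition variables, as in Fig. 13."""

    def __init__(self, max_size: int):
        self.max = max_size
        self.q = deque()
        self.l = threading.Lock()
        self.vf = threading.Condition(self.l)   # senders wait here while full
        self.ve = threading.Condition(self.l)   # receivers wait here while empty

    def send(self, d) -> None:
        with self.l:
            while len(self.q) == self.max:
                self.vf.wait()                  # releases l while waiting
            self.q.append(d)
            self.ve.notify()                    # wake one waiting receiver

    def receive(self):
        with self.l:
            while len(self.q) == 0:
                self.ve.wait()
            d = self.q.popleft()
            self.vf.notify()                    # wake one waiting sender
            return d
```

A sender may block on vf while it is the only thread able to notify ve, and a receiver may block on ve while it is the only thread able to notify vf. This mutual dependency is the situation that the old relation cannot order and that Definition 3 admits.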


5 Soundness Proof 


In this section we provide a soundness proof for the presented approach⁶, i.e.
if a program is verified by the proposed proof rules, where the verification
starts from an empty bag of obligations and also ends with such a bag, this
program is deadlock-free. To this end, we first define the syntax of programs
and a small-step semantics for programs (⇝) relating two configurations (see
[16] for formal definitions). A configuration is a thread table-heap pair (t, h),
where heaps and thread tables are partial functions from locations and
thread identifiers to integers and command-context pairs (c; ξ), respectively,
where a context, denoted by ξ, is either done or let x:=[] in c. Then we
define validity of configurations, shown in Definition 5, and prove that (1) if
a program c is verified by the proposed proof rules, where it starts from the
precondition obs({}) and satisfies the postcondition λ_.obs({}), then the initial
configuration, where the heap is empty, denoted by 0 = λ_.∅, and there is
only one thread with command c and context done, is a valid configuration
(Theorem 4), (2) a valid configuration is not deadlocked (Theorem 5), and
(3) starting from a valid configuration, all the subsequent configurations of the
execution are also valid (Theorem 6).

In a valid configuration (t, h), h contains all the heap ownerships that are in
possession of all threads in t and also those that are in possession of the locks that are
not held, specified by a list A. Additionally, each thread must have all the required
permissions to be successfully verified with no remaining obligation, enforced by
wpcx. In this definition, wpcx(c, ξ) is a function returning the weakest precondition
of the command c with the context ξ w.r.t. the postcondition λ_.obs({}) (see [16]
for formal definitions). This function is defined with the help of a function wp(c, a)
returning the weakest precondition of the command c w.r.t. the postcondition a.


Definition 5 (Validity of Configurations). A configuration is valid, denoted
by valid(t, h), if there exist a list of augmented threads T, consisting of an
identifier (id), a program (c), a context (ξ), a permission heap (p), a ghost
resource heap (C) and a bag of obligations (O) associated with each thread; a list
of assertions A, and some functions R, I, L, M, P, X such that:

- ∀id, c, ξ. t(id) = (c; ξ) ⇔ ∃p, O, C. (id, c, ξ, p, O, C) ∈ T
- h = pheap2heap( ∗_{a ∈ A} a ∗ ∗_{(id,c,ξ,p,O,C) ∈ T} p )
- ∀(id, c, ξ, p, O, C) ∈ T.
  • p, O, C ⊨ wpcx_{R,I,L,M,P,X}(c, ξ)
  • ∀l, Wt, Ot. p(l) = Ulock/Locked(Wt, Ot) ⇒ Wt = Wt_l ∧ Ot = Ot_l
  • ∀l. p(l) = Lock ∧ h(l) = 1 ⇒ I(l)(Wt_l, Ot_l) ∈ A
  • ∀l. p(l) = Lock ∨ p(l) = Locked(Wt_l, Ot_l) ⇒ ¬P(l) ∧ ¬exc(l) ∧ (h(l) = 0 ⇒ l ∈ Ot)
  • ∀o. waiting_for(c, h) = o ⇒ safe_obs_{R,L,P,X}(o, Wt, Ot)

where

  • Ot = ⊎_{(id,c,ξ,p,O,C) ∈ T} O and Wt = ⊎_{(id,c,ξ,p,O,C) ∈ T ∧ waiting_for(c,h)=o} {o}
  • O_l is a bag that, given an object o, returns O(o) if L(o) = l and 0 if L(o) ≠ l
  • waiting_for(c, h) returns the object for which c is waiting, if any
  • pheap2heap(p) returns the heap corresponding with permission heap p


6 The machine-checked version of some lemmas and theorems in this proof, such as
Theorems 4 and 5, can be found at https://github.com/jafarhamin/deadlock-free-monitors-soundness.

We finally prove that for each proof rule {a} c {a′} we have a ⇒ wp(c, a′). To
this end, we first define correctness of commands, shown in Definition 6, and then
for each proof rule {a} c {a′} we prove correct(a, c, a′). In addition to the proof
rules presented in this paper, other useful rules such as the rules consequence,
frame and sequential composition, shown in Theorems 1, 2, and 3, can also be proved with
the help of some auxiliary lemmas in [16]. Note that the indexes R, I, L, M, P, X
are omitted when they are unimportant.


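Definition 6 (Correctness of Commands)

\[
\mathit{correct}_{R,I,L,M,P,X}(a, c, a') \iff (a \Rightarrow \mathit{wp}_{R,I,L,M,P,X}(c, a'))
\]

Theorem 1 (Rule Consequence)

\[
\mathit{correct}(a_1, c, a_2) \land (a_1' \Rightarrow a_1) \land (\forall z.\ a_2(z) \Rightarrow a_2'(z)) \Rightarrow \mathit{correct}(a_1', c, a_2')
\]

Theorem 2 (Rule Frame)

\[
\mathit{correct}(a, c, a') \Rightarrow \mathit{correct}(a * f, c, \lambda z.\ a'(z) * f)
\]

Theorem 3 (Rule Sequential Composition)

\[
\mathit{correct}(a, c_1, a') \land (\forall z.\ \mathit{correct}(a'(z), c_2[z/x], a'')) \Rightarrow \mathit{correct}(a, \mathbf{let}\ x := c_1\ \mathbf{in}\ c_2, a'')
\]

Theorem 4 (The Initial Configuration is Valid)

\[
\mathit{correct}_{R,I,L,M,P,X}(\mathit{obs}(\{\}), c, \lambda\_.\ \mathit{obs}(\{\})) \Rightarrow \mathit{valid}(\mathbf{0}[id := c; \mathbf{done}], \mathbf{0})
\]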


Proof. The goal is achieved because there are an augmented thread list T =
[(id, c, done, 0, {}, 0)], a list of assertions A = [], and functions R, I, L, M, P, X
by which all the conditions in the definition of validity of configurations are
satisfied.
Theorem 5 (A Valid Configuration is Not Deadlocked)

\[
(\exists id, c, \xi, o.\ t(id) = (c; \xi) \land \mathit{waiting\_for}(c, h) = o) \land \mathit{valid}(t, h)
\Rightarrow \exists id', c', \xi'.\ t(id') = (c'; \xi') \land \mathit{waiting\_for}(c', h) = \emptyset
\]

Proof. We assume that all threads in t are waiting for an object. Since (t, h)
is a valid configuration there exists a valid augmented thread table T with
a corresponding valid graph G = g(T), where g maps any element such as
(id, c, ξ, p, O, C) to a new one such as (waiting_for(c), O). By Lemma 1, we have
G = {}, implying T = {}, implying t = 0, which contradicts the assumption of
the theorem.


Theorem 6 (Steps Preserve Validity of Configurations).⁷

\[
\mathit{valid}(\kappa) \land \kappa \rightsquigarrow \kappa' \Rightarrow \mathit{valid}(\kappa')
\]

Proof. By case analysis of the small-step relation ⇝ (see [16] for the
proof of some non-trivial cases).


6 Related Work 


Several approaches to verify termination [1,21], total correctness [3], and lock 
freedom [2] of concurrent programs have been proposed. These approaches are 
only applicable to non-blocking algorithms, where the suspension of one thread 
cannot lead to the suspension of other threads. Consequently, they cannot be 
used to verify deadlock-freedom of programs using condition variables, where 
the suspension of a notifying thread might lead a waiting thread to be infinitely 
blocked. In [22] a compositional approach to verify termination of multi-threaded 
programs is introduced, where rely-guarantee reasoning is used to reason about 
each thread individually while there are some assertions about other threads. 
In this approach a program is considered to be terminating if it does not have 
any infinite computations. As a consequence, it is not applicable to programs 
using condition variables because a waiting thread that is never notified cannot 
be considered as a terminating thread. 

There are also some other approaches addressing some common synchroniza- 
tion bugs of programs in the presence of condition variables. In [8], for example, 
an approach to identify some potential problems of concurrent programs
consisting of wait and notify commands is presented. However, it does not take the
order of execution of these commands into account. In other words, it might
accept an undesired execution trace where the waiting thread is scheduled before 
the notifying thread, that might lead the waiting thread to be infinitely sus- 
pended. [9] uses Petri nets to identify some common problems in multithreaded 
programs such as data races, lost signals, and deadlocks. However the model 
introduced for condition variables in this approach only covers the communi- 
cation of two threads and it is not clear how it deals with programs having 


7 The proof of this theorem has not been machine-checked with Coq yet.
more than two threads communicating through condition variables. Recently,
[10] has introduced an approach ensuring that every thread synchronizing under 
a set of condition variables eventually exits the synchronization block if that 
thread eventually reaches that block. This approach succeeds in verifying one of 
the applications of condition variables, namely the buffer. However, since this 
approach is not modular and relies on a Petri net analysis tool to solve the termi- 
nation problem, it suffers from a long verification time when the size of the state 
space is increased, such that the verification of a buffer application having 20 
producer and 18 consumer threads, for example, takes more than two minutes. 

Kobayashi [6,20] proposed a type system for deadlock-free processes, ensur- 
ing that a well-typed process that is annotated with a finite capability level is 
deadlock free. He extended channel types with the notion of usages, describ- 
ing how often and in which order a channel is used for input and output. For 
example, usage of x in the process x?y | x!1 | x!2, where ?, !, | represent an input
action, an output action, and parallel composition respectively, is expressed by
?|!|!, which means that x is used once for input and twice for output possibly in
parallel. Additionally, to avoid circular dependency each action a is associated
with the levels of obligation o and capabilities c, denoted by a^o_c, such that (1) an
obligation of level n must be fulfilled by using only capabilities of level less than 
n, and (2) for an action of capability level n, there must exist a co-action of obli- 
gation level less than or equal to n. Leino et al. [4] also proposed an approach to 
verify deadlock-freedom of channels and locks. In this approach each thread try- 
ing to receive a message from a channel must spend one credit for that channel, 
where a credit for a channel is obtained if a thread is obliged to fulfil an
obligation for that channel. A thread can fulfil an obligation for a channel if either
it sends a message on that channel or delegates that obligation to another thread.
The same idea is also used to verify deadlock-freedom of semaphores [7], where 
acquiring (i.e. decreasing) a semaphore consumes one credit and releasing (i.e. 
increasing) that semaphore produces one credit for that semaphore. However, as 
it is acknowledged in [4], it is impossible to treat channels (and also semaphores) 
like condition variables; a wait cannot be treated like a receive and a notify can- 
not be treated like a send because a notification for a condition variable will be 
lost if no thread is waiting for that variable. We borrow many ideas, including 
the notion of obligations/credits(capabilities) and levels, from these works and 
also the one introduced in [11], where a corresponding separation logic based 
approach is presented to verify total correctness of programs in the presence of 
channels. 


7 Conclusion 


In this article we introduced a modular approach to verify deadlock-freedom of
monitors. We also introduced a relaxed, more general precedence relation to avoid
cycles in the wait-for graph of programs, allowing a verification approach to verify 
a wider range of deadlock-free programs in the presence of monitors, channels and 
other synchronization mechanisms. 
Abstract. A major challenge in automated verification is to develop
techniques that are able to reason about fine-grained concurrent algo- 
rithms that consist of an unbounded number of concurrent threads, which 
operate on an unbounded domain of data values, and use unbounded 
dynamically allocated memory. Existing automated techniques consider 
the case where shared data is organized into singly-linked lists. We 
present a novel shape analysis for automated verification of fine-grained 
concurrent algorithms that can handle heap structures which are more 
complex than just singly-linked lists, in particular skip lists and arrays of 
singly linked lists, while at the same time handling an unbounded number 
of concurrent threads, an unbounded domain of data values (including 
timestamps), and an unbounded shared heap. Our technique is based on 
a novel shape abstraction, which represents a set of heaps by a set of 
fragments. A fragment is an abstraction of a pair of heap cells that are 
connected by a pointer field. We have implemented our approach and 
applied it to automatically verify correctness, in the sense of linearizabil- 
ity, of most linearizable concurrent implementations of sets, stacks, and 
queues, which employ singly-linked lists, skip lists, or arrays of singly- 
linked lists with timestamps, which are known to us in the literature. 


1 Introduction 


Concurrent algorithms with an unbounded number of threads that concurrently 
access a dynamically allocated shared state are of central importance in a large 
number of software systems. They provide efficient concurrent realizations of 
common interface abstractions, and are widely used in libraries, such as the 
Intel Threading Building Blocks or the java.util.concurrent package. They 
are notoriously difficult to get correct and verify, since they often employ fine- 
grained synchronization and avoid locking when possible. A number of bugs 
in published algorithms have been reported [13,30]. Consequently, significant 
research efforts have been directed towards developing techniques to verify cor- 
rectness of such algorithms. One widely-used correctness criterion is that of 
linearizability, meaning that each method invocation can be considered to occur 
atomically at some point between its call and return. Many of the developed ver- 
ification techniques require significant manual effort for constructing correctness 
proofs (e.g., [25,41]), in some cases with the support of an interactive theorem
prover (e.g., [11,35,40]). Development of automated verification techniques
remains a difficult challenge. 

A major challenge for the development of automated verification techniques is 
that such techniques must be able to reason about fine-grained concurrent algo- 
rithms that are infinite-state in many dimensions: they consist of an unbounded 
number of concurrent threads, which operate on an unbounded domain of data 
values, and use unbounded dynamically allocated memory. Perhaps the hardest 
of these challenges is that of handling dynamically allocated memory. Conse- 
quently, existing techniques that can automatically prove correctness of such 
fine-grained concurrent algorithms restrict attention to the case where heap 
structures represent shared data by singly-linked lists [1,3,18,36,42]. Further- 
more, many of these techniques impose additional restrictions on the considered 
verification problem, such as bounding the number of accessing threads [4, 43, 45]. 
However, in many concurrent data structure implementations the heap repre- 
sents more sophisticated structures, such as skiplists [16,22,38] and arrays of 
singly-linked lists [12]. There are no techniques that have been applied to auto- 
matically verify concurrent algorithms that operate on such data structures. 


Contributions. In this paper, we present a technique for automatic verification 
of concurrent data structure implementations that operate on dynamically allo- 
cated heap structures which are more complex than just singly-linked lists. Our 
framework is the first that can automatically verify concurrent data structure 
implementations that employ singly linked lists, skiplists [16,22,38], as well as 
arrays of singly linked lists [12], at the same time as handling an unbounded 
number of concurrent threads, an unbounded domain of data values (including 
timestamps), and an unbounded shared heap. 

Our technique is based on a novel shape abstraction, called fragment abstrac- 
tion, which in a simple and uniform way is able to represent several different 
classes of unbounded heap structures. Its main idea is to represent a set of 
heap states by a set of fragments. A fragment represents two heap cells that are 
connected by a pointer field. For each of its cells, the fragment represents the 
contents of its non-pointer fields, together with information about how the cell 
can be reached from the program’s global pointer variables. The latter information
consists of both (i) local information, saying which pointer variables point
directly to the cell, and (ii) global information, saying how the cell can reach and
be reached from (by following chains of pointers) heap cells that are globally
significant, typically because some global variable points to them. A set of fragments
represents the set of heap states in which every pair of pointer-connected nodes is
represented by some fragment in the set. Thus, a set of fragments describes the
set of heaps that can be formed by “piecing together” fragments in the set. The
combination of local and global information in fragments supports reasoning 
about the sequence of cells that can be accessed by threads that traverse the 
heap by following pointer fields in cells and pointer variables: the local infor- 
mation captures properties of the cell fields that can be accessed as a thread 
dereferences a pointer variable or a pointer field; the global information also
captures whether certain significant accesses will at all be possible by following
a sequence of pointer fields. This support for reasoning about patterns of
cell accesses enables automated verification of reachability and other functional 
properties. 

Fragment abstraction can (and should) be combined, in a natural way, 
with data abstractions for handling unbounded data domains and with thread 
abstractions for handling an unbounded number of threads. For the latter we 
adapt the successful thread-modular approach [5], which represents the local 
state of a single, but arbitrary thread, together with the part of the global state 
and heap that is accessible to that thread. Our combination of fragment abstrac- 
tion, thread abstraction, and data abstraction results in a finite abstract domain, 
thereby guaranteeing termination of our analysis. 

We have implemented our approach and applied it to automatically verify 
correctness, in the sense of linearizability, of a large number of concurrent data 
structure algorithms, described in a C-like language. More specifically, we have 
automatically verified linearizability of most linearizable concurrent implementations
of sets, stacks, queues, and priority queues known to us in the literature on
concurrent data structures that employ singly-linked lists, skiplists, or arrays of
timestamped singly-linked lists. For this verification, we specify
linearizability using the simple and powerful technique of observers [1,7,9],
which reduces the criterion of linearizability to a simple reachability property. 
To verify implementations of stacks and queues, the application of observers can 
be done completely automatically without any manual steps, whereas for imple- 
mentations of sets, the verification relies on light-weight user annotation of how 
linearization points are placed in each method [3]. 

The fact that our fragment abstraction has been able to automatically verify
all supplied concurrent algorithms, including those that employ skiplists or arrays of
SLLs, indicates that the fragment abstraction is a simple mechanism for capturing
both the local and global information about heap cells that is necessary
for verifying correctness, in particular for concurrent algorithms where an
unbounded number of threads interact via a shared heap. 


Outline. In the next section, we illustrate our fragment abstraction on the verification
of a skiplist-based concurrent set implementation. In Sect. 3 we introduce
our model for programs, and observers for specifying linearizability. In
Sect. 4 we describe in more detail our fragment abstraction for skiplists; note
that singly-linked lists can be handled as a simple special case of skiplists. In
Sect. 5 we describe how fragment abstraction applies to arrays of singly-linked
lists with timestamp fields. Our implementation and experiments are reported
in Sect. 6, followed by conclusions in Sect. 7.


Related Work. A large number of techniques have been developed for represent- 
ing heap structures in automated analysis, including, e.g., separation logic and 
various related graph formalisms [10,15,47], other logics [33], automata [23], or
graph grammars [19]. Most works apply these to sequential programs.

Approaches for automated verification of concurrent algorithms are limited
to the case of singly-linked lists [1,3,18,36,42]. Furthermore, many of these techniques
impose additional restrictions on the considered verification problem, such
as bounding the number of accessing threads [4, 43, 45]. 

In [1], concurrent programs operating on SLLs are analyzed using an adaptation
of a transitive closure logic [6], combined with tracking of simple sortedness
properties between data elements; the approach cannot represent patterns
observed by threads when following sequences of pointers inside the heap,
and so has not been applied to concurrent set implementations. In our recent
work [3], we extended this approach to handle SLL implementations of concurrent
sets by adapting a well-known abstraction of singly-linked lists [28] for
concurrent programs. The resulting technique is specifically tailored to singly-linked
lists. Our fragment abstraction is significantly simpler conceptually, and can
therefore be adapted also for other classes of heap structures. The approach
of [3] is the only one with a shape representation strong enough to verify concurrent
set implementations based on sorted and non-sorted singly-linked lists
having non-optimistic contains (or lookup) operations that we consider, such as the
lock-free sets of HM [22], Harris [17], or Michael [29], or the unordered set of [48].
As shown in Sect. 6, our fragment abstraction can handle them as well as
algorithms employing skiplists and arrays of singly-linked lists.

There is no previous work on automated verification of skiplist-based concurrent
algorithms. Verification of sequential algorithms has been addressed under
restrictions, such as limiting the number of levels to two or three [2,23]. The
work [34] generates verification conditions for statements in sequential skiplist
implementations. All these works assume that skiplists have the well-formedness
property that any higher-level list is a sublist of any lower-level list, which is
true for sequential skiplist algorithms, but false for several concurrent ones, such
as [22,26].

Verifying concurrent algorithms based on arrays of SLLs with timestamps,
e.g., the algorithms in [12], has been shown to be rather challenging. Only
recently has the TS stack been verified by non-automated techniques [8] using
a non-trivial extension of forward simulation, and the TS queue been verified
manually by a new technique based on partial orders [24,37]. We have verified
both these algorithms automatically using fragment abstraction.

Our fragment abstraction is related in spirit to other formalisms that abstract
dynamic graph structures by defining some form of equivalence on their nodes
(e.g., [23,33,46]). These have been applied to verify functional correctness of
fine-grained concurrent algorithms for a limited number of SLL-based algorithms.
Fragment abstraction’s representation of both local and global information
makes it possible to extend the applicability of this class of techniques.


2 Overview 


In this section, we illustrate our technique on the verification of correctness, in 
the sense of linearizability, of a concurrent set data structure based on skiplists,
namely the Lock-Free Concurrent Skiplist from [22, Sect. 14.4]. Skiplists provide
expected logarithmic time search while avoiding some of the complications of tree
structures. Informally, a skiplist consists of a collection of sorted linked lists, each
of which is located at a level, ranging from 1 up to a maximum value. Each skiplist
node has a key value and participates in the lists at levels 1 up to its height.
The skiplist has sentinel head and tail nodes with maximum heights and key values
−∞ and +∞, respectively. The lowest-level list (at level 1) constitutes an ordered
list of all nodes in the skiplist. Higher-level lists are increasingly sparse sublists of the
lowest-level list, and serve as shortcuts into lower-level lists. Figure 1 shows an example
of a skiplist of height 3. It has head and tail nodes of height 3, two nodes of height
2, and one node of height 1.

Fig. 1. An example of skiplist

The algorithm has three main methods, namely add, contains and remove. 
The method add(x) adds x to the set and returns true iff x was not already in 
the set; remove(x) removes x from the set and returns true iff x was in the set; 
and contains(x) returns true iff x is in the set. All methods rely on a method 
find to search for a given key. In this section, we briefly describe the find and
add methods. Figure 2 shows code for these two methods. 

In the algorithm, each heap node has a key field, a height, an array of
next pointers indexed from 1 up to its height, and an array of marked fields
which are true if the node has been logically removed at the corresponding level.
Removal of a node (at a certain level k) occurs in two steps: first the node is
logically removed by setting its marked flag at level k to true, thereafter the
node is physically removed by unlinking it from the level-k list. The algorithm
must be able to update the next[k] pointer and marked[k] field together as one
atomic operation; this is standardly implemented by encoding them in a single
word. The head and tail nodes of the skiplist are pointed to by global pointer
variables H and T, respectively. The find method traverses the list at decreasing
levels using two local variables pred and curr, starting at the head and at the
maximum level (lines 5-6). At each level k it sets curr to pred.next[k] (line 7).
During the traversal, the pointer variable succ and boolean variable marked are
atomically assigned the values of curr.next[k] and curr.marked[k], respectively
(lines 9 and 14). After that, the method repeatedly removes marked nodes at the
current level (lines 10 to 14). This is done by using a CompareAndSwap (CAS)
command (line 11), which tests whether pred.next[k] and pred.marked[k] are
equal to curr and false, respectively. If this test succeeds, it replaces them with
succ and false and returns true; otherwise, the CAS returns false. During the
traversal at level k, pred and curr are advanced until pred points to a node
with the largest key at level k which is smaller than x (lines 15-18). Thereafter,
the resulting values of pred and curr are recorded into preds[k] and succs[k]
(lines 19, 20), whereafter traversal continues one level below until it reaches the
bottom level. Finally, the method returns true if the key value of curr is equal
to x; otherwise, it returns false, meaning that a node with key x is not found.
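
The two-step removal and the joint update of next[k] and marked[k] can be illustrated with a small sketch. The Python model below is only an illustration under stated assumptions, not the code of [22]: the class MarkableRef, the function remove_after, and the use of a lock in place of a single-word hardware CAS are all hypothetical, and only one level is modelled.

```python
import threading


class MarkableRef:
    """Hypothetical model of a <next, marked> pair updated as one atomic unit."""

    def __init__(self, ref=None, marked=False):
        self._pair = (ref, marked)
        self._lock = threading.Lock()  # assumption: stands in for a single-word CAS

    def get(self):
        """Atomically read the <reference, mark> pair."""
        with self._lock:
            return self._pair

    def cas(self, exp_ref, exp_mark, new_ref, new_mark):
        """Replace the pair only if it currently equals <exp_ref, exp_mark>."""
        with self._lock:
            if self._pair[0] is exp_ref and self._pair[1] == exp_mark:
                self._pair = (new_ref, new_mark)
                return True
            return False


class Node:
    def __init__(self, key, succ=None):
        self.key = key
        self.next = MarkableRef(succ, False)  # a single level, for brevity


def remove_after(pred, curr):
    """Remove curr (the successor of pred) in two steps."""
    succ, marked = curr.next.get()
    # Step 1: logical removal, i.e. set the mark stored with curr's next pointer.
    if marked or not curr.next.cas(succ, False, succ, True):
        return False
    # Step 2: physical removal, i.e. unlink curr by swinging pred.next to succ.
    pred.next.cas(curr, False, succ, False)
    return True


tail = Node(float("inf"))
b = Node(5, tail)
head = Node(float("-inf"), b)
removed = remove_after(head, b)
```

In this sketch a second call to remove_after(head, b) fails, because the mark on b is already set; this mirrors how a marked node can only be logically removed once.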


struct Node { int key; int height; Node next[]; boolean marked[]; }

boolean find(int x, Node preds[], Node succs[]):
 1  boolean marked = false;
 2  boolean s;
 3  retry:
 4  while (true)
 5    pred = H;
 6    for (int k = MAXLEVEL; k >= 1; k--)
 7      curr = pred.next[k];
 8      while (true)
 9        <succ, marked> = <curr.next[k], curr.marked[k]>;
10        while (marked)
11          s = CAS(<pred.next[k], pred.marked[k]>, <curr, false>, <succ, false>);
12          if (!s) goto retry;
13          curr = pred.next[k]; ●
14          <succ, marked> = <curr.next[k], curr.marked[k]>;
15        if (curr.key < x)
16          pred = curr;
17          curr = succ;
18        else break;
19      preds[k] = pred;
20      succs[k] = curr;
21    return (curr.key == x);

boolean add(int x):
 1  int h = randomLevel;
 2  Node* preds[1..h], succs[1..h];
 3  while (true)
 4    if find(x, preds, succs)
 5      return false;
 6    else
 7      Node* n = new Node(x, h);
 8      for (int k = 1; k <= h; k++)
 9        <n.next[k], n.marked[k]> = <succs[k], false>;
10      Node* pred = preds[1];
11      Node* succ = succs[1];
12      <n.next[1], n.marked[1]> = <succ, false>;
13      if !CAS(<pred.next[1], pred.marked[1]>, <succ, false>, <n, false>)
14        goto 3;
15      else ●
16        for (int k = 2; k <= h; k++)
17          while (true)
18            pred = preds[k];
19            succ = succs[k];
20            if CAS(<pred.next[k], pred.marked[k]>, <succ, false>, <n, false>)
21              break;
22            find(x, preds, succs);
23        return true;

Fig. 2. Code for the find and add methods of the skiplist algorithm. (Color figure online)


The add method uses find to check whether a node with key x is already in
the list. If so it returns false; otherwise, a new node is created with randomly
chosen height h (line 7), and with next pointers at levels from 1 to h initialised
to the corresponding elements of succs (lines 8 to 9). Thereafter, the new node is
added into the list by linking it into the bottom-level list between the preds[1]
and succs[1] pointers returned by find. This is achieved by using a CAS to make
preds[1].next[1] point to the new node (line 13). If the CAS fails, the add method
restarts from the beginning (line 3) by calling find again. Otherwise,
add proceeds with linking the new node into the list at increasingly higher levels
(lines 16 to 22). For each higher level k, it makes preds[k].next[k] point to the
new node if it is still valid (line 20); otherwise find is called again to recompute
preds[k] and succs[k] on the remaining unlinked levels (line 22). Once all levels
are linked, the method returns true.

To prepare for verification, we add a specification which expresses that the 
skiplist algorithm of Fig. 2 is a linearizable implementation of a set data structure,
using the technique of observers [1,3,7,9]. For our skiplist algorithm, the
user first instruments statements in each method that correspond to linearization
points (LPs), so that their execution announces the corresponding atomic
set operation. In Fig. 2, the LP of a successful add operation is at line 15 of the 
add method (denoted by a blue dot) when the CAS succeeds, whereas the LP of 
an unsuccessful add operation is at line 13 of the find method (denoted by a 
red dot). We must now verify that in any concurrent execution of a collection 
of method calls, the sequence of announced operations satisfies the semantics of 
the set data structure. This check is performed by an observer, which monitors 
the sequence of announced operations. The observer for the set data structure 
utilizes a register, which is initialized with a single, arbitrary key value. It checks 
that operations on this particular value follow set semantics, i.e., that successful 
add and remove operations on an element alternate and that contains are con- 
sistent with them. We form the cross-product of the program and the observer, 
synchronizing on operation announcements. This reduces the problem of check- 
ing linearizability to the problem of checking that in this cross-product, regard- 
less of the initial observer register value, the observer cannot reach a state where 
the semantics of the set data structure has been violated. 
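
The role of the observer register can be sketched in a few lines of Python. This is a minimal illustration, assuming an initially empty set and a sequence of already announced operations given as (method, key, result) triples; the function names observer_reaches_error and violates_set_semantics are hypothetical and are not taken from the paper's tool.

```python
def observer_reaches_error(register, announced_ops):
    """Return True iff the observer for key `register` reaches its error state.

    announced_ops: sequence of (method, key, result) triples, in the order in
    which the linearization points announce them.
    """
    present = False  # assumption: the set is initially empty
    for method, key, result in announced_ops:
        if key != register:
            continue  # the observer only tracks operations on its register value
        if method == "add":
            expected, present_after = (not present), True
        elif method == "remove":
            expected, present_after = present, False
        elif method == "contains":
            expected, present_after = present, present
        else:
            raise ValueError("unknown method: " + method)
        if result != expected:
            return True  # set semantics violated for this key
        present = present_after
    return False


def violates_set_semantics(announced_ops):
    """Check every register value that can matter, i.e. every key that occurs."""
    keys = {key for _, key, _ in announced_ops}
    return any(observer_reaches_error(k, announced_ops) for k in keys)


good = [("add", 3, True), ("contains", 4, False),
        ("contains", 3, True), ("remove", 3, True)]
bad = [("add", 3, True), ("add", 3, True)]
good_is_violation = violates_set_semantics(good)
bad_is_violation = violates_set_semantics(bad)
```

Quantifying over the register value, as violates_set_semantics does here over the keys that occur, corresponds to requiring that the error state is unreachable regardless of the initial register value.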

To verify that the observer cannot reach a state where a violation is reported, 
we compute a symbolic representation of an invariant that is satisfied by all 
reachable configurations of the cross-product of a program and an observer. This 
symbolic representation combines thread abstraction, data abstraction and our 
novel fragment abstraction to represent the heap state. Our thread abstraction 
adapts the thread-modular approach by representing only the view of a single,
but arbitrary, thread th. Such a view consists of the local state of thread th, 
including the value of the program counter, the state of the observer, and the 
part of the heap that is accessible to thread th via pointer variables (local to th 
or global). Our data abstraction represents variables and cell fields that range 
over small finite domains by their concrete values, whereas variables and fields 
that range over the same domain as key fields are abstracted to constraints over 
their relative ordering (w.r.t. <).

In our fragment abstraction, we represent the part of the heap that is acces- 
sible to thread th by a set of fragments. A fragment represents a pair of 
heap cells (accessible to th) that are connected by a pointer field, under the 
applied data abstraction. A fragment is a triple of form (i, o, φ), where i and
o are tags that represent the two cells, and φ is a subset of {<, =, >} which
constrains the order between the key fields of the cells. Each tag is a tuple
tag = (dabs, pvars, reachfrom, reachto, private), where


— dabs represents the non-pointer fields of the cell under the applied data 
abstraction, 

— pvars is the set of (local to th or global) pointer variables that point to the 
cell, 

— reachfrom is the set of (i) global pointer variables from which the cell represented
by the tag is reachable via a (possibly empty) sequence of next[1]
pointers, and (ii) observer registers x_i such that the cell is reachable from
some cell whose data value equals that of x_i,

— reachto is the corresponding information, but now considering cells that are
reachable from the cell represented by the tag,

— private is true only if the cell is private to th.


Thus, the fragment contains both (i) local information about the cell’s fields and
variables that point to it, as well as (ii) global information, representing how
each cell in the pair can reach to and be reached from (by following a chain of
pointers) a small set of globally significant heap cells.

A set of fragments represents the set of heap structures in which each pair of
pointer-connected nodes is represented by some fragment in the set. Put differently,
a set of fragments describes the set of heaps that can be formed by “piecing together”
pairs of pointer-connected nodes that are represented by some fragment in the set.
This “piecing together” must be both locally consistent (appending only fragments
that agree on their common node), and globally consistent (respecting the global
reachability information). When applying fragment abstraction to skiplists, we use two
types of fragments: level 1-fragments for nodes connected by a next[1]-pointer,
and higher level-fragments for nodes connected by a higher level pointer. In other
words, we abstract all levels from 2 upwards by the abstract element higher.
Thus, a pointer or non-pointer variable of form v[k], indexed by a level k ≥ 2, is
abstracted to v[higher].

Fig. 3. A structure of a cell (fields: key, height = h, next[1] to next[h], marked[1] to marked[h])
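
To make the construction concrete, the following Python sketch abstracts a plain singly-linked list (the level-1 case only) into a set of fragments. It is a simplified illustration under several assumptions: the names Tag, compare, and abstract_list are hypothetical, dabs is reduced to the order between the key and a single observer register, and the private flag and the observer-register part of reachfrom/reachto are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    dabs: str             # key compared with the observer register: "<", "=" or ">"
    pvars: frozenset      # pointer variables pointing directly to the cell
    reachfrom: frozenset  # global variables from whose cell this cell is reachable
    reachto: frozenset    # global variables whose cell is reachable from this cell


def compare(a, b):
    return "<" if a < b else "=" if a == b else ">"


def abstract_list(keys, variables, global_vars, register):
    """Abstract a singly-linked list into a set of (i, o, phi) fragments.

    keys: key of each cell in list order (cell j points to cell j + 1).
    variables: maps each pointer variable name to the index of its cell.
    global_vars: the names in `variables` that are global.
    register: value of the observer register.
    """
    def tag(j):
        pvars = frozenset(v for v, c in variables.items() if c == j)
        # Reachability along the list reduces to comparing cell positions.
        reachfrom = frozenset(g for g in global_vars if variables[g] <= j)
        reachto = frozenset(g for g in global_vars if variables[g] >= j)
        return Tag(compare(keys[j], register), pvars, reachfrom, reachto)

    return {
        (tag(j), tag(j + 1), frozenset({compare(keys[j], keys[j + 1])}))
        for j in range(len(keys) - 1)
    }


keys = [float("-inf"), 1, 3, 5, 7, 10, float("inf")]
variables = {"H": 0, "T": 6, "pred": 4}
fragments = abstract_list(keys, variables, {"H", "T"}, register=9)
```

In this example the six pointer-connected pairs collapse to five fragments, because the pairs (1, 3) and (3, 5) carry identical tags; every fragment has φ = {<}, which is how strict sortedness of the list shows up in the abstraction.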


Fig. 4. A heap shape of a 3-level skiplist with two threads active


Let us illustrate how fragment abstraction applies to the skiplist algorithm. 
Figure 4 shows an example heap state of the skiplist algorithm with three levels.
Each heap cell is shown with the values of its fields as described in Fig. 3. In
addition, each cell is labeled by the pointer variables that point to it; we use
preds(i)[k] to denote the local variable preds[k] of thread th_i, and the same
for other local variables. In the heap state of Fig. 4, thread th_1 is trying to add
a new node of height 1 with key 9, and has reached line 8 of the add method.
Thread th_2 is trying to add a new node with key 20 and it has done its first
iteration of the for loop in the find method. The variables preds(2)[3] and
currs(2)[3] have been assigned so that the new node (which has not yet been
created) will be inserted between node 5 and the tail node. The observer is not
shown, but the value of the observer register is 9; thus it currently tracks the
add operation of th_1.

Figure 5 illustrates how pairs of heap nodes can be represented by fragments. 
As a first example, in the view of thread th_1, the two left-most cells in Fig. 4 are
represented by the level 1-fragment v1 in Fig. 5. Here, the variable preds(1)[3] is
represented by preds[higher]. The mapping shown inside the tag represents the data
abstraction of the key field, here saying that it is smaller than the value 9 of the observer
register. The two left-most cells are also represented by a higher-level fragment,
viz. vg. The pair consisting of the two sentinel cells (with keys −∞ and +∞) is
represented by the higher-level fragment v9. In each fragment, the abstraction
dabs of non-pointer fields is shown inside each tag of the fragment.
The φ is shown as a label on the arrow between two tags. Above each tag is pvars.
The first row under each tag is reachfrom, whereas the second row is reachto.

Figure 5 shows a set of fragments that is sufficient to represent the part of
the heap that is accessible to th_1 in the configuration in Fig. 4. There are 11
fragments, named v1, ..., v11. Two of these (vg, v7 and v11) consist of a tag
that points to ⊥. All other fragments consist of a pair of pointer-connected tags.
The fragments v1, ..., v6 are level-1-fragments, whereas v7, ..., v11 are higher
level-fragments. The private field of the input tag of v7 is true, whereas the
private field of the tags of other fragments is false.

Fig. 5. Fragment abstraction of skiplist algorithm

To verify linearizability of the algorithm in Fig. 2, we must represent several 
key invariants of the heap. These include (among others): 


1. the bottom-level list is strictly sorted in key order,

2. a higher-level pointer from a globally reachable node is a shortcut into the
level-1 list, i.e., it points to a node that is reachable by a sequence of next[1]
pointers,

3. all nodes which are unreachable from the head of the list are marked, and

4. the variable pred points to a cell whose key field is never larger than the
input parameter of its add method.


Let us illustrate how such invariants are captured by our fragment abstraction.
(1) All level-1 fragments are strictly sorted, implying that the bottom-level list
is strictly sorted. (2) For each higher-level fragment v, if H ∈ v.i.reachfrom
then also H ∈ v.o.reachfrom, implying (together with v.φ = {<}) that the cell
represented by v.o is reachable from that represented by v.i by a sequence
of next[1]-pointers. (3) This is verified by inspecting each tag: v3 contains the
only unreachable tag, and it is also marked. (4) The fragments express this
property in the case where the value of key is the same as the value of the
observer register x. Since the invariant holds for any value of x, this property is
sufficiently represented for purposes of verification.


3 Concurrent Data Structure Implementations 


In this section, we introduce our representation of concurrent data structure
implementations, define the correctness criterion of linearizability, and introduce
observers and how to use them for specifying linearizability.


3.1 Concurrent Data Structure Implementations 


We first introduce (sequential) data structures. A data structure DS is a pair
(D, M), where D is a (possibly infinite) data domain and M is an alphabet of
method names. An operation op is of the form $m(d^{in}, d^{out})$, where $m \in M$ is a
method name, and $d^{in}, d^{out}$ are the input resp. output values, each of which is
either in D or in some small finite domain F, which includes the booleans. For
some method names, the input or output value is absent from the operation. A
trace of DS is a sequence of operations. The (sequential) semantics of a data structure
DS is given by a set $[\![DS]\!]$ of allowed traces. For example, a Set data structure
has method names add, remove, and contains. An example of an allowed trace
is add(3, true) contains(4, false) contains(3, true) remove(3, true).

A concurrent data structure implementation operates on a shared state consisting
of shared global variables and a shared heap. It assigns, to each method
name, a method which performs operations on the shared state. It also comes
with a method named init, which initializes its shared state.

A heap (state) H consists of a finite set C of cells, including the two special
cells null and ⊥ (dangling). Heap cells have a fixed set $\mathcal{F}$ of fields, namely
non-pointer fields that assume values in D or F, and possibly lock fields. We use
the term D-field for a non-pointer field that assumes values in D, and the terms
F-field and lock field with analogous meaning. Furthermore, each cell has one
or several named pointer fields. For instance, in data structure implementations
based on singly-linked lists, each heap cell has a pointer field named next; in
implementations based on skiplists there is an array of pointer fields named
next[k] where k ranges from 1 to a maximum level.

Each method declares local variables and a method body. The set of local 
variables includes the input parameter of the method and the program counter 
pc. A local state loc of a thread th defines the values of its local variables. The 
global variables can be accessed by all threads, whereas local variables can be 
accessed only by the thread which is invoking the corresponding method. Vari- 
ables are either pointer variables (to heap cells), locks, or data variables assuming 
values in D or F. We assume that all global variables are pointer variables. The 
body is built in the standard way from atomic commands, using standard control 
flow constructs (sequential composition, selection, and loop constructs). Atomic 
commands include assignments between variables, or fields of cells pointed to 
by a pointer variable. Method execution is terminated by executing a return 
command, which may return a value. The command new Node() allocates a new 
structure of type Node on the heap, and returns a reference to it. The compare-and-swap
command CAS(a,b,c) atomically compares the values of a and b. If they are
equal, it assigns the value of c to a and returns true; otherwise, it leaves a
unchanged and returns false. We assume a memory management mechanism,
which automatically collects garbage, and ensures that a new cell is fresh, i.e., 
has not been used before; this avoids the so-called ABA problem (e.g., [31]). 

We define a program P (over a concurrent data structure) to consist of an
arbitrary number of concurrently executing threads, each of which executes a
method that performs an operation on the data structure. The shared state is
initialized by the init method prior to the start of program execution. A configuration
of a program P is a tuple $c_P = \langle T, LOC, H \rangle$ where T is a set of threads, H
is a heap, and LOC maps each thread $th \in T$ to its local state LOC(th). We assume
concurrent execution according to a sequentially consistent memory model. The
behavior of a thread th executing a method can be formalized as a transition
relation $\rightarrow_{th}$ on pairs $\langle loc, H \rangle$ consisting of a local state loc and a heap state
H. The behavior of a program P can be formalized by a transition relation $\rightarrow_P$
on program configurations; each step corresponds to a move of a single thread.
I.e., there is a transition of form $\langle T, LOC, H \rangle \rightarrow_P \langle T, LOC[th \leftarrow loc'], H' \rangle$ whenever
some thread $th \in T$ has a transition $\langle loc, H \rangle \rightarrow_{th} \langle loc', H' \rangle$ with LOC(th) = loc.


3.2 Linearizability


In a concurrent data structure implementation, we represent the calling of a
method by a call action $call_o\, m(d^{in})$, and the return of a method by a return
action $ret_o\, m(d^{out})$, where $o \in \mathbb{N}$ is an action identifier, which links the call
and return of each method invocation. A history h is a sequence of actions such
that (i) different occurrences of return actions have different action identifiers,
and (ii) for each return action $a_2$ in h there is a unique matching call action $a_1$
with the same action identifier and method name, which occurs before $a_2$ in h. A
call action which does not match any return action in h is said to be pending. A
history without pending call actions is said to be complete. A completed extension
of h is a complete history h' obtained from h by appending (at the end) zero or
more return actions that are matched by pending call actions in h, and thereafter
removing the call actions that are still pending. For action identifiers $o_1, o_2$, we
write $o_1 \preceq_h o_2$ to denote that the return action with identifier $o_1$ occurs before
the call action with identifier $o_2$ in h. A complete history is sequential if it
is of the form $a_1 a'_1 a_2 a'_2 \cdots a_n a'_n$, where $a'_i$ is the matching action of $a_i$ for all
$i : 1 \le i \le n$, i.e., each call action is immediately followed by its matching return
action. We identify a sequential history of the above form with the corresponding
trace $op_1 op_2 \cdots op_n$ where $op_i = m(d_i^{in}, d_i^{out})$, $a_i = call_{o_i}\, m(d_i^{in})$, and
$a'_i = ret_{o_i}\, m(d_i^{out})$, i.e., we merge each call action together with the matching return
action into one operation. A complete history h' is a linearization of h if (i) h' is
a permutation of h, (ii) h' is sequential, and (iii) $o_1 \preceq_{h'} o_2$ if $o_1 \preceq_h o_2$ for each
pair of action identifiers $o_1$ and $o_2$. A sequential history h' is valid wrt. DS if the
corresponding trace is in $[\![DS]\!]$. We say that h is linearizable wrt. DS if there is
a completed extension of h, which has a linearization that is valid wrt. DS. We
say that a program P is linearizable wrt. DS if, in each possible execution, the
sequence of call and return actions is linearizable wrt. DS.
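
The definitions above can be exercised on very short histories. The following is a minimal brute-force sketch, assuming a set specification with methods named add, rmv and ctn and boolean outputs. The method names, the tuple encoding of actions, and all function names are illustrative assumptions, not part of the formal development. The sketch enumerates completed extensions and permutations, so its cost grows factorially with the number of operations.

```python
from itertools import permutations, product

DROP = object()  # marker: a pending call that is removed rather than answered

def operations(h):
    """Map each action identifier to [method, d_in, d_out, call_pos, ret_pos]."""
    ops = {}
    for pos, (kind, o, m, d) in enumerate(h):
        if kind == "call":
            ops[o] = [m, d, None, pos, None]
        else:  # "ret": fill in the output and position for the matching call
            ops[o][2], ops[o][4] = d, pos
    return ops

def completed_extensions(h, outputs=(True, False)):
    """Answer some pending calls at the end of h and remove the others."""
    ops = operations(h)
    pending = [o for o, op in ops.items() if op[4] is None]
    for choice in product((DROP,) + tuple(outputs), repeat=len(pending)):
        dropped = {o for o, c in zip(pending, choice) if c is DROP}
        ext = [a for a in h if a[1] not in dropped]
        ext += [("ret", o, ops[o][0], c)
                for o, c in zip(pending, choice) if c is not DROP]
        yield ext

def valid_set_trace(trace):
    """Assumed sequential set semantics for operations (method, d_in, d_out)."""
    contents = set()
    for m, d, out in trace:
        if m == "add":
            expected = d not in contents
            contents.add(d)
        elif m == "rmv":
            expected = d in contents
            contents.discard(d)
        else:  # "ctn"
            expected = d in contents
        if out != expected:
            return False
    return True

def is_linearizable(h, valid=valid_set_trace, outputs=(True, False)):
    for ext in completed_extensions(h, outputs):
        ops = operations(ext)
        for order in permutations(ops):
            rank = {o: i for i, o in enumerate(order)}
            # condition (iii): keep the order of non-overlapping operations
            ordered = all(rank[a] < rank[b] for a in ops for b in ops
                          if a != b and ops[a][4] < ops[b][3])
            if ordered and valid([tuple(ops[o][:3]) for o in order]):
                return True
    return False
```

For instance, a history in which ctn(1) returns true while add(1) is still pending is accepted, because the pending add can be completed and ordered first; a history in which ctn(1) returns true before add(1) is even called is rejected.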

We specify linearizability using the technique of observers [1,3,7,9]. Depending
on the data structure, we apply it in two different ways.


— For implementations of sets and priority queues, the user instruments each
method so that it announces a corresponding operation precisely when the
method executes its LP, either directly or with lightweight instrumentation
using the technique of linearization policies [3]. We represent such
announcements by labels on the program transition relation $\to_P$, resulting in
transitions of the form $c_P \xrightarrow{m(d^{in}, d^{out})}_P c'_P$. Thereafter, an observer is constructed, which
monitors the sequence of operations that is announced by the instrumentation;
it reports (by moving to an accepting error location) whenever this
sequence violates the (sequential) semantics of the data structure.

— For stacks and queues, we use a recent result [7,9] that the set of linearizable 
histories, i.e., sequences of call and return actions, can be exactly specified by 
an observer. Thus, linearizability can be specified without any user-supplied 
instrumentation, by using an observer which monitors the sequences of call 
and return actions and reports violations of linearizability.

Fig. 6. Set observer.


Formally, an observer $\mathcal{O}$ is a tuple $\langle S^{\mathcal{O}}, s^{\mathcal{O}}_{init}, X^{\mathcal{O}}, \Delta^{\mathcal{O}}, s^{\mathcal{O}}_{acc}\rangle$ where $S^{\mathcal{O}}$ is a
finite set of observer locations including the initial location $s^{\mathcal{O}}_{init}$ and the accepting
location $s^{\mathcal{O}}_{acc}$, $X^{\mathcal{O}}$ is a finite set of registers, and $\Delta^{\mathcal{O}}$ is a finite set of transitions.
For observers that monitor sequences of operations, transitions are of the form
$\langle s_1, m(x^{in}, x^{out}), s_2\rangle$, where $m \in M$ is a method name and $x^{in}$ and $x^{out}$ are either
registers or constants, i.e., transitions are labeled by operations whose input
or output data may be parameterized on registers. The observer processes a
sequence of operations one operation at a time. If there is a transition whose
label (after replacing registers by their values) matches the operation, such a
transition is performed. If there is no such transition, the observer remains in its
current location. The observer accepts a sequence if it can be processed in such a
way that an accepting location is reached. The observer is defined in such a way
that it accepts precisely those sequences that are not in $[\![DS]\!]$. Figure 6 depicts
an observer for the set data structure.
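
To make the processing rule concrete, the following is a minimal sketch of a register observer. The transition table is an assumption written in the spirit of a set observer with one register x; it is not a transcription of Fig. 6. The location names, the method names add, rmv and ctn, and the dictionary encoding are hypothetical.

```python
REG = "x"  # name of the single (hypothetical) observer register

# Assumed transitions (source, label, target); s0 means "x is not in the set",
# s1 means "x is in the set", err is the accepting error location.
SET_OBSERVER = {
    "init": "s0",
    "acc": "err",
    "delta": [
        ("s0", ("add", REG, True), "s1"),
        ("s0", ("add", REG, False), "err"),
        ("s0", ("rmv", REG, True), "err"),
        ("s0", ("ctn", REG, True), "err"),
        ("s1", ("rmv", REG, True), "s0"),
        ("s1", ("add", REG, True), "err"),
        ("s1", ("rmv", REG, False), "err"),
        ("s1", ("ctn", REG, False), "err"),
    ],
}

def matches(label, op, rho):
    """Does the label match op after replacing registers by their values?"""
    m, x_in, x_out = label
    value = lambda t: rho[t] if isinstance(t, str) and t in rho else t
    return (m, value(x_in), value(x_out)) == op

def accepts(obs, rho, ops):
    """Process ops one at a time; track all locations the observer may be in."""
    locations = {obs["init"]}
    for op in ops:
        successors = set()
        for s in locations:
            targets = {t for (src, label, t) in obs["delta"]
                       if src == s and matches(label, op, rho)}
            # with no matching transition the observer stays where it is
            successors |= targets or {s}
        locations = successors
    return obs["acc"] in locations
```

With the register holding 5, the sequence add(5, true), ctn(5, false) drives this observer to err, whereas add(5, true), rmv(5, true), ctn(5, false) does not. Operations on other data values leave the location unchanged.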

To check that no execution of the program announces a sequence of labels
that can drive the observer to an accepting location, we form the cross-product
$S = P \otimes \mathcal{O}$ of the program P and the observer $\mathcal{O}$, synchronizing on common
transition labels. Thus, configurations of S are of the form $\langle c_P, \langle s, \rho\rangle\rangle$, consisting
of a program configuration $c_P$, an observer location s, and an assignment
$\rho$ of values in D to the observer registers. Transitions of S are of the form
$\langle c_P, \langle s, \rho\rangle\rangle \to_S \langle c'_P, \langle s', \rho\rangle\rangle$, obtained from a transition $c_P \xrightarrow{\lambda}_P c'_P$ of the program
with some (possibly empty) label $\lambda$, where the observer makes a transition
$s \xrightarrow{\lambda} s'$ if it can perform such a matching transition, otherwise s' = s. Note that the
observer registers are not changed. We also add straightforward instrumentation
to check that each method invocation announces exactly one operation, whose
input and output values agree with the method's parameters and return value.
This reduces the problem of checking linearizability to the problem of checking
that in this cross-product, the observer cannot reach an accepting error location.


4 Verification Using Fragment Abstraction for Skiplists 


In the previous section, we reduced the problem of verifying linearizability 
to the problem of verifying that, in any execution of the cross-product of a
program and an observer, the observer cannot reach an accepting location. We
perform this verification by computing a symbolic representation of an invariant 
that is satisfied by all reachable configurations of the cross-product, using an 
abstract interpretation-based fixpoint procedure, starting from a symbolic rep- 
resentation of the set of initial configurations, thereafter repeatedly performing 
symbolic postcondition computations that extend the symbolic representation 
by the effect of any execution step of the program, until convergence. 
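
The overall shape of such a fixpoint procedure can be sketched as follows. This is a generic illustration under stated assumptions: the symbolic representation is modelled as a dictionary from keys to sets of abstract elements, and `post` is a hypothetical stand-in for the symbolic postcondition computation. Termination relies on the abstract domain being finite.

```python
def fixpoint(initial, post):
    """Extend a symbolic representation until it is closed under post.

    initial: dict mapping a key to a set of abstract elements
    post:    function that, given the current representation, returns a dict
             of elements that must also be included (assumed interface)
    """
    psi = {key: set(elems) for key, elems in initial.items()}
    changed = True
    while changed:
        changed = False
        for key, elems in post(psi).items():
            current = psi.setdefault(key, set())
            if not elems <= current:
                current |= elems   # extend the representation
                changed = True     # and iterate once more
    return psi

def toy_post(psi):
    """Toy postcondition: every element at key k may move to key min(k+1, 3)."""
    out = {}
    for k, elems in psi.items():
        out.setdefault(min(k + 1, 3), set()).update(elems)
    return out
```

Starting from `{0: {"a"}}`, the toy postcondition propagates the element to keys 1, 2 and 3, after which a further round adds nothing and the loop stops.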

In Sect. 4.1, we define in more detail our symbolic representation for skiplists, 
focusing in particular on the use of fragment abstraction, and thereafter (in 
Sect. 4.2) describe the symbolic postcondition computation. Since singly-linked
lists are a special case of skiplists, we can use the relevant part of this
technique also for programs based on singly-linked lists.


4.1 Symbolic Representation 


This subsection contains a more detailed description of our symbolic represen- 
tation for programs that operate on skiplists, which was introduced in Sect. 2. 
We first describe the data abstraction, thereafter the fragment abstraction, and 
finally their combination into a symbolic representation. 


Data Abstraction. Our data abstraction is defined by assigning an abstract
domain to each concrete domain of data values, as follows.


— For small concrete domains (including that of the program counter, and of 
the observer location), the abstract domain is the same as the concrete one. 

— For locks, the abstract domain is {me, other, free}, meaning that the lock is held 
by the concerned thread, held by some other thread, or is free, respectively. 

— For the concrete domain D of data values, the abstract domain is the set
of mappings from observer registers and local variables ranging over D to
subsets of {<, =, >}. A mapping in this abstract domain represents the set
of data values d such that it maps each local variable and observer register
with a value $d' \in D$ to a set which includes a relation ∼ such that d ∼ d'.
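
As a small illustration of the last item, the sketch below abstracts an integer data value relative to the current values of registers and local variables. The environment `env`, the function names, and the use of integers for D are assumptions made for the example.

```python
def relation(d, other):
    """The unique relation among <, =, > that holds between d and other."""
    return "<" if d < other else "=" if d == other else ">"

def alpha(d, env):
    """Abstract a single data value d.

    env maps each observer register or local variable to its concrete value.
    """
    return {name: frozenset({relation(d, value)}) for name, value in env.items()}

def join(a, b):
    """Least upper bound: pointwise union of the sets of relations."""
    return {name: a[name] | b[name] for name in a}

def represents(abstract, d, env):
    """Is the data value d in the set represented by the abstract mapping?"""
    return all(relation(d, env[name]) in rels for name, rels in abstract.items())
```

With `env = {"x1": 4, "key": 7}`, the value 5 is abstracted to x1 ↦ {>} and key ↦ {<}. Joining this with the abstraction of 4 gives x1 ↦ {=, >}, which also represents 6 but not 3.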


Fragment Abstraction. Let us now define our fragment abstraction for
skiplists. For presentation purposes, we assume that each heap cell has at most
one D-field, named data. For an observer register $x_i$, let an $x_i$-cell be a heap cell
whose data field has the same value as $x_i$.

Since the number of levels is unbounded, we define an abstraction for levels.
Let k be a level. Define the abstraction of a pointer variable of form p[k], denoted
$\widehat{p[k]}$, to be p[1] if k = 1, and to be p[higher] if k ≥ 2. That is, this abstraction
does not distinguish different higher levels.

A tag is a tuple $tag = \langle dabs, pvars, reachfrom, reachto, private\rangle$, where
(i) dabs is a mapping from non-pointer fields to their corresponding abstract
domains; if a non-pointer field is an array indexed by levels, then the abstract
domain is that for single elements: e.g., the abstract domain for the array marked
in Fig. 2 is simply the set of booleans, (ii) pvars is a set of abstracted pointer
variables, (iii) reachfrom and reachto are sets of global pointer variables and
observer registers, and (iv) private is a boolean value.

For a heap cell c that is accessible to thread th in a configuration $c_S$, and a
tag $tag = \langle dabs, pvars, reachfrom, reachto, private\rangle$, we let $c \lhd^{c_S}_{th,k} tag$ denote
that c satisfies the tag tag "at level k". More precisely, this means that


— dabs is an abstraction of the concrete values of the non-pointer fields of c; 
for array fields f we use the concrete value f[k], 

— pvars is the set of abstractions of pointer variables (global or local to th) 
that point to c, 

— reachfrom is the set of (i) abstractions of global pointer variables from which
c is reachable via a (possibly empty) sequence of next[1] pointers, and (ii)
observer registers $x_i$ such that c is reachable from some $x_i$-cell (via a sequence
of next[1] pointers),

— reachto is the set of (i) abstractions of global pointer variables pointing to
a cell that is reachable (via a sequence of next[1] pointers) from c, and (ii)
observer registers $x_i$ such that some $x_i$-cell is reachable from c,

— private is true only if c is not accessible to any thread other than th.


Note that the global information represented by the fields reachfrom and 
reachto concerns only reachability via level-1 pointers. 

A skiplist fragment v (or just fragment) is a triple of form $\langle i, o, \phi\rangle$, of form
$\langle i, null\rangle$, or of form $\langle i, \bot\rangle$, where i and o are tags and $\phi$ is a subset of {<, =, >}.
Each skiplist fragment additionally has a type, which is either level-1 or higher-level
(note that a level-1 fragment can otherwise be identical to a higher-level
fragment). For a cell c which is accessible to thread th, and a fragment v of
form $\langle i, o, \phi\rangle$, let $c \lhd^{c_S}_{th,k} v$ denote that the next[k] field of c points to a cell c'
such that $c \lhd^{c_S}_{th,k} i$, and $c' \lhd^{c_S}_{th,k} o$, and c.data ∼ c'.data for some $\sim\, \in \phi$. The
definition of $c \lhd^{c_S}_{th,k} v$ is adapted to fragments of form $\langle i, null\rangle$ and $\langle i, \bot\rangle$ in the
obvious way. For a fragment $v = \langle i, o, \phi\rangle$, we often use v.i for i and v.o for o,
etc.

Let V be a set of fragments. A global configuration $c_S$ satisfies V wrt. th,
denoted $c_S \models^{heap}_{th} V$, if


— for any cell c that is accessible to th (different from null and $\bot$), there is a
level-1 fragment $v \in V$ such that $c \lhd^{c_S}_{th,1} v$, and

— for all levels k from 2 up to the height of c, there is a higher-level fragment
$v \in V$ such that $c \lhd^{c_S}_{th,k} v$.


Intuitively, a set of fragments represents the set of heap states in which each pair
of cells connected by a next[1] pointer is represented by a level-1 fragment, and
each pair of cells connected by a next[k] pointer for k ≥ 2 is represented by a
higher-level fragment which represents array fields of cells at index k.
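
To illustrate why a set of fragments gives a bounded representation, the sketch below computes level-1 fragments for a concrete singly-linked list. It is a deliberate simplification and not the analysis itself: there is a single level, no local variables and no private flag, reachto is taken to allow empty paths, and the heap encoding and all names are assumptions.

```python
def relation(a, b):
    return "<" if a < b else "=" if a == b else ">"

def reachable(heap, c):
    """Cells reachable from c via a possibly empty sequence of next pointers."""
    seen = []
    while c is not None and c not in seen:
        seen.append(c)
        c = heap[c][1]
    return seen

def tag(heap, gvars, regs, c):
    """Simplified tag (dabs, pvars, reachfrom, reachto) of cell c."""
    data = heap[c][0]
    forward = reachable(heap, c)
    backward = [c2 for c2 in heap if c in reachable(heap, c2)]
    dabs = frozenset((r, relation(data, v)) for r, v in regs.items())
    pvars = frozenset(g for g, target in gvars.items() if target == c)
    reachfrom = frozenset(g for g, t in gvars.items() if t in backward) | \
        frozenset(r for r, v in regs.items()
                  if any(heap[c2][0] == v for c2 in backward))
    reachto = frozenset(g for g, t in gvars.items() if t in forward) | \
        frozenset(r for r, v in regs.items()
                  if any(heap[c2][0] == v for c2 in forward))
    return (dabs, pvars, reachfrom, reachto)

def fragments(heap, gvars, regs):
    """One fragment per cell: (tag of cell, tag of successor, data relation)."""
    V = set()
    for c, (data, nxt) in heap.items():
        if nxt is None:
            V.add((tag(heap, gvars, regs, c), "null"))
        else:
            phi = frozenset({relation(data, heap[nxt][0])})
            V.add((tag(heap, gvars, regs, c), tag(heap, gvars, regs, nxt), phi))
    return V

def sorted_list(n):
    """Heap cell id -> (data, next id): head cell 0, n inner cells, tail n+1."""
    heap = {i: (i, i + 1) for i in range(n + 1)}
    heap[n + 1] = (n + 1, None)
    return heap, {"head": 0, "tail": n + 1}
```

In this simplified setting, sorted lists with 5 and with 30 inner cells yield exactly the same four fragments, since all inner cells carry the same tag. Adding one observer register refines the tags but the number of fragments remains independent of the list length.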
Symbolic Representation. We can now define our abstract symbolic
representation.

Define a local symbolic configuration $\sigma$ to be a mapping from local
non-pointer variables (including the program counter) to their corresponding abstract
domains. We let $c_S \models^{loc}_{th} \sigma$ denote that in the global configuration $c_S$, the local
configuration of thread th satisfies the local symbolic configuration $\sigma$, defined
in the natural way. For a local symbolic configuration $\sigma$, an observer location s,
a set V of fragments and a thread th, we write $c_S \models_{th} \langle\sigma, s, V\rangle$ to denote that
(i) $c_S \models^{loc}_{th} \sigma$, (ii) the observer is in location s, and (iii) $c_S \models^{heap}_{th} V$.


Definition 1. A symbolic representation $\Psi$ is a partial mapping from pairs of
local symbolic configurations and observer locations to sets of fragments. A system
configuration $c_S$ satisfies a symbolic representation $\Psi$, denoted $c_S$ sat $\Psi$,
if for each thread th, the domain of $\Psi$ contains a pair $\langle\sigma, s\rangle$ such that
$c_S \models_{th} \langle\sigma, s, \Psi(\langle\sigma, s\rangle)\rangle$.


4.2 Symbolic Postcondition Computation 


The symbolic postcondition computation must ensure that the symbolic
representation of the reachable configurations of a program is closed under execution
of a statement by some thread. That is, given a symbolic representation
$\Psi$, the symbolic postcondition operation must produce an extension $\Psi'$ of $\Psi$,
such that whenever $c_S$ sat $\Psi$ and $c_S \to_S c'_S$ then $c'_S$ sat $\Psi'$. Let th be an arbitrary
thread. Then $c_S$ sat $\Psi$ means that $Dom(\Psi)$ contains some pair $\langle\sigma, s\rangle$ with
$c_S \models_{th} \langle\sigma, s, \Psi(\langle\sigma, s\rangle)\rangle$. The symbolic postcondition computation must ensure
that $Dom(\Psi')$ contains a pair $\langle\sigma', s'\rangle$ such that $c'_S \models_{th} \langle\sigma', s', \Psi'(\langle\sigma', s'\rangle)\rangle$. In
the thread-modular approach, there are two cases to consider, depending on
which thread causes the step from $c_S$ to $c'_S$.


— Local Steps: The step is caused by th itself executing a statement which may
change its local state, the location of the observer, and the state of the heap.
In this case, we first compute a local symbolic configuration $\sigma'$, an observer
location s', and a set V' of fragments such that $c'_S \models_{th} \langle\sigma', s', V'\rangle$, and then
(if necessary) extend $\Psi$ so that $\langle\sigma', s'\rangle \in Dom(\Psi)$ and $V' \subseteq \Psi(\langle\sigma', s'\rangle)$.

— Interference Steps: The step is caused by another thread $th_2$, executing a
statement which may change the location of the observer (to s') and the heap.
By $c_S$ sat $\Psi$ there is a local symbolic configuration $\sigma_2$ with $\langle\sigma_2, s\rangle \in Dom(\Psi)$
such that $c_S \models_{th_2} \langle\sigma_2, s, \Psi(\langle\sigma_2, s\rangle)\rangle$. For any such $\sigma_2$ and statement of $th_2$, we
must compute a set V' of fragments such that the resulting configuration $c'_S$
satisfies $c'_S \models^{heap}_{th} V'$ and ensure that $\langle\sigma, s'\rangle \in Dom(\Psi)$ and $V' \subseteq \Psi(\langle\sigma, s'\rangle)$.
To do this, we first combine the local symbolic configurations $\sigma$ and $\sigma_2$ and
the sets of fragments $\Psi(\langle\sigma, s\rangle)$ and $\Psi(\langle\sigma_2, s\rangle)$, using an operation called
intersection, into a joint local symbolic configuration of th and $th_2$ and a set $V_{1,2}$
of fragments that represents the cells accessible to either th or $th_2$. We thereafter
symbolically compute the postcondition of the statement executed by
$th_2$, in the same way as for local steps, and finally project the set of resulting
fragments back onto th to obtain V'.

In the following, we first describe the symbolic postcondition computation for
local steps, and thereafter the intersection operation.


Symbolic Postcondition Computation for Local Steps. Let th be an
arbitrary thread, assume that $\langle\sigma, s\rangle \in Dom(\Psi)$, and let $V = \Psi(\langle\sigma, s\rangle)$. For
each statement that th can execute in a configuration $c_S$ with $c_S \models_{th} \langle\sigma, s, V\rangle$,
we must compute a local symbolic configuration $\sigma'$, a new observer location
s' and a set V' of fragments such that the resulting configuration $c'_S$ satisfies
$c'_S \models_{th} \langle\sigma', s', V'\rangle$. This computation is done differently for each statement. For
statements that do not affect the heap or pointer variables, this computation is
standard, and affects only the local symbolic configuration, the observer location,
and the dabs component of tags. We therefore here describe how to compute
the effect of statements that update pointer variables or pointer fields of heap
cells, since these are the most interesting cases. In this computation, the set V'
is constructed in two steps: (1) First, the level-1 fragments of V' are computed,
based on the level-1 fragments in V. (2) Thereafter, the higher-level fragments of
V' are computed, based on the higher-level fragments in V and how fragments
in V are transformed when entered into V'. We first describe the construction
of level-1 fragments, and thereafter the construction of higher-level fragments.


Construction of Level-1 Fragments. Let us first intuitively introduce tech- 
niques used for constructing the level-1 fragments of V’. Consider a statement 
of form g := p, which assigns the value of a local pointer variable p to a global 
pointer variable g. The set V’ of fragments is obtained by modifying fragments in 
V to reflect the effect of the assignment. For any tag in a fragment, the dabs field 
is not affected. The pvars field is updated to contain the variable g if and only 
if it contained the variable p before the statement. The difficulty is to update 
the reachability information represented by the fields reachfrom and reachto, 
and in particular to determine whether g should be in such a set after the state- 
ment (note that if p were a global variable, then the corresponding reachability 
information for p would be in the fields reachfrom and reachto, and the update 
would be simple, reflecting that g and p become aliases). In order to construct
V' with sufficient precision, we therefore investigate whether the set of fragments
V allows a heap to be formed in which a p-cell can reach or be reached from (by a
sequence of next[1] pointers) a particular tag of a fragment. We also investigate
whether a heap can be formed in which a p-cell cannot reach or be reached from
a particular tag. For each such successful investigation, the set V' will contain
a level-1 fragment with corresponding contents of its reachto and reachfrom
fields.

The postcondition computation performs this investigation by computing a
set of transitive closure-like relations between level-1 fragments, which represent
reachability via sequences of next[1] pointers (since only these are relevant for
the reachfrom and reachto fields). First, say that two tags tag and tag' are
consistent (wrt. a set of fragments V) if the concretizations of their dabs-fields
overlap, and if the other fields (pvars, reachfrom, reachto, and private)
agree. Thus, tag and tag' are consistent if there can exist a cell c accessible to
th in some heap, with $c \lhd^{c_S}_{th,1} tag$ and $c \lhd^{c_S}_{th,1} tag'$. Next, for two level-1 fragments
$v_1$ and $v_2$ in a set V of fragments,


— let $v_1 \hookrightarrow_V v_2$ denote that $v_1.o$ and $v_2.i$ are consistent, and

— let $v_1 \leftrightarrow_V v_2$ denote that $v_1.o$ and $v_2.o$ are consistent, and that either
$v_1.i.pvars \cap v_2.i.pvars = \emptyset$ or the global variables in $v_1.i.reachfrom$ are
disjoint from those in $v_2.i.reachfrom$.


Intuitively, $v_1 \hookrightarrow_V v_2$ denotes that it is possible that $c_1.next[1] = c_2$ for some
cells with $c_1 \lhd^{c_S}_{th,1} v_1$ and $c_2 \lhd^{c_S}_{th,1} v_2$. Intuitively, $v_1 \leftrightarrow_V v_2$ denotes that it is
possible that $c_1.next[1] = c_2.next[1]$ for different cells $c_1$ and $c_2$ with $c_1 \lhd^{c_S}_{th,1} v_1$
and $c_2 \lhd^{c_S}_{th,1} v_2$ (note that these definitions also work for fragments containing
null or $\bot$). We use these relations to define the following derived relations on
level-1 fragments:


— $\overset{+}{\hookrightarrow}_V$ denotes the transitive closure, and $\overset{*}{\hookrightarrow}_V$ the reflexive transitive closure,
of $\hookrightarrow_V$,

— $v_1 \overset{**}{\leftrightarrow}_V v_2$ denotes that $\exists v'_1, v'_2 \in V$ with $v'_1 \leftrightarrow_V v'_2$ where $v_1 \overset{*}{\hookrightarrow}_V v'_1$ and
$v_2 \overset{*}{\hookrightarrow}_V v'_2$,

— $v_1 \overset{*+}{\leftrightarrow}_V v_2$ denotes that $\exists v'_1, v'_2 \in V$ with $v'_1 \leftrightarrow_V v'_2$ where $v_1 \overset{*}{\hookrightarrow}_V v'_1$ and
$v_2 \overset{+}{\hookrightarrow}_V v'_2$,

— $v_1 \overset{*\circ}{\leftrightarrow}_V v_2$ denotes that $\exists v'_1 \in V$ with $v'_1 \leftrightarrow_V v_2$ where $v_1 \overset{*}{\hookrightarrow}_V v'_1$,

— $v_1 \overset{++}{\leftrightarrow}_V v_2$ denotes that $\exists v'_1, v'_2 \in V$ with $v'_1 \leftrightarrow_V v'_2$ where $v_1 \overset{+}{\hookrightarrow}_V v'_1$ and
$v_2 \overset{+}{\hookrightarrow}_V v'_2$,

— $v_1 \overset{+\circ}{\leftrightarrow}_V v_2$ denotes that $\exists v'_1 \in V$ with $v'_1 \leftrightarrow_V v_2$ where $v_1 \overset{+}{\hookrightarrow}_V v'_1$.


We sometimes use, e.g., $v_2 \overset{+*}{\leftrightarrow}_V v_1$ for $v_1 \overset{*+}{\leftrightarrow}_V v_2$. We say that $v_1$ and $v_2$ are
compatible if $v_1 \overset{*}{\hookrightarrow}_V v_2$, or $v_2 \overset{*}{\hookrightarrow}_V v_1$, or $v_1 \overset{**}{\leftrightarrow}_V v_2$. Intuitively, if $v_1$ and $v_2$ are
satisfied by two cells in the same heap state, then they must be compatible.
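
The relations above can be computed directly on a finite set of fragments. The following is a minimal sketch under simplifying assumptions: tags are reduced to three fields, consistency of tags is approximated by equality (rather than overlap of the concretizations of dabs), and reachfrom is taken to contain global variables only. The names `step`, `merges` and `compatible` are hypothetical.

```python
from collections import namedtuple

Tag = namedtuple("Tag", "dabs pvars reachfrom")
Fragment = namedtuple("Fragment", "i o")

def consistent(t1, t2):
    # Simplification: equality instead of overlapping dabs concretizations.
    return t1 == t2

def step(V):
    """Pairs (a, b) such that the output tag of a is consistent with the input tag of b."""
    return {(a, b) for a in V for b in V if consistent(a.o, b.i)}

def merges(a, b):
    """Two different cells satisfying a and b may share their successor."""
    return consistent(a.o, b.o) and (
        not (a.i.pvars & b.i.pvars) or not (a.i.reachfrom & b.i.reachfrom))

def transitive_closure(R):
    closure = set(R)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

def compatible(V, v1, v2):
    """v1 reaches v2, or v2 reaches v1, or both reach fragments that merge."""
    plus = transitive_closure(step(V))
    star = plus | {(v, v) for v in V}
    if (v1, v2) in star or (v2, v1) in star:
        return True
    return any(merges(a, b) for a in V for b in V
               if (v1, a) in star and (v2, b) in star)
```

In the usage below, a fragment chain A→B→C and a separate fragment D→C are compatible because both lead to the shared output tag C, while a fragment over unrelated tags is not compatible with either.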


Fig. 7. Illustration of some transitive closure-like relations between fragments

Figure 7 illustrates the above relations for a heap state with 13 heap cells.
The figure depicts, in green, four pairs of heap cells connected by a next[1]
pointer, which satisfy the four fragments $v_1$, $v_2$, $v_3$, and $v_4$, respectively. At the
bottom are depicted the transitive closure-like relations that hold between these
fragments.

We can now describe the symbolic postcondition computation for statements 
that affect pointer variables or fields. This is a case analysis, and for space reasons 
we only include some representative cases. 

First, consider a statement of form x := y, where x and y are local (to thread
th) or global pointer variables. We must compute a set V' of fragments which
are satisfied by the configuration after the statement. We first compute the
level-1 fragments in V' as follows (higher-level fragments will be computed later). We
observe that for any cell c which is accessible to th after the statement, there
must be some level-1 fragment v' in V' with $c \lhd^{c'_S}_{th,1} v'$. By assumption, c satisfies
some fragment v in V before the statement, and is in the same heap state as the
cell pointed to by y. This implies that v must be compatible with some fragment
$v_y \in V$ such that $\hat{y} \in v_y.i.pvars$ (recall that $\hat{y}$ is the abstraction of y, which in
the case that y is an array element maps higher level indices to the abstract
index higher). This means that we can make a case analysis on the possible
relationships between v and any such $v_y$. Thus, for each fragment $v_y \in V$ such
that $\hat{y} \in v_y.i.pvars$ we let V' contain the fragments obtained by any of the
following transformations on any fragment in V.


1. First, for the fragment $v_y$ itself, we let V' contain $v'_y$, which is the same as
$v_y$, except that
   — $v'_y.i.pvars = v_y.i.pvars \cup \{\hat{x}\}$ and $v'_y.o.pvars = v_y.o.pvars \setminus \{\hat{x}\}$
   and furthermore, if x is a global variable, then
   — $v'_y.i.reachto = v_y.i.reachto \cup \{\hat{x}\}$ and $v'_y.i.reachfrom = v_y.i.reachfrom \cup \{\hat{x}\}$,
   — $v'_y.o.reachfrom = v_y.o.reachfrom \cup \{\hat{x}\}$ and $v'_y.o.reachto = v_y.o.reachto \setminus \{\hat{x}\}$.
2. For each v with $v \hookrightarrow_V v_y$, let V' contain v' which is the same as v except that
   — $v'.i.pvars = v.i.pvars \setminus \{\hat{x}\}$,
   — $v'.o.pvars = v.o.pvars \cup \{\hat{x}\}$,
   — $v'.i.reachfrom = v.i.reachfrom \setminus \{\hat{x}\}$ if x is a global variable,
   — $v'.i.reachto = v.i.reachto \cup \{\hat{x}\}$ if x is a global variable,
   — $v'.o.reachfrom = v.o.reachfrom \cup \{\hat{x}\}$ if x is a global variable,
   — $v'.o.reachto = v.o.reachto \cup \{\hat{x}\}$ if x is a global variable.
3. We perform analogous inclusions for fragments v that are related to $v_y$ by
one of the other transitive closure-like relations defined above. Here, we show
only one of these cases, in which we let V' contain v' which is the same as v
except that $\hat{x}$ is removed from the sets v'.i.pvars, v'.o.pvars, v'.i.reachfrom,
v'.i.reachto, v'.o.reachfrom, and v'.o.reachto.


The statement x := y.next[1] is handled rather similarly to the case x := y. Let
us therefore describe the postcondition computation for statements of the form
x.next[1] := y. This is the most difficult statement, since it is a destructive
update of the heap. It affects reachability relations for both x and y. The
postcondition computation makes a case analysis on how a fragment in V is related
to some pair of compatible fragments $v_x$, $v_y$ in V such that $\hat{x} \in v_x.i.pvars$ and
$\hat{y} \in v_y.i.pvars$. Thus, for each pair of compatible fragments $v_x$, $v_y$ in V such
that $\hat{x} \in v_x.i.pvars$ and $\hat{y} \in v_y.i.pvars$, it is first checked whether the statement
may form a cycle in the heap. This may happen if $v_y \overset{*}{\hookrightarrow}_V v_x$, in which case the
postcondition computation reports a potential cycle. Otherwise, V' consists of


1. the fragment $v_{new}$, representing the new pair of neighbours formed by the
statement, of form $v_{new} = \langle i, o, \phi\rangle$, such that $v_{new}.i.tag = v_x.i.tag$ and
$v_{new}.o.tag = v_y.i.tag$ except that $v_{new}.o.reachfrom = v_y.i.reachfrom \cup
v_x.i.reachfrom$ and $v_{new}.i.reachto = v_y.i.reachto \cup v_x.i.pvars$; the constraint
represented by $v_{new}.\phi$ is obtained from the constraints represented by the
data abstractions of $v_x.i$ and $v_y.i$, as well as the possible transitive closure-like
relations between $v_x$ and $v_y$, some of which imply that the data fields of $v_x$
and $v_y$ are ordered, and

2. all possible fragments that can result from a transformation of some fragment
$v \in V$. This is done by an exhaustive case analysis on the possible relationships
between v, $v_x$ and $v_y$. Let us consider an interesting case, in which
$v_x \overset{+}{\hookrightarrow}_V v$ and v is additionally related to $v_y$ (in either direction) by one of the
relations above. In this case,

— for each subset regset of the observer registers in $v.i.reachfrom \cap
v_x.i.reachfrom$, and for each subset regset' of the set of observer registers
in $v.o.reachfrom \cap v_x.i.reachfrom$, we let V' contain a fragment v'
which is the same as v except that $v'.i.reachfrom = (v.i.reachfrom
\setminus v_x.i.reachfrom) \cup regset$ and $v'.o.reachfrom = (v.o.reachfrom \setminus
v_x.i.reachfrom) \cup regset'$. An intuitive explanation for the rule for
v'.i.reachfrom is that the global variables that can reach $v_x.i$ should
clearly be removed from v'.i.reachfrom since $v_x \overset{+}{\hookrightarrow}_V v'$ is false after the
statement. However, for an observer register $x_i$, an $x_i$-cell can still reach
v'.i, if there are two $x_i$-cells, one which reaches $v_x.i$ and another which
reaches v'.i; we cannot precisely determine for which $x_i$ this may be the
case, except that any such $x_i$ must be in $v.i.reachfrom \cap v_x.i.reachfrom$.
The intuition for the rule for v'.o.reachfrom is analogous.
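
The enumeration over regset can be made concrete with a few lines of code. This is a sketch of the rule for the reachfrom field only; sets of variable and register names are modelled as Python frozensets, and the function names are assumptions.

```python
from itertools import combinations

def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def cut_reachfrom(v_reachfrom, vx_reachfrom, registers):
    """Candidate reachfrom sets for a tag whose path from the x-cell is cut.

    Everything that reached the tag via the x-cell is removed; an observer
    register may stay, since a second cell with the same value might still
    reach the tag, so every subset of the shared registers is considered.
    """
    base = v_reachfrom - vx_reachfrom
    shared_registers = v_reachfrom & vx_reachfrom & registers
    return {base | regset for regset in subsets(shared_registers)}
```

For example, with v.i.reachfrom = {head, g, x1, x2}, v_x.i.reachfrom = {head, x1} and registers {x1, x2}, the global variable head is always removed, while x1 is kept in one of the two resulting candidates.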


Construction of Higher-Level Fragments. Based on the above construction
of level-1 fragments, the set of higher-level fragments in V' is obtained as follows.
For each higher-level fragment $v \in V$, let $v_1$ and $v_2$ be level-1 fragments
such that $v_1.i.tag = v.i.tag$ and $v_2.i.tag = v.o.tag$. For any fragments $v'_1$
and $v'_2$ that are derived from $v_1$ and $v_2$, respectively, V' contains a higher-level
fragment v' which is the same as v except that (i) $v'.i.pvars = v'_1.i.pvars$
and $v'.o.pvars = v'_2.i.pvars$, (ii) $v'.i.reachfrom = v'_1.i.reachfrom$ and
$v'.o.reachfrom = v'_2.i.reachfrom$, and (iii) $v'.i.reachto = v'_1.i.reachto$ and
$v'.o.reachto = v'_2.i.reachto$. In addition, a statement of form x.next[k] := y
for k ≥ 2 creates a new fragment. The formation of this fragment is simpler
than for the statement x.next[1] := y, since reachability via next[1]-pointers is
preserved.




Symbolic Postcondition Computation for Interference Steps. Here, the
key step is the intersection operation, which takes two sets of fragments $V_1$ and
$V_2$, and produces a set of joint fragments $V_{1,2}$, such that $c_S \models^{heap}_{th_1,th_2} V_{1,2}$ for any
configuration such that $c_S \models^{heap}_{th_i} V_i$ for i = 1, 2 (here $\models^{heap}_{th_1,th_2}$ is defined in the
natural way). This means that for each heap cell accessible to either $th_1$ or $th_2$,
the set $V_{1,2}$ contains a fragment v with $c \lhd^{c_S}_{\{th_1,th_2\},k} v$ for each k which is at most
the height of c (generalizing the notation $\lhd^{c_S}_{th,k}$ to several threads). Note that a
joint fragment represents local pointer variables of both $th_1$ and $th_2$. In order to
distinguish between local variables of $th_1$ and $th_2$, we use x[i] to denote a local
variable x of thread $th_i$. Here, we describe the intersection operation for level-1
fragments. The intersection operation is analogous for higher-level fragments.

For a fragment v, define v.i.greachfrom as the set of global variables
in v.i.reachfrom. Define v.i.greachto, v.o.greachfrom, v.o.greachto,
v.i.gpvars, and v.o.gpvars analogously. Define v.i.gtag as the tuple
$\langle v.i.dabs, v.i.gpvars, v.i.greachfrom, v.i.greachto\rangle$, and define v.o.gtag
analogously. We must distinguish the following possibilities.


— If c is accessible to both $th_1$ and $th_2$, then there are fragments $v_1 \in V_1$
and $v_2 \in V_2$ such that $c \lhd^{c_S}_{th_1,1} v_1$ and $c \lhd^{c_S}_{th_2,1} v_2$. This can happen only
if $v_1.i.gtag = v_2.i.gtag$, and $v_1.o.gtag = v_2.o.gtag$, and $v_1.i.private =
v_2.i.private = false$. Thus, for any such pair of fragments $v_1 \in V_1$ and
$v_2 \in V_2$, we let $V_{1,2}$ contain a fragment $v_{12}$ which is identical to $v_1$ except
that

   • $v_{12}.i.pvars = v_1.i.pvars \cup v_2.i.pvars$,
   • $v_{12}.o.pvars = v_1.o.pvars \cup v_2.o.pvars$,
   • $v_{12}.i.reachfrom = v_1.i.reachfrom \cup v_2.i.reachfrom$, and
   • $v_{12}.o.reachfrom = v_1.o.reachfrom \cup v_2.o.reachfrom$.

— If c is accessible to $th_1$, but not to $th_2$, and c.next[1] is accessible also to
$th_2$, then there are fragments $v_1 \in V_1$ and $v_2 \in V_2$ such that $c \lhd^{c_S}_{th_1,1} v_1$
and $c.next[1] \lhd^{c_S}_{th_2,1} v_2.o$. This can happen only if $v_1.i.greachfrom = \emptyset$, and
$v_1.o.gtag = v_2.o.gtag$, and $v_1.o.private = v_2.o.private = false$. Thus,
for any such pair of fragments $v_1 \in V_1$ and $v_2 \in V_2$, we let $V_{1,2}$ contain a
fragment $v'_1$ which is identical to $v_1$ except that

   • $v'_1.o.pvars = v_1.o.pvars \cup v_2.o.pvars$, and
   • $v'_1.o.reachfrom = v_1.o.reachfrom \cup v_2.o.reachfrom$.

— If neither c nor c.next[1] is accessible to $th_2$, then there is a fragment $v_1 \in V_1$
such that $c \lhd^{c_S}_{th_1,1} v_1$. This can happen only if $v_1.o.greachfrom = \emptyset$, in which
case we let $V_{1,2}$ contain the fragment $v_1$.

— For each of the two last cases, there is also a symmetric case with the roles
of $th_1$ and $th_2$ reversed.
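
The first of these cases can be sketched as follows. The sketch covers only the situation where the cell is accessible to both threads; the dataclass encoding of tags and fragments, the representation of dabs as a frozenset, and the name `join_shared` are assumptions made for illustration.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Tag:
    dabs: frozenset
    pvars: frozenset
    reachfrom: frozenset
    reachto: frozenset
    private: bool

@dataclass(frozen=True)
class Fragment:
    i: Tag
    o: Tag
    phi: frozenset

def gtag(t, global_vars):
    """Restriction of a tag to its data abstraction and global variables."""
    return (t.dabs, t.pvars & global_vars,
            t.reachfrom & global_vars, t.reachto & global_vars)

def join_shared(v1, v2, global_vars):
    """Joint fragment for a cell seen by both threads, or None if impossible."""
    if gtag(v1.i, global_vars) != gtag(v2.i, global_vars):
        return None
    if gtag(v1.o, global_vars) != gtag(v2.o, global_vars):
        return None
    if v1.i.private or v2.i.private:
        return None
    i = replace(v1.i, pvars=v1.i.pvars | v2.i.pvars,
                reachfrom=v1.i.reachfrom | v2.i.reachfrom)
    o = replace(v1.o, pvars=v1.o.pvars | v2.o.pvars,
                reachfrom=v1.o.reachfrom | v2.o.reachfrom)
    return replace(v1, i=i, o=o)
```

In the joint fragment, the local variables of both threads (written p[1] and q[2] in the usage below) appear side by side in pvars, while the global part of the tags is required to agree.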


5 Arrays of Singly-Linked Lists with Timestamps 


In this section, we show how to apply fragment abstraction to concurrent
programs that operate on a shared heap which represents an array of singly-linked
lists. We use this abstraction to provide the first automated verification of
linearizability for the Timestamped stack and Timestamped queue algorithms
of [12], as reported in Sect. 6.


    struct Node {
      int data;
      Timestamp ts;
      Node* next;
      boolean mark;
    }

    init()
      Node* pools[maxThreads];
      for(int i=1; i<=maxThreads; i++)
        pools[i].next = null;

    void push(int d):
     1  Node* new := new Node(d,-1,null,false);
     2  new.next = pools[myID];
     3  pools[myID] = new;
     4  Timestamp t = new Timestamp();
     5  new.ts = t;
     6  Node* next = new.next;
     7  while (next.next != next & next.mark)
     8    next = next.next;
     9  new.next = next;
    10  return new;

    int pop():
     1  boolean success = false;
     2  int maxTS = -1;
     3  Node* youngest, myTop, n = null;
     4  while (!success)
     5    int k;
     6    for(int i=1; i<=maxThreads; i++)
     7      n = pools[i]
     8      while (n.mark & n.next != n) n = n.next;
     9      if (maxTS < n.ts)
    10        maxTS = n.ts;
    11        youngest = n;
    12        k = i; myTop = pools[k];
    13    if (youngest != null)
    14      success = CAS(youngest.mark, false, true);
    15  if (success)
    16    CAS(pools[k], myTop, youngest);
    17    if (myTop != youngest)
    18      myTop.next = youngest;
    19    pools[k].next = youngest.next;
    20    Node* next = youngest.next
    21    while (next.next != next & next.mark)
    22      next = next.next;
    23    youngest.next = next;
    24  return youngest.data;


Fig. 8. Description of the Timestamped stack algorithm, with some simplifications.


Figure 8 shows a simplified version of the Timestamped Stack (TS stack)
of [12], where we have omitted the check for emptiness in the pop method, and
the optimization using push-pop elimination. These features are included in the
full version of the algorithm, which we have verified automatically.

The algorithm uses an array of singly-linked lists (SLLs), one for each thread,
accessed via the thread-indexed array pools[maxThreads] of pointers to the first
cell of each list. The init method initializes each of these pointers to null. Each
list cell contains a data value, a timestamp value, a next pointer, and a boolean
flag mark which indicates whether the node is logically removed from the stack.
Each thread pushes elements only to "its own" list, but can pop elements from
any list.

A push method for inserting a data element d works as follows: first, a new
cell with element d and minimal timestamp −1 is inserted at the beginning of
the list indexed by the calling thread (lines 1-3). After that, a new timestamp
is created and assigned (via the variable t) to the ts field of the inserted cell
(lines 4-5). Finally, the method unlinks (i.e., physically removes) all cells that
are reachable (through a sequence of next pointers) from the inserted cell and
whose mark field is true; these cells are already logically removed. This is done
by redirecting the next pointer of the inserted cell to the first cell with a false
mark field, which is reachable from the inserted cell.

A pop method first traverses all lists, finding in each list the first cell whose
mark field is false (line 8), and letting the variable youngest point to the most
recent such cell (i.e., with the largest timestamp) (lines 1-11). A compare-and-swap
(CAS) is used to set the mark field of this youngest cell to true, thereby
logically removing it. This procedure will restart if the CAS fails. After the
youngest cell has been removed, the method will unlink all cells whose mark
field is true and that appear before (lines 17-19) or after (lines 20-23) the removed
cell. Finally, the method returns the data value of the removed cell.
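
The description above can be followed in a small single-threaded simulation. The sketch below is not the verified algorithm: it assumes sequential execution, replaces each CAS by a plain assignment, draws timestamps from a counter, ends every list with a sentinel cell whose next pointer points to itself, and adds an emptiness check that Fig. 8 omits. All class and method names are hypothetical.

```python
import itertools

class Node:
    def __init__(self, data, ts, nxt, mark):
        self.data, self.ts, self.next, self.mark = data, ts, nxt, mark

class TSStackSketch:
    def __init__(self, max_threads):
        self.clock = itertools.count()        # assumed timestamp source
        self.pools = []
        for _ in range(max_threads):
            sentinel = Node(None, -1, None, False)
            sentinel.next = sentinel          # end of list: next is the cell itself
            self.pools.append(sentinel)

    def push(self, tid, d):
        new = Node(d, -1, self.pools[tid], False)   # new cell, minimal timestamp
        self.pools[tid] = new                       # insert at the head of own list
        new.ts = next(self.clock)                   # assign a fresh timestamp
        nxt = new.next
        while nxt.next is not nxt and nxt.mark:     # skip logically removed cells
            nxt = nxt.next
        new.next = nxt                              # unlink them

    def pop(self):
        youngest, max_ts, k = None, -1, None
        for i, n in enumerate(self.pools):
            while n.mark and n.next is not n:       # first unmarked cell of list i
                n = n.next
            if max_ts < n.ts:
                max_ts, youngest, k = n.ts, n, i
        if youngest is None:
            raise IndexError("pop from empty stack")  # check omitted in Fig. 8
        youngest.mark = True          # stands in for CAS(youngest.mark, false, true)
        self.pools[k] = youngest      # cells before youngest are all marked
        nxt = youngest.next
        while nxt.next is not nxt and nxt.mark:     # unlink marked cells after it
            nxt = nxt.next
        youngest.next = nxt
        return youngest.data
```

Pushing "a" by thread 0, "b" by thread 1 and "c" by thread 0 and then popping three times returns "c", "b", "a" in this sketch, i.e., elements come back in decreasing timestamp order regardless of which list holds them.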


Fragment Abstraction. In our verification, we establish that the TS stack 
algorithm of Fig. 8 is correct in the sense that it is a linearizable implementation 
of a stack data structure. For stacks and queues, we specify linearizability by 
observers that synchronize on call and return actions of methods, as shown by [7]; 
this is done without any user-supplied annotation, hence the verification is fully 
automated. 

The verification is performed analogously to that for skiplists, as described in 
Sect. 4. Here we show how fragment abstraction is used for arrays of singly-linked 
lists. Figure 9 shows an example heap state of the TS stack, in a configuration 
where it is accessed concurrently by three threads th1, th2, and th3. The heap 
consists of a set of singly-linked lists (SLLs), each of which is accessed from a 
pointer in the array pools[maxThreads]; here there are three SLLs, accessed 
from the pointers pools[1], pools[2], and pools[3], respectively. Each heap 
cell is shown with the values of its fields, using the layout shown to the right in 
Fig. 9. In addition, each cell is labeled by the pointer variables that point to it. 
We use lvar(i) to denote the local variable lvar of thread thi. 

In the heap state of Fig. 9, thread th1 is trying to push a new node with data 
value 4, pointed to by its local variable new, having reached line 3. Thread th2 has 
just called the push method. Thread th3 has reached line 12 in the execution 
of the pop method, and has just assigned youngest to the first node in the list 
pointed to by pools[3] that is not logically removed (in this case it is the last 
node of that list). The observer has two registers x1 and x2, which are assigned 
the values 4 and 2, respectively. 

We verify the algorithm using a symbolic representation that is analogous to 
the one used for skiplists. There are two main differences. 


— Since the array pools is global, all threads can reach all lists in the heap (the 
only cells that cannot be reached by all threads are new cells that are not yet 
inserted). 

— We therefore represent the view of a thread by a thread-dependent abstraction 
of thread indices, which index the array pools. In the view of a thread, the 
index of the list where it is currently active is abstracted to me, and all other 
indices are abstracted to ot. The currently active index is taken to be the 
thread index for a thread performing a push, the value of i for a thread 
executing in the for loop of pop, and the value of k after that loop. 
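
The thread-dependent index abstraction in the second item can be sketched as follows. The function names, the string encoding of me and ot, the phase argument, and the 0-based indices are assumptions made for illustration; only the rule for choosing the active index is taken from the text.

```python
from typing import List, Optional


def active_index(method: str, tid: int, phase: str = "",
                 i: Optional[int] = None, k: Optional[int] = None) -> int:
    """Return the index of the list where a thread is currently active."""
    if method == "push":
        return tid                  # a pushing thread works on its own list
    if method == "pop" and phase == "loop":
        return i                    # inside the for loop: the loop variable i
    if method == "pop" and phase == "after":
        return k                    # after the loop: the chosen index k
    raise ValueError("unknown method or phase")


def abstract_index(index: int, active: int) -> str:
    """Abstract a concrete index of pools to 'me' or 'ot'."""
    return "me" if index == active else "ot"


def view_of_pools(num_lists: int, active: int) -> List[str]:
    """The abstract view of all indices of pools for one thread."""
    return [abstract_index(j, active) for j in range(num_lists)]
```

For example, a thread with index 2 that performs a push sees the three lists as ot, ot, me, whereas a popping thread whose loop variable is currently 0 sees them as me, ot, ot.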
Fig. 9. A possible heap state of TS stack with three threads. 


In the definition of tags, the only global variables that can occur in the fields 
reachfrom and reachto are therefore pools[me] and pools[ot]. The data 
abstraction represents (i) for each cell, the set of observer registers whose values 
are equal to the data field, and (ii) for each timestamp and observer register xi, the 
possible orderings between this timestamp and the timestamp of an xi-cell. 
Fig. 10. Fragment abstraction 


Figure 10 shows a set of fragments that is satisfied, with respect to a thread, by 
the configuration in Fig. 9. There are 7 fragments, named v1,...,v7. Consider the tag 
which occurs in fragment v7. This tag is an abstraction of the bottom-rightmost 
heap cell in Fig. 9. The different non-pointer fields are represented as follows. 




— The data field of the tag (to the left) abstracts the data value 2 to the set of 
observer registers with that value: in this case x2. 

— The ts field (at the top) abstracts the timer value 15 to the possible relations 
with ts fields of heap cells with the same data value as each observer register. 
Recall that observer registers x1 and x2 have values 4 and 2, respectively. 
There are three heap cells with data field value 4, all with a ts value less 
than 15. There is one heap cell with data field value 2, having ts value 15. 
Consequently, the abstraction of the ts field maps x1 to {>} and x2 to {=}: 
this is the mapping A4 in Fig. 10. 

— The mark field assumes values from a small finite domain and is represented 
precisely as in concrete heap cells. 
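
The data and timestamp abstractions in the items above can be reproduced by a small sketch. The function names and the representation of the heap as (data, ts) pairs are assumptions; the text only states that the three cells with data value 4 have timestamps below 15, so the concrete values 11, 12 and 13 used here are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Heap = List[Tuple[int, int]]        # assumed encoding: (data, ts) per heap cell


def abstract_data(data: int, observer: Dict[str, int]) -> Set[str]:
    """Set of observer registers whose value equals the cell's data field."""
    return {x for x, v in observer.items() if v == data}


def abstract_ts(ts: int, heap: Heap, observer: Dict[str, int]) -> Dict[str, Set[str]]:
    """For each register x, the possible orderings between ts and the ts of x-cells."""
    result: Dict[str, Set[str]] = {}
    for x, v in observer.items():
        relations: Set[str] = set()
        for data, other_ts in heap:
            if data == v:           # this cell is an x-cell
                if ts < other_ts:
                    relations.add("<")
                elif ts > other_ts:
                    relations.add(">")
                else:
                    relations.add("=")
        result[x] = relations
    return result


observer = {"x1": 4, "x2": 2}
heap = [(4, 11), (4, 12), (4, 13), (2, 15)]   # hypothetical ts values for the 4-cells
data_abstraction = abstract_data(2, observer)
ts_abstraction = abstract_ts(15, heap, observer)
```

With these inputs, the cell with data value 2 and timestamp 15 is abstracted to the register set containing only x2, and its ts field maps x1 to {>} and x2 to {=}, matching the description of the tag in fragment v7.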


Symbolic Postcondition Computation. The symbolic postcondition computation 
is similar to that for skiplists. The main differences are as follows. 


— Whenever a thread performing pop moves from one iteration of the for loop 
to the next, the abstraction must consider swapping the abstractions 
me and ot. 

— In interference steps, we must consider that the abstraction me for the inter- 
fering thread may have to be changed into ot. Furthermore, the abstractions 
me for two push methods cannot coincide, since each thread pushes only to 
its own list. 


6 Experimental Results 


Based on our framework, we have implemented a tool in OCaml, and used it to 
verify various concurrent implementations of stacks, priority queues, queues 
and sets. All of them are based on heap structures. We consider three types 
of heap structures in our experiments. 


Algorithms                          a       b 
Treiber stack [35]                  18      0.18 
MS lock-free queue [27]             22      21 
DGLM queue [13]                     16      16 
Vechev-CAS set [40]                 86      24 
Vechev-DCAS set [40]                16      16 
Michael lock-free set [25]          178     110 
Pessimistic set [19]                30      1.51 
Optimistic set [19]                 25      60 
Lazy set [17]                       34      289 
O’Hearn set [28]                    88      12 
HM lock-free set [19]               120     462 
Harris lock-free set [15]           950     1512 
Unordered set [43]                  1230    2301 
TS stack [11]                       176 
TS queue [11]                       101 
Lock-free skiplist [19]             1992 
Lock-based skiplist [18]            500 
Priority queue skiplist 1 [23]      1320 
Priority queue skiplist 2 [22]      599 


Fig. 11. Times for verifying concurrent data structure implementations. Column a 
shows the verification times for our tool based on fragment abstraction. Column b 
shows the verification times for the tool for SLLs in our previous work [3] 
Singly-linked list benchmarks: These benchmarks include stack, queue and set 
algorithms that are well known in the literature. The challenge is that in 
some set implementations, the linearization points are not fixed; they depend on 
the future of each execution. The sets with non-fixed linearization points are the 
lazy set [20], the lock-free sets of HM [22], Harris [17], and Michael [29], and the 
unordered set of [48]. By using the observers and controllers from our previous work 
[3], our approach is simple yet strong enough to verify these singly-linked list 
benchmarks. 


Skiplist benchmarks: We consider four skiplist algorithms including the lock- 
based skiplist set [31], the lock-free skiplist set which is described in Sect. 2 [22], 
and two skiplist-based priority queues [26,27]. One challenge for verifying these 
algorithms is to deal with an unbounded number of levels. In addition, in the 
lock-free skiplist [22] and priority queue [26], the skiplist shape is not well formed, 
meaning that each higher level list need not be a sub-list of lower level lists. 
These algorithms have not been automatically verified in previous work. By 
applying our fragment abstraction, we provide, to the best of our knowledge, the first 
framework that can automatically verify these concurrent skiplist algorithms. 


Arrays of singly-linked list benchmarks: We consider two challenging timestamp 
algorithms in [12]. There are two challenges when verifying these algorithms. 
The first challenge is how to deal with an unbounded number of SLLs, and 
the second challenge is that the linearization points of the algorithms are not 
fixed, but depend on the future of each execution. By combining our fragment 
abstraction with the observers for stacks and queues in [7], we are able to ver- 
ify these two algorithms automatically. The observers are crucial for achieving 
automation, since they enforce the weakest possible ordering constraints that 
are necessary for proving linearizability, thereby making it possible to use a less 
precise abstraction. 


Running Times. The experiments were performed on a desktop with a 2.8 GHz 
processor and 8 GB of memory. The results are presented in Fig. 11, where running times 
are given in seconds. Column a shows the verification times of our tool, whereas 
column b shows the verification times for algorithms based on SLLs, using the 
technique in our previous work [3]. In our experiments, we run the tool together 
with an observer in [1,7] and controllers in [3] to verify linearizability of the 
algorithms. All experiments start from the initial heap, and end either when the 
analysis reaches a fixed point or when a violation of safety properties or lineariz- 
ability is detected. As can be seen from the table, the verification times vary 
in the different examples. This is due to the types of shapes that are produced 
during the analysis. For instance, skiplist algorithms have much longer verifica- 
tion times. This is due to the number of pointer variables and their complicated 
shapes. In contrast, other algorithms produce simple shape patterns and hence 
they have shorter verification times. 


Error Detection. In addition to establishing correctness of the original versions 
of the benchmark algorithms, we tested our tool with intentionally inserted bugs. 
For example, we omitted the statement that sets the timestamp in line 5 of the push 
method in the TS stack algorithm, or we omitted the CAS statements in lock-free 
algorithms. The tool, as expected, successfully detected and reported the bugs. 


7 Conclusions 


We have presented a novel shape abstraction, called fragment abstraction, for 
automatic verification of concurrent data structure implementations that oper- 
ate on different forms of dynamically allocated heap structures, including singly- 
linked lists, skiplists, and arrays of singly-linked lists. Our approach is the first 
framework that can automatically verify concurrent data structure implementa- 
tions that employ skiplists and arrays of singly linked lists, at the same time as 
handling an unbounded number of concurrent threads, an unbounded domain of 
data values (including timestamps), and an unbounded shared heap. We showed 
that fragment abstraction makes it possible to combine local and global reachability 
information in order to verify the functional behavior of a collection of threads. 

As future work, we intend to investigate whether fragment abstraction can 
be applied also to other heap structures, such as concurrent binary search trees. 


Abstract. Capability machines provide security guarantees at the machine 
level, which makes them an interesting target for secure compilation 
schemes that provably enforce properties such as control-flow correctness 
and encapsulation of local state. We provide a formalization of a repre- 
sentative capability machine with local capabilities and study a novel 
calling convention. We provide a logical relation that semantically cap- 
tures the guarantees provided by the hardware (a form of capability 
safety) and use it to prove control-flow correctness and encapsulation of 
local state. The logical relation is not specific to our calling convention 
and can be used to reason about arbitrary programs. 


1 Introduction 


Compromising software security is often based on attacks that break program- 
ming language properties relied upon by software authors, such as control-flow 
correctness, local-state encapsulation, etc. Commodity processors offer little sup- 
port for defending against such attacks: they offer security primitives with only 
coarse-grained memory protection and limited compartmentalization scalability. 
As a result, defenses against attacks on control-flow correctness and local-state 
encapsulation are either limited to only certain common forms of attacks (lead- 
ing to an attack-defense arms race) and/or rely on techniques like machine code 
rewriting [1,2], machine code verification [3], virtual machines with a native 
stack [4] or randomization [5]. The latter techniques essentially emulate pro- 
tection techniques on existing hardware, at the cost of performance, system 
complexity and/or security. 

Capability machines are a type of processor that remedies these limitations 
with a better security model at the hardware level. They are based on old ideas 
[6-8], but have recently received renewed interest; in particular, the CHERI project 
has proposed new ideas and ways of tackling practical challenges like backwards 
compatibility and realistic OS support [9, 10]. Capability machines tag every word 
(in the register file and in memory) to enforce a strict separation between num- 
bers and capabilities (a kind of pointers that carry authority). Memory capabilities 
carry the authority to read and/or write to a range of memory locations. There is 
also a form of object capabilities, which represent the authority to invoke a piece of 
code without exposing the code’s encapsulated private state (e.g., the M-Machine’s 
enter capabilities or CHERI’s sealed code/data pairs). 

Unlike commodity processors, capability machines lend themselves well to 
enforcing local-state encapsulation. Potentially, they will enable compilation 
schemes that enforce this property in an efficient but also 100% watertight way 
(ideally evidenced by a mathematical proof, guaranteeing that we do not end up 
in a new attack-defense arms race). However, a lot needs to happen before we get 
there. For example, it is far from trivial to devise a compilation scheme adapted 
to the details of a specific source language’s notion of encapsulation (e.g., private 
member variables in OO languages often behave quite differently than private 
state in ML-like languages). And even if such a scheme were defined, a formal 
proof depends on a formalization of the encapsulation provided by the capability 
machine at hand. 

A similar problem is the enforcement of control-flow correctness on capability 
machines. An interesting approach is taken in CheriBSD [9]: the standard con- 
tiguous C stack is split into a central, trusted stack, managed by trusted call and 
return instructions, and disjoint, private, per-compartment stacks. To prevent 
illegal use of stack references, the approach relies on local capabilities, a type of 
capabilities offered by CHERI to temporarily relinquish authority, namely for 
the duration of a function invocation whereafter the capability can be revoked. 
However, details are scarce (how does it work precisely? what features are sup- 
ported?) and a lot remains to be investigated (e.g., combining disjoint stacks with 
cross-domain function pointers seems like it will scale poorly to large numbers 
of components?). Finally, there is no argument that the approach is watertight 
and it is not even clear what security property is targeted exactly. 

In this paper, we make two main contributions: (1) an alternative calling 
convention that uses local capabilities to enforce stack frame encapsulation and 
well-bracketed control flow, and (2) perhaps more importantly, we adapt and 
apply the well-studied techniques of step-indexed Kripke logical relations for 
reasoning about code on a representative capability machine with local capabili- 
ties in general and correctness and security of the calling convention in particular. 
More specifically, we make the following contributions: 


— We formalize a simple but representative capability machine featuring local 
capabilities and its operational semantics (Sect. 2). 

— We define a novel calling convention enforcing control-flow correctness and 
encapsulation of stack frames (Sect. 3). It relies solely on local capabilities and 
does not require OS support (like a trusted stack or call/return instructions). 
It supports higher-order cross-component calls (e.g., cross-component function 
pointers) and can be efficient assuming only one additional piece of processor 
support: an efficient instruction for clearing a range of memory. 

— We present a novel step-indexed Kripke logical relation for reasoning about 
programs on the capability machine. It is an untyped logical relation, inspired 
by previous work on object capabilities [11]. We prove an analogue of the 
standard fundamental theorem of logical relations. To the best of our knowledge, 
our theorem is the most general and powerful formulation of the formal 
guarantees offered by a capability machine (a form of capability safety [11,12]), 
including the specific guarantees offered for local capabilities. It is very general 
and not tied to our calling convention or a specific way of using the system’s 
capabilities. We are the first to apply these techniques for reasoning about 
capability machines and we believe they will prove useful for many other pur- 
poses than our calling convention. 

— We introduce two novel technical ideas in the unary, step-indexed Kripke log- 
ical relation used to formulate the above theorem: the use of a single orthogo- 
nal closure (rather than the earlier used biorthogonal closure) and a variant of 
Dreyer et al.’s public and private future worlds [13] to express the special 
nature of local capabilities. The logical relation and the fundamental theorem 
expressing capability safety are presented in Sect. 4. 

— We demonstrate our results by applying them to challenging examples, specif- 
ically constructed to demonstrate local-state encapsulation and control-flow 
correctness guarantees in the presence of cross-component function pointers 
(Sect. 5). The examples demonstrate both the power of our formulation of 
capability safety and our calling convention. 


For reasons of space, some details and all proofs have been omitted; please 
refer to the technical appendix [14] for those. 


2 A Capability Machine with Local Capabilities 


In this paper, we work with a formal capability machine with all the char- 
acteristics of real capability machines, as well as local capabilities much like 
CHERI’s. Otherwise, it is kept as simple as possible. It is inspired by both the 
M-Machine [6] and CHERI [9]. To avoid uninteresting details, we assume an 
infinite address space and unbounded integers. 


We define the syntax of our capability machine in Fig. 1. We assume an infinite 
set of addresses Addr and define machine words as either integers or capabilities 
of the form ((perm, g), base, end, a). Such a capability represents the authority 
to execute permissions perm on the memory range [base, end], together with a 
current address a and a locality tag g indicating whether the capability is global 
or local. There is no notion of pointers other than capabilities, so we will use the 
terms interchangeably. The available permissions and their ordering are depicted 
in Fig. 3: the permissions include null permission (O), readonly (RO), read/write 
(RW), read/execute (RX) and read/write/execute (RWX) permissions. Additionally, 
there are three special permissions: read/write-local (RWL), 
read/write-local/execute (RWLX) and enter (E), which we will explain below. 

[Figure: ordering diagram over the permissions RWLX, RWL, RWX, RW, RX, RO, E and O] 
Fig. 3. Permission hierarchy 


478 L. Skorstengaard et al. 


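\[
\begin{array}{rcl}
a \in \mathrm{Addr} & \stackrel{\mathrm{def}}{=} & \mathbb{N} \\
w \in \mathrm{Word} & \stackrel{\mathrm{def}}{=} & \mathbb{Z} + \mathrm{Cap} \\
\mathit{perm} \in \mathrm{Perm} & ::= & \mathrm{O} \mid \mathrm{RO} \mid \mathrm{RW} \mid \mathrm{RWL} \mid \mathrm{RX} \mid \mathrm{E} \mid \mathrm{RWX} \mid \mathrm{RWLX} \\
g \in \mathrm{Global} & ::= & \mathrm{global} \mid \mathrm{local} \\
r \in \mathrm{RegName} & ::= & \mathrm{pc} \mid r_0 \mid r_1 \mid \dots \\
\mathit{reg} \in \mathrm{Reg} & \stackrel{\mathrm{def}}{=} & \mathrm{RegName} \rightarrow \mathrm{Word} \\
m \in \mathrm{Mem} & \stackrel{\mathrm{def}}{=} & \mathrm{Addr} \rightarrow \mathrm{Word} \\
\Phi \in \mathrm{ExecConf} & \stackrel{\mathrm{def}}{=} & \mathrm{Reg} \times \mathrm{Mem} \\
\mathit{ms} \in \mathrm{MemSeg} & \stackrel{\mathrm{def}}{=} & \mathrm{Addr} \rightharpoonup \mathrm{Word} \\
\mathrm{Conf} & \stackrel{\mathrm{def}}{=} & \mathrm{ExecConf} + \{\mathit{failed}\} + \{\mathit{halted}\} \times \mathrm{Mem} \\
\mathrm{Cap} & \stackrel{\mathrm{def}}{=} & \{((\mathit{perm}, g), b, e, a) \mid b, a \in \mathrm{Addr},\ e \in \mathrm{Addr} \cup \{\infty\}\} \\
r & \in & \mathbb{Z} + \mathrm{RegName} \\
i & ::= & \mathtt{jmp}\ r \mid \mathtt{jnz}\ r\ r \mid \mathtt{move}\ r\ r \mid \mathtt{load}\ r\ r \mid \mathtt{store}\ r\ r \mid \mathtt{plus}\ r\ r\ r \mid \mathtt{minus}\ r\ r\ r \mid {} \\
& & \mathtt{lt}\ r\ r\ r \mid \mathtt{lea}\ r\ r \mid \mathtt{restrict}\ r\ r \mid \mathtt{subseg}\ r\ r\ r \mid \mathtt{isptr}\ r\ r \mid \mathtt{getl}\ r\ r \mid {} \\
& & \mathtt{getp}\ r\ r \mid \mathtt{getb}\ r\ r \mid \mathtt{gete}\ r\ r \mid \mathtt{geta}\ r\ r \mid \mathtt{fail} \mid \mathtt{halt}
\end{array}
\]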


Fig. 1. The syntax of our capability machine assembly language. 
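\[
\Phi \rightarrow
\begin{cases}
[\![\mathit{decode}(n)]\!](\Phi) & \text{if } \Phi.\mathrm{reg}(\mathrm{pc}) = ((\mathit{perm}, g), b, e, a) \text{ and } b \le a \le e \\
 & \text{and } \mathit{perm} \in \{\mathrm{RX}, \mathrm{RWX}, \mathrm{RWLX}\} \text{ and } \Phi.\mathrm{mem}(a) = n \\
\mathit{failed} & \text{otherwise}
\end{cases}
\]

\[
\mathit{updPc}(\Phi) =
\begin{cases}
\Phi[\mathrm{reg}.\mathrm{pc} \mapsto \mathit{newPc}] & \text{if } \Phi.\mathrm{reg}(\mathrm{pc}) = ((\mathit{perm}, g), b, e, a) \\
 & \text{and } \mathit{newPc} = ((\mathit{perm}, g), b, e, a + 1) \\
\mathit{failed} & \text{otherwise}
\end{cases}
\]

\[
\begin{array}{l|l|l}
i & [\![i]\!](\Phi) & \text{Conditions} \\
\hline
\mathtt{fail} & \mathit{failed} & \\
\mathtt{halt} & (\mathit{halted}, \Phi.\mathrm{mem}) & \\
\mathtt{move}\ r_1\ r_2 & \mathit{updPc}(\Phi[\mathrm{reg}.r_1 \mapsto w]) & r_2 \in \mathrm{RegName} \Rightarrow w = \Phi.\mathrm{reg}(r_2) \text{ and } r_2 \in \mathbb{Z} \Rightarrow w = r_2 \\
\mathtt{load}\ r_1\ r_2 & \mathit{updPc}(\Phi[\mathrm{reg}.r_1 \mapsto w]) & \Phi.\mathrm{reg}(r_2) = ((\mathit{perm}, g), b, e, a) \text{ and } w = \Phi.\mathrm{mem}(a) \\
 & & \text{and } b \le a \le e \text{ and } \mathit{perm} \in \{\mathrm{RWX}, \mathrm{RWLX}, \mathrm{RX}, \mathrm{RW}, \mathrm{RWL}, \mathrm{RO}\} \\
\mathtt{restrict}\ r_1\ r_2 & \mathit{updPc}(\Phi[\mathrm{reg}.r_1 \mapsto w]) & \Phi.\mathrm{reg}(r_2) = ((\mathit{perm}, g), b, e, a) \text{ and } \\
 & & (\mathit{perm}', g') = \mathit{decodePermPair}(\Phi.\mathrm{reg}(r_2)) \text{ and } \\
 & & (\mathit{perm}', g') \sqsubseteq (\mathit{perm}, g) \text{ and } w = ((\mathit{perm}', g'), b, e, a) \\
\mathtt{geta}\ r_1\ r_2 & \mathit{updPc}(\Phi[\mathrm{reg}.r_1 \mapsto a]) & \Phi.\mathrm{reg}(r_2) = ((\_, \_), \_, \_, a) \\
\mathtt{jmp}\ r & \Phi[\mathrm{reg}.\mathrm{pc} \mapsto \mathit{newPc}] & \text{if } \Phi.\mathrm{reg}(r) = ((\mathrm{E}, g), b, e, a) \text{, then } \mathit{newPc} = ((\mathrm{RX}, g), b, e, a) \\
 & & \text{otherwise } \mathit{newPc} = \Phi.\mathrm{reg}(r) \\
\mathtt{store}\ r_1\ r_2 & \mathit{updPc}(\Phi[\mathrm{mem}.a \mapsto w]) & \Phi.\mathrm{reg}(r_1) = ((\mathit{perm}, g), b, e, a) \text{ and } \\
 & & \mathit{perm} \in \{\mathrm{RWX}, \mathrm{RWLX}, \mathrm{RW}, \mathrm{RWL}\} \text{ and } b \le a \le e \text{ and } \\
 & & w = \Phi.\mathrm{reg}(r_2) \text{ and if } w = ((\_, \mathrm{local}), \_, \_, \_) \text{, then } \mathit{perm} \in \{\mathrm{RWLX}, \mathrm{RWL}\} \\
\_ & \mathit{failed} & \text{otherwise}
\end{array}
\]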


Fig. 2. An excerpt from the operational semantics. 


We assume a finite set of register names RegName. We define register files 
reg and memories ms as functions mapping register names resp. addresses to 
words. The state of the entire machine is represented as a configuration that is 
either a running state ® € ExecConf containing a memory and a register file, or 
a failed or halted state, where the latter keeps hold of the final state of memory. 
The machine’s instruction set is rather basic. Instructions i include relatively 
standard jump (jmp), conditional jump (jnz) and move (move, copies words 
between registers) instructions. Also familiar are load and store instructions for 
reading from and writing to memory (load and store) and arithmetic 
operators (lt (less than), plus and minus, operating only on numbers). There are 
three instructions for modifying capabilities: lea (modifies the current address), 
restrict (modifies the permission and local/global tag) and subseg (modifies 
the range of a capability). Importantly, these instructions take care that the 
resulting capability always carries less authority than the original (e.g. restrict 
will only weaken a permission). Finally, the instruction isptr tests whether a 
word is a capability or a number and instructions getp, getl, getb, gete and 
geta provide access to a capability’s permissions, local/global tag, base, end and 
current address, respectively. 

Figure 2 shows an excerpt of the operational semantics for a few representa- 
tive instructions. Essentially, a configuration ® either decodes and executes the 
instruction at ®.reg(pc) if it is executable and its address is in the valid range 
or otherwise fails. The table in the figure shows for instructions i the result of 
executing them in configuration ®. fail and halt obviously fail and halt respec- 
tively. move simply modifies the register file as requested and updates the pc to 
the next instruction using the meta-function updPc. 
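
The fetch condition and the updPc meta-function of Fig. 2 can be sketched as follows. The encoding is an assumption made for illustration: capabilities are a named tuple Cap, register files and memories are Python dictionaries, the failed configuration is the string "failed", and the function names fetch and upd_pc are hypothetical.

```python
from typing import Dict, NamedTuple, Union


class Cap(NamedTuple):
    perm: str       # one of O, RO, RW, RWL, RX, E, RWX, RWLX
    g: str          # "global" or "local"
    b: int          # base of the range
    e: float        # end of the range (an int, or math.inf for an unbounded range)
    a: int          # current address


Word = Union[int, Cap]
FAILED = "failed"
EXECUTABLE = {"RX", "RWX", "RWLX"}


def fetch(reg: Dict[str, Word], mem: Dict[int, Word]):
    """Return the word at pc if pc is executable and in range, else FAILED."""
    pc = reg.get("pc")
    if isinstance(pc, Cap) and pc.b <= pc.a <= pc.e and pc.perm in EXECUTABLE:
        return mem[pc.a]
    return FAILED


def upd_pc(reg: Dict[str, Word]):
    """Advance pc by one address; fails if pc does not hold a capability."""
    pc = reg.get("pc")
    if not isinstance(pc, Cap):
        return FAILED
    new_reg = dict(reg)             # configurations are treated as immutable values
    new_reg["pc"] = pc._replace(a=pc.a + 1)
    return new_reg
```

Note that upd_pc does not itself check the range: a program counter that has moved past the end of its range is only rejected by the next fetch.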

The load instruction loads the contents of the requested memory location 
into a register, but only if the capability has appropriate authority (i.e. read 
permission and an appropriate range). restrict updates a capability’s permis- 
sions and global/local tag in the register file, but only if the new permissions are 
weaker than the original. It also never turns local capabilities into global ones. 
geta queries the current address of a capability and stores it in a register. 

The jmp instruction updates the program counter to a requested location, 
but it is complicated by the presence of enter capabilities, modeled after the 
M-Machine’s [6]. Enter capabilities cannot be used to read, write or execute and 
their address and range cannot be modified. They can only be used to jump 
to, but when that happens, their permission changes to RX. They can be used 
to represent a kind of closures: an opaque package containing a piece of code 
together with local encapsulated state. Such a package can be built as an enter 
capability c = ((E, g), b, e, a) where the range [b, a - 1] contains local state 
(data or capabilities) and [a, e] contains instructions. The package is opaque 
to an adversary holding c but when c is jumped to, the instructions can start 
executing and have access to the local data through the updated version of c 
that is then in pc. 
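
The treatment of enter capabilities by jmp can be sketched as follows. The Cap tuple, the dictionary encoding of the register file, and the function name jmp are assumptions made for illustration; the rule itself follows the jmp row of Fig. 2.

```python
from typing import Dict, NamedTuple, Union


class Cap(NamedTuple):
    perm: str
    g: str
    b: int
    e: float
    a: int


Word = Union[int, Cap]


def jmp(reg: Dict[str, Word], r: str) -> Dict[str, Word]:
    """Set pc to the word in register r; an enter capability becomes RX."""
    target = reg[r]
    if isinstance(target, Cap) and target.perm == "E":
        target = target._replace(perm="RX")     # unseal: E turns into RX on jump
    new_reg = dict(reg)
    new_reg["pc"] = target                      # note: pc is not advanced by updPc
    return new_reg


# A closure-like package: local state in [100, 109], code in [110, 120].
closure = Cap("E", "global", 100, 120, 110)
before = {"pc": Cap("RX", "global", 0, 10, 3), "r1": closure}
after = jmp(before, "r1")
```

After the jump, pc holds an RX capability for the whole range [100, 120], so the invoked code can read its private state in [100, 109], while the copy of c that remains in r1 is still an opaque enter capability.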

Finally, the store instruction updates the memory to the requested value 
if the capability has write authority for the requested location. However, the 
instruction is complicated by the presence of local capabilities, modeled after 
the ones in the CHERI processor [9]. Basically, local capabilities are special in 
that they can only be kept in registers, i.e. they cannot be stored to memory. 
This means that local capabilities can be temporarily given to an adversary, for 
the duration of an invocation: if we take care to clear the capability from the 
register file after control is passed back to us, they will not have been able to 
store the capability. However, there is one exception to the rule above: local 
capabilities can be stored to memory for which we have a capability with write- 
local authority (i.e. permission RWL or RWLX). This is intended to accommodate 
a stack, where register contents can be stored, including local capabilities. As 
long as all capabilities with write-local authority are themselves local and the 
stack is cleared after control is passed back by the adversary, we will see that 
this does not break the intended behavior of local capabilities. 
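
The conditions under which a store succeeds, including the write-local exception, can be summarized by a small predicate. The Cap tuple and the name store_allowed are assumptions made for illustration; the conditions are those of the store row of Fig. 2.

```python
from typing import NamedTuple, Union


class Cap(NamedTuple):
    perm: str
    g: str
    b: int
    e: float
    a: int


Word = Union[int, Cap]
WRITE = {"RWX", "RWLX", "RW", "RWL"}
WRITE_LOCAL = {"RWLX", "RWL"}


def store_allowed(target: Word, w: Word) -> bool:
    """May word w be stored through capability target at its current address?"""
    if not isinstance(target, Cap):
        return False                            # numbers carry no authority
    if target.perm not in WRITE:
        return False                            # no write permission
    if not (target.b <= target.a <= target.e):
        return False                            # current address out of range
    if isinstance(w, Cap) and w.g == "local":
        return target.perm in WRITE_LOCAL       # local words need write-local
    return True


heap_cap = Cap("RWX", "global", 1000, 1099, 1000)
stack_cap = Cap("RWLX", "local", 0, 99, 10)
return_ptr = Cap("E", "local", 0, 99, 20)
```

Here a local return pointer can be spilled through the stack capability but not through the heap capability, which is the property the calling convention relies on.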

We point out that our local capabilities capture only a part of the semantics 
of local capabilities in CHERI. Specifically, in addition to the above, CHERI’s 
default implementation of the CCall exception handler forbids local capabilities 
from being passed across module boundaries. Such a restriction fundamentally 
breaks our calling convention, since we pass around local return pointers and 
stack capabilities. However, CHERI’s CCall is not implemented in hardware, 
but in software, precisely to allow experimenting with alternative models like 
ours. 

In order to have a reasonably realistic system, we use a simple model of 
linking where a program has access to a linking table that contains capabilities 
for other programs. We also assume malloc to be part of the trusted computing 
base satisfying a certain specification. Malloc and linking tables are described 
further in the next section, but we refer to the technical appendix [14] for full 
details. 


3 Stack and Return Pointer Management Using Local 
Capabilities 


One of the contributions in this paper is a demonstration that local capabilities 
on a capability machine support a calling convention that enforces control-flow 
correctness in a way that is provably watertight, potentially efficient, does not 
rely on a trusted central stack manager and supports higher-order interfaces to an 
adversary, where an adversary is just some unknown piece of code. In this section, 
we explain this convention’s high-level approach, the security measures to be 
taken in a number of situations (motivating each separately with a summary 
table at the end). After that, we define a number of reusable macro-instructions 
that can be used to conveniently apply the proposed convention in subsequent 
examples. 

The basic idea of our approach is simple: we stick to a single, rather stan- 
dard, C stack and register-passed stack and return pointers, much like a standard 
C calling convention. However, to prevent various ways of misusing this basic 
scheme, we put local capabilities to work and take a number of not-always- 
obvious safety measures. The safety measures are presented in terms of what we 
need to do to protect ourselves against an adversary, but this is only for presen- 
tation purposes as our code assumes no special status on the machine. In fact, 
an adversary can apply the same safety measures to protect themselves against 
us. In the next paragraphs, we will explain the issues to be considered in all the 


Reasoning About a Machine with Local Capabilities 481 


relevant situations: when (1) starting our program, (2) returning to the adver- 
sary, (3) invoking the adversary, (4) returning from the adversary, (5) invoking 
an adversary callback and (6) having a callback invoked by the adversary. 


Program Start-Up. We assume that the language runtime initializes the mem- 
ory as follows: a contiguous array of memory is reserved for the stack, for which 
we receive a stack pointer in a special register rstk. We stress that the stack 
is not built-in, but merely an abstraction we put on this piece of the memory. 
The stack pointer is local and has RWLX permission. Note that this means that 
we will be placing and executing instructions on the stack. Crucially, the stack 
is the only part of memory for which the runtime (including malloc, loading, 
linking) will ever provide RWLX or RWL capabilities. Additionally, our examples 
typically also assume some memory to store instructions or static data. Another 
part of memory (called the heap) is initially governed by malloc and at program 
start-up, no other code has capabilities for this memory. Malloc hands out RWX 
capabilities for allocated regions as requested (no RWLX or RWL permissions). For 
simplicity, we assume that memory allocated through malloc cannot be freed. 


Returning to the Adversary. Perhaps the simplest situation is returning to 
the adversary after they invoked our code. In this case, we have received a return 
pointer from them, and we just need to jump to it as usual. An obvious security 
measure to take care of is properly clearing the non-return-value registers before 
we jump (since they may contain data or capabilities that the adversary should 
not get access to). Additionally, we may have used the stack for various purposes 
(register spilling, storing local state when invoking other functions etc.), so we 
also need to clear that data before returning to the adversary. 

However, if we are returning from a function that has itself invoked adversary 
code, then clearing the used part of the stack is not enough. The unused part 
of the stack may also contain data and capabilities, left there by the adversary, 
including local capabilities since the stack is write-local. As we will see later, we 
rely on the fact that the adversary cannot keep hold of local capabilities when 
they pass control to the trusted code and receive control back. In this case, the 
adversary could use the unused part of the stack to store local pointers and load 
them from there after they get control back. To prevent this, we need to clear 
(i.e. overwrite with zeros) the entire part of the stack that the adversary has 
had access to, not just the parts that we have used ourselves. Since we may be 
talking about a large part of memory, this requirement is the most problematic 
aspect of our calling convention for performance, but see Sect. 6 for how this 
might be mitigated. 


Invoking the Adversary. A slightly more complex case is invoking the adver- 
sary. As above, we clear all the non-argument registers, as well as the part of 
the stack that we are not using (because, as above, it may contain local capabil- 
ities from previously executed code that the adversary could exploit in the same 
way). We leave a copy of the stack pointer in rstk, but only after we have used 
the subseg instruction to shrink its authority to the part that we are not using 
ourselves. 
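
The stack preparation just described can be sketched as follows. Several points are assumptions made for illustration and are not taken from the text: the stack is assumed to grow towards higher addresses, memory is a Python dictionary, the function name prepare_stack_for_adversary is hypothetical, and the narrowing performed on the machine by subseg (plus an address adjustment) is modelled by replacing tuple fields directly.

```python
from typing import Dict, NamedTuple, Union


class Cap(NamedTuple):
    perm: str
    g: str
    b: int
    e: int
    a: int


Word = Union[int, Cap]


def prepare_stack_for_adversary(stack: Cap, mem: Dict[int, Word],
                                first_unused: int) -> Cap:
    """Clear the unused stack part and return a capability restricted to it."""
    if not (stack.b <= first_unused <= stack.e):
        raise ValueError("first_unused lies outside the stack range")
    for addr in range(first_unused, stack.e + 1):
        mem[addr] = 0                           # wipe leftover data and capabilities
    # Stands in for subseg: authority shrinks to [first_unused, e].
    return stack._replace(b=first_unused, a=first_unused)


stack = Cap("RWLX", "local", 0, 7, 3)
mem = {addr: 100 + addr for addr in range(8)}   # hypothetical stack contents
adversary_stack = prepare_stack_for_adversary(stack, mem, 4)
```

The capability handed to the adversary covers only addresses 4 to 7, all of which have been zeroed, while the caller's own frame at addresses 0 to 3 is left intact and out of the adversary's reach.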
In one of the registers, we also provide a return pointer, which must be a 
local capability. If it were global, the adversary would be able to store away the 
return pointer in a global data structure (i.e. there exists a global capability for 
it), and jump to it later, in circumstances where this should not be possible. 
For example, they could store the return pointer, legally jump to it a first time, 
wait to be invoked again and then jump to the old return pointer a second time, 
instead of the new return pointer received for the second invocation. Similarly, 
they could store the return pointer, invoke a function in our code, wait for us 
to invoke them again and then jump to the old return pointer rather than the 
new one, received for the second invocation. By making the return pointer local, 
we prevent such attacks: the adversary can only store local capabilities through 
write-local capabilities, which means (because of our assumptions above): on the 
stack. Since the stack pointer itself is also local, it can also only be stored on 
the stack. Because we clear the part of the stack that the adversary has had 
access to before we pass control back, there is no way for them to recover either 
of these local capabilities. 
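
The store restriction that this argument relies on can be sketched in Python. The sketch is a simplified assumption about the machine: the class `Cap`, the function `store`, and the concrete addresses are hypothetical, and only the rule "local capabilities can be stored only through write-local capabilities" is taken from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cap:
    perm: str        # e.g. "RW", "RWL", "RWLX", "E"
    is_local: bool   # True = local, False = global
    base: int
    end: int

WRITE_PERMS = {"RW", "RWX", "RWL", "RWLX"}
WRITE_LOCAL_PERMS = {"RWL", "RWLX"}

def store(memory, target, addr, word):
    """Store `word` at `addr` through capability `target`, or fail."""
    if target.perm not in WRITE_PERMS:
        raise PermissionError("no write permission")
    if not (target.base <= addr <= target.end):
        raise PermissionError("address out of range")
    if isinstance(word, Cap) and word.is_local and target.perm not in WRITE_LOCAL_PERMS:
        raise PermissionError("local capability needs write-local permission")
    memory[addr] = word

memory = {}
ret_ptr = Cap("E", True, 100, 103)     # local return pointer
heap = Cap("RW", False, 0, 9)          # global data structure, not write-local
stack = Cap("RWLX", True, 50, 59)      # the stack: local and write-local

store(memory, stack, 50, ret_ptr)      # allowed: the stack is write-local
try:
    store(memory, heap, 0, ret_ptr)    # attempt to stash the pointer globally
    leaked = True
except PermissionError:
    leaked = False
```

The only place the return pointer ends up is the stack, which is exactly the part of memory that is cleared before control is passed back.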

Note that storing stack pointers for use during future invocations would also 
be dangerous in itself, i.e. not just because it can be used to store return pointers. 
Imagine the adversary stores their stack pointer, invokes trusted code that uses 
part of the stack to store private data and then invokes the adversary again 
with a stack pointer restricted to exclude the part containing the private data. 
If the adversary had a way of keeping hold of their old stack pointer, they could
access the private data stored there by the trusted code and break local-state 
encapsulation. 


Returning from the Adversary. So return pointers must be passed as local 
capabilities. But what should their permissions be, what memory should they 
point to and what should that memory (the activation record) contain? Let 
us answer the last question first by considering what should happen when the 
adversary jumps to a return pointer. In that case, the program counter should 
be restored to the instruction after the jump to the adversary, so the activation 
record should store this old program counter. Additionally, the stack pointer 
should also be restored to its original value. Since the adversary has a more 
restricted authority over the stack than the code making the call, we cannot 
hope to reconstruct the original stack pointer from the stack pointer owned by 
the adversary. Instead, it should be stored as part of the activation record. 
Clearly, neither of these capabilities should be accessible by the adversary. 
In other words, the return pointer provided to the adversary must be a capabil- 
ity that they can jump to but not read from, i.e. an enter capability. To make 
this work, we construct the activation record as depicted in Fig. 4. The E return 
pointer has authority over the entire activation record (containing the previous 
return and stack pointer), and its current address points to a number of restore 
instructions in the record, so that upon invocation, these instructions are exe- 
cuted and can load the old stack pointer and program counter back into the 
register file. As the return pointer is an enter pointer, the adversary cannot get
hold of the activation record’s contents, but after invocation, its permission is
updated to RX, so the contents become available to the restore instructions.

    E return pointer --->  restore instructions
                           previous program counter
                           previous stack pointer

Fig. 4. Structure of an activation record

The final question that remains is: where should we store this activation 
record? The attentive reader may already see that there is only one possibility: 
since the activation record contains the old stack pointer, which is local, the 
activation record can only be constructed in a part of memory where we have 
write-local access, i.e. on the stack. Note that this means we will be placing and 
executing instructions on the stack, i.e. it will not just contain code pointers 
and data. This means that our calling convention should be combined with 
protection against stack smashing attacks (i.e. buffer overflows on the stack 
overwriting activation records’ contents). Luckily, the capability machine’s fine- 
grained memory protection should make it reasonably easy for a compiler to 
implement such protection, by making sure that only appropriately bounded 
versions of the stack pointer are made available to source language code. 
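
The behaviour of the enter capability that guards the activation record can be sketched as follows. This Python model is an illustration under simplifying assumptions: `Cap`, `load`, `jump`, the number of restore instructions, and the addresses are all hypothetical; only the layout of Fig. 4 and the change from E to RX on a jump come from the text.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Cap:
    perm: str   # one of "O", "RO", "RW", "RWL", "RX", "E", "RWX", "RWLX"
    base: int
    end: int
    addr: int   # current address the capability points to

READ_PERMS = {"RO", "RW", "RWL", "RX", "RWX", "RWLX"}
EXEC_PERMS = {"RX", "RWX", "RWLX"}

def load(memory, cap, addr):
    """Read memory[addr] through cap; fail without read permission."""
    if cap.perm not in READ_PERMS or not (cap.base <= addr <= cap.end):
        raise PermissionError("load not permitted")
    return memory[addr]

def jump(cap):
    """Return the capability that becomes the program counter after a jump."""
    if cap.perm == "E":
        return replace(cap, perm="RX")   # enter capabilities become RX when invoked
    if cap.perm in EXEC_PERMS:
        return cap
    raise PermissionError("not executable")

# Activation record laid out as in Fig. 4 (addresses are made up).
memory = {
    200: "restore instruction 1",
    201: "restore instruction 2",
    202: "previous program counter",
    203: "previous stack pointer",
}
return_ptr = Cap("E", 200, 203, 200)

try:
    load(memory, return_ptr, 203)        # the adversary tries to read the record
    adversary_read_ok = True
except PermissionError:
    adversary_read_ok = False

pc = jump(return_ptr)                    # the adversary returns
restored_stack_ptr = load(memory, pc, 203)
```

Only after the jump, when the restore instructions are running, do the record's contents become readable.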


Invoking an Adversary Callback. If we have a higher-order interface to the 
adversary, we may need to invoke an adversary callback. In this case, not so 
much changes with respect to the situation where we invoke static adversary 
code. The adversary can provide a callback as a capability for us to jump to, 
either an E-capability if they want to protect themselves from us or just an RX 
capability if they are not worried about that. However, there is one scenario that 
we need to prevent: if they construct the callback capability to point into the 
stack, it may contain local capabilities that they should not have access to upon 
invocation of the callback. As before, this includes return and stack pointers 
from previous stack frames that they may be trying to illegally use inside the 
callback. 

To prevent this, we only accept callbacks from the adversary in the form 
of global capabilities, which we dynamically check before invoking them (and 
we fail otherwise). This should not be an overly strict requirement: our own 
callbacks do not contain local data themselves, so there should be no need for 
the adversary to construct callbacks on the stack.¹


¹ Note that it does prevent a legitimate but non-essential scenario where the adversary
wants to give us temporary access to a callback not allocated on the stack.


Having a Callback Invoked by the Adversary. The above leaves us with
perhaps the hardest scenario: how to provide a callback to the adversary. The
basic idea is that we allocate a block of memory using malloc that we fill with the
capabilities and data that the callback needs, as well as some prelude instructions
that load the data into registers and jump to the right code. Note that this
implies that no local capabilities can be stored as part of a closure. We can 
then provide the adversary with an enter-capability covering the allocated block 
and pointing to the contained prelude instructions. However, the question that 
remains in this setup is: from where do we get a stack pointer when the callback 
is invoked? 

Our answer is that the adversary should provide it to us, just as we provide 
them with a stack pointer when we invoke their code. However, it is important 
that we do not just accept any capability as a stack pointer but check that 
it is safe to use. Specifically, we check that it is indeed an RWLX capability. 
Without this check, an adversary could potentially get control over our local 
stack frame during a subsequent callback by passing us a local RWX capability 
to a global data structure instead of a proper stack pointer and a global callback 
for our callback to invoke. If our local state contains no local capabilities, then, 
otherwise following our calling convention, the callback would not fail and the 
adversary could use a stored capability for the global data structure to access 
our local state. To prevent this from happening, we need to make sure the stack 
capability carries RWLX authority, since the system-wide assumption then tells
us that the adversary cannot have global capabilities to our local stack. 


Calling Convention. With the security measures introduced and motivated,
let us summarize our proposed calling convention:

At program start-up
- A local RWLX stack pointer resides in register rstk.
- No global write-local capabilities.

Before returning to the adversary
- Clear non-return-value registers.
- Clear the part of the stack we had access to (not just the part we used).

Before invoking the adversary
- Push activation record to the stack.
- Create return pointer as local E-capability to the instructions in the record.
- Restrict the stack capability to the unused part and clear it.
- Clear non-argument registers.

Before invoking an adversary callback
- Make sure callback is global.

When invoked by an adversary
- Make sure received stack pointer has permission RWLX.
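
The two dynamic checks in this summary (callback is global, received stack pointer is RWLX) can be sketched in Python. The functions below borrow the names of the reqglob and prepstk macros for readability, but they are only a hypothetical model of the checks described in the text; the real macros operate on registers and may do more. `Cap` and `MachineFailure` are assumptions of this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cap:
    perm: str
    is_local: bool
    base: int
    end: int

class MachineFailure(Exception):
    """Stands in for the machine entering its failed state."""

def reqglob(word):
    """Accept only global capabilities as adversary callbacks."""
    if not (isinstance(word, Cap) and not word.is_local):
        raise MachineFailure("callback must be a global capability")

def prepstk(word):
    """Accept only capabilities with permission RWLX as stack pointers."""
    if not (isinstance(word, Cap) and word.perm == "RWLX"):
        raise MachineFailure("stack pointer must have permission RWLX")

def fails(check, word):
    try:
        check(word)
        return False
    except MachineFailure:
        return True

global_callback = Cap("E", False, 0, 9)
stack_callback = Cap("RX", True, 50, 59)   # callback constructed on the stack
good_stack = Cap("RWLX", True, 50, 59)
fake_stack = Cap("RWX", True, 0, 9)        # local RWX capability to a global structure
```

Failing is the safe outcome here: a rejected callback or stack pointer stops execution before any local state can be exposed.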


Reusable Macro Instructions. We define a number of reusable macros
capturing the calling convention and other conveniences. All macros that use the
stack assume a stack pointer in register rstk. The macro fetch r name fetches
the capability related to name from the linking table and stores it in register
r. The macros push r and pop r add and remove elements from the stack. The
macro prepstk r is used when a callback is invoked by the adversary and
prepares the received stack pointer by checking that it has permission RWLX. The
macro scall r(r_args, r_priv) jumps to the capability in register r in the manner
described above. That is, it pushes local state (the contents of registers r_priv)
and the activation record (return code, return pointer, stack pointer) to the
stack, creates an E return pointer, restricts the stack pointer, clears the unused
part of the stack, clears the necessary registers and jumps to r. Upon return, the
private state is restored. The macro mclear r clears all the memory the
capability in register r has authority over. The macro rclear regSet clears all the
registers in regSet. The macro reqglob r checks whether the word in register r
is a global capability. The macro crtcls [(x_i, r_i)] r allocates a closure where r
points to the closure’s code and a new environment is allocated (using malloc)
where the contents of r_i are stored. In the code referred to by r, an implicit fetch
happens when an instruction refers to x_i.

The technical appendix [14] contains detailed descriptions of all the macros. 


4 Logical Relation 


In this section, we formalize the guarantees provided by the capability machine, 
including the specific guarantees for local capabilities, by means of a step-indexed 
Kripke logical relation with recursively defined worlds. We use the logical rela- 
tion in the following section to show local-state encapsulation and control-flow 
integrity properties for challenging example programs. 


4.1 Worlds 


A world is a finite map from region names, modeled as natural numbers, to 
regions that each correspond to an invariant of part of the memory. We have 
three types of regions: permanent, temporary, and revoked. Each permanent and 
temporary region contains a state transition system, with public and private 
transitions, to describe how the invariants are allowed to change over time. In 
other words, they are protocols for the region’s memory. These are similar to 
what has been used in logical relations for high-level languages [11,13,15]. Pro- 
tocols imposed by permanent regions stay in place indefinitely. Any capability, 
local or global, can depend on these protocols. Protocols imposed by temporary 
regions can be revoked in private future worlds. Doing this may break the safety 
of local capabilities but not global ones. This means that local capabilities can 
safely depend on the protocols imposed by temporary regions, but global capa- 
bilities cannot, since a global capability may outlive a temporary region that is 
revoked. This is illustrated in Fig. 5. 


[Figure: diagram relating a permanent region, a temporary region, a local
capability, and a global capability.]


Fig. 5. The relation between local/global capabilities and temporary/permanent 
regions. The colored fields are regions governing parts of memory. Global capabilities 
cannot depend on temporary regions.

For technical reasons, we do not actually remove a revoked temporary region
from the world, but we turn it into a special revoked region that exists for this 
purpose. Such a revoked region contains no state transition system and puts no 
requirements on the memory. It simply serves as a mask for a revoked temporary 
region. Masking a region like this goes back to earlier work of Ahmed [16] and 
was also used by Birkedal et al. [17]. 

Regions are used to define safe memory segments, but this set may itself be 
world-dependent. In other words, our worlds are defined recursively. Recursive 
worlds are common in Kripke models and the following theorem uses the method
of Birkedal and Bizjak [18] and Birkedal et al. [19] for constructing them. The
formulation of the theorem is technical, so we recommend that non-expert readers
ignore the technicalities and accept that there exists a set of worlds Wor and
two relations $\sqsupseteq^{priv}$ and $\sqsupseteq^{pub}$ satisfying the (recursive) equations in the theorem
(where the $\blacktriangleright$ operator can be safely ignored).


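Theorem 1. There exists a c.o.f.e. (complete ordered family of equivalences)
Wor and preorders $\sqsupseteq^{priv}$ and $\sqsupseteq^{pub}$ such that $(\mathrm{Wor}, \sqsupseteq^{priv})$ and $(\mathrm{Wor}, \sqsupseteq^{pub})$ are
preordered c.o.f.e.’s, and there exists an isomorphism $\xi$ such that

\[
\begin{aligned}
\xi &: \mathrm{Wor} \cong \blacktriangleright(\mathbb{N} \rightharpoonup_{fin} \mathrm{Region}) \\
\mathrm{Region} &= \{\mathrm{revoked}\} \;\uplus \\
&\quad \{\mathrm{temp}\} \times \mathrm{State} \times \mathrm{Rels} \times (\mathrm{State} \to (\mathrm{Wor} \xrightarrow[\sqsupseteq^{pub}]{mon,\,ne} \mathrm{UPred}(\mathrm{MemSeg}))) \;\uplus \\
&\quad \{\mathrm{perm}\} \times \mathrm{State} \times \mathrm{Rels} \times (\mathrm{State} \to (\mathrm{Wor} \xrightarrow[\sqsupseteq^{priv}]{mon,\,ne} \mathrm{UPred}(\mathrm{MemSeg})))
\end{aligned}
\]

and, for $W, W' \in \mathrm{Wor}$,

\[
\begin{aligned}
W' \sqsupseteq^{priv} W &\iff \xi(W') \sqsupseteq^{priv} \xi(W) \\
W' \sqsupseteq^{pub} W &\iff \xi(W') \sqsupseteq^{pub} \xi(W).
\end{aligned}
\]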


In the above theorem, $\mathrm{State} \times \mathrm{Rels}$ corresponds to the aforementioned state
transition system, where Rels contains pairs of relations corresponding to the public
and private transitions, and State is an unspecified set that we assume to contain
at least the states we use in this paper. The last part of the temporary and
permanent regions is a state interpretation function that determines what memory
segments the region permits in each state of the state transition system. The
different monotonicity requirements in the two interpretation functions reflect
how permanent regions rely only on permanent protocols whereas temporary
regions can rely on both temporary and permanent protocols. UPred(MemSeg)
is the set of step-indexed, downwards closed predicates on memory segments:

\[
\mathrm{UPred}(\mathrm{MemSeg}) = \{ A \subseteq \mathbb{N} \times \mathrm{MemSeg} \mid \forall (n, ms) \in A.\ \forall m < n.\ (m, ms) \in A \}.
\]
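
Downwards closure can be checked mechanically on finite examples. The following Python sketch is only an illustration: real elements of UPred(MemSeg) are in general infinite sets, and the function name `is_downward_closed` and the use of strings as stand-ins for memory segments are assumptions of the sketch.

```python
def is_downward_closed(pred):
    """Check that (n, ms) in pred implies (m, ms) in pred for every m < n.

    `pred` is a finite set of (step index, memory segment) pairs; memory
    segments are represented by any hashable value.
    """
    return all((m, ms) in pred for (n, ms) in pred for m in range(n))

# A segment accepted at index 2 must also be accepted at indices 1 and 0.
closed = {(0, "ms_a"), (1, "ms_a"), (2, "ms_a"), (0, "ms_b")}
not_closed = {(2, "ms_a")}
```

Intuitively, a smaller step index means fewer remaining execution steps, so anything that is acceptable for n steps is also acceptable for fewer.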

With the recursive domain equation solved, we could take Wor as our notion 
of worlds, but it is technically more convenient to work with the following defi- 
nition instead: 


\[ \mathrm{World} = \mathbb{N} \rightharpoonup_{fin} \mathrm{Region} \]

Future Worlds. The future world relations model how memory may evolve over
time. The public future world relation $W' \sqsupseteq^{pub} W$ requires that $dom(W') \supseteq dom(W)$ and
$\forall r \in dom(W).\ W'(r) \sqsupseteq^{pub} W(r)$. That is, in a public future world, new regions
may have been allocated, and existing regions may have evolved according to the
public future region relation (defined below). The private future world relation
$W' \sqsupseteq^{priv} W$ is defined similarly, using a private future region relation. The public
future region relation is the simplest. It satisfies the following properties:


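\[
\frac{(s, s') \in \phi_{pub}}
     {(v, s', \phi_{pub}, \phi, H) \sqsupseteq^{pub} (v, s, \phi_{pub}, \phi, H)}
\qquad
\frac{(\mathrm{temp}, s, \phi_{pub}, \phi, H) \in \mathrm{Region}}
     {(\mathrm{temp}, s, \phi_{pub}, \phi, H) \sqsupseteq^{pub} \mathrm{revoked}}
\qquad
\frac{}{\mathrm{revoked} \sqsupseteq^{pub} \mathrm{revoked}}
\]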


Both temporary and permanent regions are only allowed to transition according 
to the public part of their transition system. Additionally, revoked regions must 
either remain revoked or be replaced by a temporary region. This means that 
the public future world relation allows us to reinstate a region that has been
revoked earlier. The private future region relation satisfies: 


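\[
\frac{(s, s') \in \phi}
     {(v, s', \phi_{pub}, \phi, H) \sqsupseteq^{priv} (v, s, \phi_{pub}, \phi, H)}
\qquad
\frac{r \in \mathrm{Region}}
     {r \sqsupseteq^{priv} (\mathrm{temp}, s, \phi_{pub}, \phi, H)}
\qquad
\frac{r \in \mathrm{Region}}
     {r \sqsupseteq^{priv} \mathrm{revoked}}
\]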


Here, revocation of temporary regions is allowed. In fact, temporary regions 
can be replaced by an arbitrary other region, not just the special revoked. Con- 
versely, revoked regions may also be replaced by any other region. On the other 
hand, permanent regions cannot be masked away. They are only allowed to tran- 
sition according to the private part of the transition system. 

Notice that the public future region relation is a subset of the private future 
region relation. 
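
The two future region relations can be sketched as Python predicates. This is an illustrative model under stated assumptions: transition systems are finite sets of state pairs, the public transitions are assumed to be included in the private ones, the state interpretation function H is left out, and the names `Region`, `pub_future`, and `priv_future` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import FrozenSet, Tuple

REVOKED = "revoked"

@dataclass(frozen=True)
class Region:
    view: str                          # "temp" or "perm"
    state: str                         # current state s
    pub: FrozenSet[Tuple[str, str]]    # public transitions
    priv: FrozenSet[Tuple[str, str]]   # private transitions (assumed to include pub)
    # The state interpretation function H is omitted in this sketch.

def same_protocol(r2, r1):
    return (r2.view, r2.pub, r2.priv) == (r1.view, r1.pub, r1.priv)

def pub_future(r2, r1):
    """True if r2 is a public future region of r1."""
    if r1 == REVOKED:
        return r2 == REVOKED or (isinstance(r2, Region) and r2.view == "temp")
    return (isinstance(r2, Region) and same_protocol(r2, r1)
            and (r1.state, r2.state) in r1.pub)

def priv_future(r2, r1):
    """True if r2 is a private future region of r1."""
    if r1 == REVOKED or r1.view == "temp":
        return True                    # revoked and temporary regions may become anything
    return (isinstance(r2, Region) and same_protocol(r2, r1)
            and (r1.state, r2.state) in r1.priv)

PUB = frozenset({("a", "a"), ("b", "b"), ("a", "b")})
PRIV = PUB | {("b", "a")}
t_a, t_b = Region("temp", "a", PUB, PRIV), Region("temp", "b", PUB, PRIV)
p_a, p_b = Region("perm", "a", PUB, PRIV), Region("perm", "b", PUB, PRIV)
regions = [REVOKED, t_a, t_b, p_a, p_b]

# On this small universe, every public future region is also a private one.
pub_implies_priv = all(priv_future(r2, r1)
                       for r2, r1 in product(regions, repeat=2)
                       if pub_future(r2, r1))
```

In this model, revoking a temporary region is a private but not a public step, while a permanent region can never be replaced by the revoked region.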


World Satisfaction. A memory satisfies a world, written $ms :_n W$, if it can
be partitioned into disjoint parts such that each part is accepted by an active 
(permanent or temporary) region. Revoked regions are not taken into account 
as their memory protocols are no longer in effect. 
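
Ignoring step indices and the recursive dependence of regions on worlds, the partitioning idea can be sketched in Python. The sketch makes several simplifying assumptions: a world maps region names either to `None` (revoked) or to a plain predicate on memory segments, memory segments are dictionaries, and the brute-force search in the hypothetical function `satisfies` is only meant for tiny examples.

```python
from itertools import product

REVOKED = None   # revoked regions impose no requirement

def satisfies(ms, world):
    """True if `ms` can be split into disjoint parts, one per active region,
    such that each active region's predicate accepts its part."""
    active = [r for r, invariant in world.items() if invariant is not REVOKED]
    addrs = sorted(ms)
    # Try every way of assigning each address to one active region.
    for owners in product(active, repeat=len(addrs)):
        parts = {r: {} for r in active}
        for addr, r in zip(addrs, owners):
            parts[r][addr] = ms[addr]
        if all(world[r](parts[r]) for r in active):
            return True
    return False

world = {
    0: lambda part: part == {10: 0},                      # "the flag at address 10 is 0"
    1: REVOKED,                                           # ignored
    2: lambda part: set(part) <= {20, 21}
                    and all(isinstance(v, int) for v in part.values()),
}

ok_memory = {10: 0, 20: 5, 21: 7}
bad_memory = {10: 1, 20: 5}      # the flag has been set, violating region 0
```

Each address is owned by exactly one active region, which is what makes the parts disjoint.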


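\[
ms :_n W \iff
\begin{array}[t]{l}
\exists P : active(W) \to \mathrm{MemSeg}.\ ms = \biguplus_{r \in active(W)} P(r) \text{ and} \\
\forall r \in active(W).\ \exists H, s.\ W(r) = (\_, s, \_, \_, H) \text{ and } (n, P(r)) \in H(s)(\xi^{-1}(W))
\end{array}
\]

\[
\begin{aligned}
\mathcal{O} &: \mathrm{World} \xrightarrow{ne} \mathrm{UPred}(\mathrm{Reg} \times \mathrm{MemSeg}) \\
\mathcal{O}(W) &= \left\{ (n, (reg, ms)) \;\middle|\;
  \begin{array}{l}
  \forall ms_f, mem', i \le n.\ (reg, ms \uplus ms_f) \to_i (halted, mem') \Rightarrow \\
  \quad \exists W' \sqsupseteq^{priv} W, ms_r, ms'.\ mem' = ms' \uplus ms_r \uplus ms_f \text{ and } ms' :_{n-i} W'
  \end{array} \right\} \\[1ex]
\mathcal{R} &: \mathrm{World} \xrightarrow[\sqsupseteq^{pub}]{mon,\,ne} \mathrm{UPred}(\mathrm{Reg}) \\
\mathcal{R}(W) &= \{ (n, reg) \mid \forall r \in \mathrm{RegName} \setminus \{\mathrm{pc}\}.\ (n, reg(r)) \in \mathcal{V}(W) \} \\[1ex]
\mathcal{E} &: \mathrm{World} \xrightarrow{ne} \mathrm{UPred}(\mathrm{Word}) \\
\mathcal{E}(W) &= \left\{ (n, \mathit{pc}) \;\middle|\;
  \begin{array}{l}
  \forall n' \le n,\ (n', reg) \in \mathcal{R}(W),\ ms :_{n'} W. \\
  \quad (n', (reg[\mathrm{pc} \mapsto \mathit{pc}], ms)) \in \mathcal{O}(W)
  \end{array} \right\} \\[1ex]
\mathcal{V} &: \mathrm{World} \xrightarrow[\sqsupseteq^{pub}]{mon,\,ne} \mathrm{UPred}(\mathrm{Word}) \\
\mathcal{V}(W) &= \{ (n, i) \mid i \in \mathbb{Z} \} \;\cup\; \{ (n, ((O, g), b, e, a)) \} \;\cup \\
&\quad \left\{ (n, ((RW, g), b, e, a)) \;\middle|\;
  \begin{array}{l}
  (n, (b, e)) \in readCond(g)(W) \text{ and} \\
  (n, (b, e)) \in writeCond(\iota^{nwl}, g)(W)
  \end{array} \right\} \;\cup \\
&\quad \{ (n, ((E, g), b, e, a)) \mid (n, (b, e, a)) \in enterCond(g)(W) \} \;\cup \\
&\quad \left\{ (n, ((RWLX, g), b, e, a)) \;\middle|\;
  \begin{array}{l}
  (n, (b, e)) \in readCond(g)(W) \text{ and} \\
  (n, (b, e)) \in writeCond(\iota^{pwl}, g)(W) \text{ and} \\
  (n, (\{RWLX, RWX, RX\}, b, e)) \in execCond(g)(W)
  \end{array} \right\} \;\cup\; \ldots
\end{aligned}
\]

... and so on for permissions RO, RWL, RX, and RWX.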


Fig. 6. The logical relation. 


4.2 Logical Relation 


The logical relation defines semantically when values, program counters, and 
configurations are capability safe. The definition is found in Figs. 6 and 7 and 
we provide some explanations in the following paragraphs. For space reasons, 
we omit some definitions and explain them only verbally, but precise definitions 
can be found in the technical appendix [14]. 

First, the observation relation O defines what configurations we consider 
safe. A configuration is safe with respect to a world, when the execution of 
said configuration does not break the memory protocols of the world. Roughly 
speaking, this means that when the execution of a configuration halts, then there 
is a private future world that the resulting memory satisfies. Notice that failing 
is considered safe behavior. In fact, the machine often resorts to failing when 
an unauthorized access is attempted, such as loading from a capability without 
read permission. This is similar to Devriese et al. [11]’s logical relation for an 
untyped language, but unlike typical logical relations for typed languages, which 
require that programs do not fail. 

The register-file relation R defines safe register-files as those that contain 
safe words (i.e. words in V) in all registers but pc. The expression relation E 
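defines that a word is safe to use as a program counter if it can be plugged into

\[
\begin{aligned}
readCond(g)(W) &= \left\{ (n, (b, e)) \;\middle|\;
  \begin{array}{l}
  \exists r \in localityReg(g, W). \\
  \quad \exists [b', e'] \supseteq [b, e].\ W(r) \overset{n}{\subset} \iota^{pwl}_{b',e'}
  \end{array} \right\} \\[1ex]
writeCond(\iota, g)(W) &= \left\{ (n, (b, e)) \;\middle|\;
  \begin{array}{l}
  \exists r \in localityReg(g, W). \\
  \quad \exists [b', e'] \supseteq [b, e].\ W(r) \overset{n-1}{\supset} \iota_{b',e'} \text{ and} \\
  \quad W(r) \text{ is address-stratified}
  \end{array} \right\} \\[1ex]
execCond(g)(W) &= \left\{ (n, (P, b, e)) \;\middle|\;
  \begin{array}{l}
  \forall n' < n,\ W' \sqsupseteq W,\ a \in [b, e],\ perm \in P. \\
  \quad (n', ((perm, g), b, e, a)) \in \mathcal{E}(W')
  \end{array} \right\} \\[1ex]
enterCond(g)(W) &= \left\{ (n, (b, e, a)) \;\middle|\;
  \begin{array}{l}
  \forall n' < n.\ \forall W' \sqsupseteq W. \\
  \quad (n', ((RX, g), b, e, a)) \in \mathcal{E}(W')
  \end{array} \right\}
\end{aligned}
\]

where $g = \mathrm{local} \Rightarrow\ \sqsupseteq\ =\ \sqsupseteq^{pub}$ and $g = \mathrm{global} \Rightarrow\ \sqsupseteq\ =\ \sqsupseteq^{priv}$.
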


Fig. 7. Permission-based conditions 


a safe register file (i.e. a register file in R) and paired with a memory satisfying 
the world to become a safe configuration. Note that integers and non-executable 
capabilities (e.g. RO and E capabilities) are considered safe program counters 
because when they are plugged into a register file and paired with a memory, 
the execution will immediately fail, which is safe. 

The value relation V defines when words are safe. We make the value relation 
as liberal as possible by considering what is the most we can allow an adversary to 
use a capability for without breaking the memory protocols. Non-capability data 
is always safe because it provides no authority. Capabilities give the authority 
to manipulate memory and potentially break memory protocols, so they need 
to satisfy certain conditions to be safe. In Fig. 7, we define such a condition for 
each kind of permission a capability can have. 

For capabilities with read permission, readCond ensures that the capability can
only be used to read safe words, i.e. words in the value relation. To guarantee this,
we require that the addressed memory is governed by a region W(r) that imposes
safety as a requirement on the values contained. This safety requirement is
formulated in terms of a standard region $\iota^{pwl}_{b,e}$. The definition of that standard
region is omitted for space reasons, but it simply requires all the words in the range
[b, e] to be safe, i.e. in the value relation. Requiring that $W(r) \overset{n}{\subset} \iota^{pwl}_{b',e'}$
means that W(r) must accept only safe values like $\iota^{pwl}_{b',e'}$, but can be even more
restrictive if desired. The read condition also takes into account the locality of the
capability because, generally speaking, global capabilities should only depend on
permanent regions. Concretely, we use the function localityReg(g, W), which projects
out all active (non-revoked) regions when the locality g is local, but only the
permanent regions when g is global. The definition of the standard region $\iota^{pwl}$
can be found in [14]; it makes use of the isomorphism from Theorem 1.
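
The projection performed by localityReg can be sketched in Python. The sketch is a simplification: a world is modelled as a dictionary from region names to one of three tags, and both `locality_reg` and the helper `revoke_temporaries` are hypothetical names introduced for illustration.

```python
REVOKED, TEMP, PERM = "revoked", "temp", "perm"

def locality_reg(g, world):
    """Names of the regions that a capability of locality `g` may depend on."""
    if g == "local":
        return {r for r, kind in world.items() if kind in (TEMP, PERM)}
    return {r for r, kind in world.items() if kind == PERM}

def revoke_temporaries(world):
    """A private future world in which every temporary region is revoked."""
    return {r: (REVOKED if kind == TEMP else kind) for r, kind in world.items()}

world = {0: PERM, 1: TEMP, 2: REVOKED}
later = revoke_temporaries(world)
```

In this model the regions available to a global capability are the same before and after revocation, whereas a local capability loses the temporary region it may have depended on.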

For a capability with write permission, writeCond must be satisfied for the
capability’s range of authority. An adversary can use such a capability to write
any word they can get a hold of, and we can safely assume that they can only
get a hold of safe words, so the region governing the relevant memory must allow
any safe word to be written there. In order to make the logical relation as liberal
as possible, we make this a lower bound of what the region may allow. For write
capabilities, we also have to take into account the two flavours of write
permissions: write and write-local. In the case of write-local capabilities, the region
needs to allow (at least) any safe word to be written, but in the case of write
capabilities, the capability cannot be used to write local capabilities, so the region
only needs to allow safe non-local values. In the write condition, this is handled
by parameterizing it with a region. For the write-local capabilities the write
condition is applied with the standard region $\iota^{pwl}$ that we described previously.
For the write capabilities we use a different standard region $\iota^{nwl}$, which requires
that the words in [b, e] are non-local and safe. As before, we use localityReg to
pick an appropriate region based on the capability’s locality. Finally, there is a
technical requirement that the region must be address-stratified. Intuitively, this
means that if a region accepts two memory segments, then it must also accept
every memory segment “in between”, that is every memory segment where each
address contains a value from one of the two accepted memory segments. An
interesting property of the write condition is that it prohibits global write-local
capabilities which, as discussed in Sect. 3, is necessary for any safe use of
local capabilities.
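
The intuition behind address-stratification can be made concrete with a small Python sketch. It follows the informal description above rather than the formal definition in the technical appendix, and it assumes that a region is given as a finite list of accepted memory segments with a common domain; the function names are hypothetical.

```python
from itertools import product

def in_between(ms1, ms2):
    """All segments that take, at each address, the value from ms1 or from ms2."""
    addrs = sorted(ms1)
    for choice in product((0, 1), repeat=len(addrs)):
        yield {a: (ms1, ms2)[c][a] for a, c in zip(addrs, choice)}

def is_address_stratified(accepted):
    """`accepted` is a list of dicts (memory segments) over the same addresses."""
    return all(ms in accepted
               for ms1 in accepted
               for ms2 in accepted
               for ms in in_between(ms1, ms2))

# A region that constrains each address independently is address-stratified.
per_address = [{0: x, 1: y} for x in (1, 2) for y in (3, 4)]

# A region that relates the contents of two addresses ("both cells are equal")
# is not: mixing the two accepted segments yields {0: 1, 1: 2}.
correlated = [{0: 1, 1: 1}, {0: 2, 1: 2}]
```

A write through a capability changes one address at a time, so an invariant that ties several addresses together could be broken by a single store.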

The conditions enterCond and execCond are very similar. Both require that 
the capability can be safely jumped to. However, executable capabilities can be 
updated to point anywhere in their range, so they must be safe as a program 
counter (in the E-relation) no matter the current address. In contrast, enter
capabilities are opaque and can only be used to jump to the address they point 
to. They also change permission when jumped to, so we require them to be 
safe as a program counter after the permission is changed to RX. Because the 
capabilities are not necessarily invoked immediately, this must be true in any 
future world, but it depends on the capability’s locality which future worlds we 
consider. If it is global, then we require safety as a program counter in private 
future worlds (where temporary regions may be revoked). For local capabilities, 
it suffices to be safe in public future worlds, where temporary regions are still 
present. 

In the technical appendix, we prove that safety of all values is preserved in 
public future worlds, and that safety of global values is also preserved in private 
future worlds: 


Lemma 1 (Double monotonicity of value relation)


- If $W' \sqsupseteq^{pub} W$ and $(n, w) \in \mathcal{V}(W)$, then $(n, w) \in \mathcal{V}(W')$.
- If $W' \sqsupseteq^{priv} W$ and $(n, w) \in \mathcal{V}(W)$ and $w = ((perm, \mathrm{global}), b, e, a)$ (i.e. w is
  a global capability), then $(n, w) \in \mathcal{V}(W')$.


4.3 Safety of the Capability Machine 


With the logical relation defined, we can now state the fundamental theorem
of our logical relation: a strong theorem that formalizes the guarantees offered
by the capability machine. Essentially, it says a capability that only grants safe
authority is capability safe as a program counter.


Theorem 2 (Fundamental theorem). If one of the following holds:


- $perm = RX$ and $(n, (b, e)) \in readCond(g)(W)$
- $perm = RWX$ and $(n, (b, e)) \in readCond(g)(W)$ and
  $(n, (b, e)) \in writeCond(\iota^{nwl}, g)(W)$
- $perm = RWLX$ and $(n, (b, e)) \in readCond(g)(W)$ and
  $(n, (b, e)) \in writeCond(\iota^{pwl}, g)(W)$,


then $(n, ((perm, g), b, e, a)) \in \mathcal{E}(W)$.


The permission-based conditions of Theorem 2 make sure that the capability only
provides safe authority, in which case the capability must be in the E relation,
i.e. it can safely be used as a program counter in an otherwise safe register-file.

The Fundamental Theorem can be understood as a general expression of the 
guarantees offered by the capability machine, an instance of a general property 
called capability safety [11,12]. To understand this, consider that the theorem 
says the capability ((perm, g), b, e, a) is safe as a program counter, without any 
assumption about what instructions it actually points to (the only assumptions 
we have are about the read or write authority that it carries). As such, the the- 
orem expresses the capability safety of the machine, which guarantees that any 
instruction is fine and will not be able to go beyond the authority of the values 
it has access to. We demonstrate this in Sect. 5, where Theorem 2 is used to
reason about capabilities that point to arbitrary instructions. The relation between
Theorem 2 and local-state encapsulation and control-flow correctness will also
be shown by example in Sect. 5 as the examples depend on these properties for
correctness. See the technical appendix [14] for a detailed proof (by induction 
over the step-index n) of the theorem. 


5 Examples 


In this section, we demonstrate how our formalization of capability safety allows 
us to prove local-state encapsulation and control-flow correctness properties for 
challenging program examples. The security measures of Sect. 3 are deployed to
ensure these properties. Since we are dealing with assembly language, there are
many details to the formal treatment, and therefore we necessarily omit some
details in the lemma statements. The examples may look deceptively short, but
this is because they use the macro instructions described in Sect. 3. The examples
would be unintelligible without the macros, as each macro expands to multiple 
basic instructions. The interested reader can find all the technical details in the
technical appendix [14].

f1: push 1                 f2: malloc r_l 1
    fetch r1 adv               store r_l 1
    scall r1([],[])            fetch r1 adv
    pop r1                     call r1([],[r_l])
    assert r1 1                assert r_l 1
    halt                       halt


Fig. 8. Two example programs that rely on local-state encapsulation. f1 uses our
stack-based calling convention. f2 does not rely on a stack.


5.1 Encapsulation of Local State 


f1 and f2 in Fig. 8 demonstrate the capability machine’s encapsulation of local
state. They are very similar: both store some local state, call an untrusted piece 
of code (adv), and then test whether the local state is unchanged. They differ 
in the way they do this. Program f1 uses our stack-based calling convention 
(captured by scall) to call the adversary, so it can use the available stack to 
store its local state. On the other hand, f2 uses malloc to allocate memory for 
its local state and uses an activation-record based calling convention (described 
in the technical appendix) to run the adversarial code. 

For both programs, we can prove that if they are linked with an adversary, 
adv, that is allowed to allocate memory but has no other capabilities, then the 
assertion will never fail during execution (see Lemmas 2 and 3 below). The two
examples also illustrate the versatility of the logical relation. The logical relation 
is not specific to any calling convention, so we can use it to reason about both 
programs, even though they use different calling conventions. 

In order to formulate results about f1 and f2, we need a way to observe
whether the assertion fails. To this end, we assume they have access to a flag (an 
address in memory). If the assertion fails, then the flag is set to 1 and execution 
halts. The correctness lemma for f1 then states: 


Lemma 2. Let

\[
\begin{aligned}
c_{adv} &\overset{def}{=} ((E, \mathrm{global}), \ldots) & c_{stk} &\overset{def}{=} ((RWLX, \mathrm{local}), \ldots) \\
c_{f1} &\overset{def}{=} ((RWX, \mathrm{global}), \ldots) & c_{link} &\overset{def}{=} ((RO, \mathrm{global}), \ldots) \\
c_{malloc} &\overset{def}{=} ((E, \mathrm{global}), \ldots) & reg &\in \mathrm{Reg}
\end{aligned}
\]

\[
m \overset{def}{=} ms_{f1} \uplus ms_{flag} \uplus ms_{link} \uplus ms_{adv} \uplus ms_{malloc} \uplus ms_{stk} \uplus ms_{frame}
\]

where each of the capabilities has an appropriate range of authority and
pointer.² Furthermore


- $ms_{f1}$ contains $c_{link}$, $c_{flag}$ and the code of f1
- $ms_{flag}(flag) = 0$
- $ms_{link}$ contains $c_{adv}$ and $c_{malloc}$
- $ms_{adv}$ contains $c_{link}$ and otherwise only instructions.


If $(reg[\mathrm{pc} \mapsto c_{f1}][r_{stk} \mapsto c_{stk}], m) \to^{*} (halted, m')$, then $m'(flag) = 0$.


² These assumptions are kept intentionally vague for brevity. Full statements are in
the technical appendix [14].

To prove Lemma 2, it suffices to show that the start configuration is safe (in
the O relation) for a world with a permanent region that requires the assertion 
flag to be 0. By an anti-reduction lemma, it suffices to show that the config- 
uration is safe after some reduction steps. We then use a general lemma for 
reasoning about scall, by which it suffices to show that (1) the configuration 
that scall will jump to is safe and (2) that the configuration just after scall is 
done cleaning up is safe. We use the Fundamental Theorem to reason about the 
unknown adversarial code, but notice that the adversary capability is an enter 
capability, which the Fundamental Theorem says nothing about. Luckily the 
enter capability becomes RX after the jump and then the Fundamental Theorem 
applies. 

We have a similar lemma for f2:


Lemma 3. Making similar assumptions about capabilities and linking as in
Lemma 2 but assuming no stack pointer, if $(reg[\mathrm{pc} \mapsto c_{f2}], m) \to^{*} (halted, m')$,
then $m'(flag) = 0$.


5.2 Well-Bracketed Control-Flow 


Using the stack-based calling convention of scall, we get well-bracketed control- 
flow. To illustrate this, we look at two example programs f3 and g1 in Fig. 9. 

In f3 there are two calls to an adversary and in order for the assertion in the
middle to succeed, they need to be well-bracketed. If the adversary were able to
store the return pointer from the first call and invoke it in the second call, then
f3 would have 2 on top of its stack and the assertion would fail. However, the
security measures in Sect. 3 prevent this attack: specifically, the return pointer 
is local, so it can only be stored on the stack, but the part of the stack that 
is accessible to the adversary is cleared before the second invocation. In fact, 
the following lemma shows that there are also no other attacks that can break 
well-bracketedness of this example, i.e. the assertion never fails. It is similar to 
the two previous lemmas: 


Lemma 4. Making similar assumptions about capabilities and linking as in
Lemma 2, if $(reg[\mathrm{pc} \mapsto c_{f3}][r_{stk} \mapsto c_{stk}], m) \to^{*} (halted, m')$, then $m'(flag) = 0$.


The final example, g1 with f4, is a faithful translation of a tricky example
from the literature (known as the awkward example) [13,20]. It consists
of two parts, g1 and f4. g1 is a closure generator that generates closures with
one variable x set to 0 in its environment and f4 as the program (note we can
omit some calling convention security measures because the stack is not used in
the closure generator). f4 expects one argument, a callback. It sets x to 0 and
calls the callback. When it returns, it sets x to 1 and calls the callback a second
time. When it returns again, it asserts x is 1 and returns. This example is more
complicated than the previous ones because it involves a closure invoked by the 
adversary and an adversary callback invoked by us. As explained in Sect. 3, this 
means that we need to check (1) that the stack pointer that the closure receives 
from the adversary has write-local permission and (2) that the adversary callback 
is global. 
g1: malloc r2 1
    store r2 0
    move pc r3
    lea r3 offset
    crtcls [(x, r2)] r3
    rclear RegName \ {pc, r0, r1}
    jmp r0

f4: reqglob r1
    prepstk rstk
    store x 0
    scall r1([], [r0, r1, renv])
    store x 1
    scall r1([], [r0, renv])
    load r1 x
    assert r1 1
    mclear rstk
    rclear RegName \ {r0, pc}
    jmp r0

f3: push 1
    fetch r1 adv
    scall r1([], [r1])
    pop r2
    assert r2 1
    push 2
    scall r1([], [])
    halt


Fig. 9. Two programs that rely on well-bracketedness of scalls to function correctly.
offset is the offset to f4.


To illustrate how subtle this program is, consider how an adversary could try 
to make the assertion fail. In the second callback an adversary can get to the 
first callback by invoking the closure one more time. If there were any way for 
the adversary to transfer the return pointer from the point where it reinvokes 
the closure to where the closure reinvokes the callback, then the assertion could 
be made to fail. Similarly, if there were any way for the adversary to store a 
stack pointer or trick the trusted code into preserving it across an invocation, 
the assertion can likely be made to fail too. However, our calling convention 
prevents any of this from happening, as we prove in the following lemma. 


Lemma 5. Let

$c_{adv} = ((RWX, global), \ldots)$, $c_{g1} = ((E, global), \ldots)$
and otherwise make assumptions about capabilities and linking similar to
Lemma 2. Then if $(reg_0[pc \mapsto c_{adv}][r_{stk} \mapsto c_{stk}][r_1 \mapsto c_{g1}], m) \to^* (halted, m')$,
then $m'(flag) = 0$.


As explained in Sect. 3, the macro-instruction reqglob r1 checks that the call-
back is global, essentially to make sure it is not allocated on the stack where 
it might contain old stack pointers or return pointers. Otherwise, the encapsu- 
lation of our local stack frame could be broken. In the proof of Lemma 5, this 
requirement shows up because we invoke the callback in a world that is only a 
private future world of the one where we received the callback, precisely because 
we have invalidated the adversary’s local state (particularly their old stack and 
return capabilities). The callback is still valid in this private future world, but 
only because we know that it is global. 

In Lemma 5 the order of control has been inverted compared to the previous 
lemmas. In this lemma, the adversary assumes control first with a capability 
for the closure creator g1. Consequently, we need to check that all arguments 
are safe to use and that we clean up before returning in the end. The inversion 
of control poses an interesting challenge when it comes to reasoning about the 
adversary’s local state during the execution of f4 and the callbacks, where the
adversary should not rely on the local state from before the call of f4. This is
easily done by revoking all the temporary regions of the world given at the start
of f4. However, when f4 returns, the adversary is again allowed to rely on its
old local state so we need to guarantee that the local state is unchanged. This
is important because the return pointer that f4 receives may be local, and the
adversary is allowed to allocate the activation record on the stack (just like we
do) so they can store and recover their old stack pointer after f4 returns. By
utilizing the reinstation mechanism of the future world relation as well as our 
knowledge of the future worlds used, we can construct a world in which the 
adversary’s invariants are preserved. The details of this and the proofs of the 
other lemmas are found in the technical appendix [14]. 


6 Discussion 


Calling Convention 


Formulating Control Flow Correctness. While we claim that our calling con- 
vention enforces control-flow correctness, we do not prove a general theorem 
that shows this, because it is not clear what such a theorem should look like. 
Formulations in terms of a control-flow graph, like the one by Abadi et al. [2], 
do not take into account temporal properties, like the well-bracketedness that 
Example g1 relies on. In fact, our examples show that our logical relation implies
a stronger form of control-flow correctness than such formulations, although 
this is not made very explicit. As future work, we consider looking at a more 
explicit and useful way to formalize control-flow correctness. The idea would be 
to define a variant of our capability machine with call and return instructions 
and well-bracketed control flow built-in to the operational semantics, and then 
prove that compiling such code to our machine using our calling convention is 
fully abstract [21]. 


Performance and the Requirement for Stack Clearing. The additional security 
measures of the calling convention described in Sect. 3 impose an overhead com- 
pared to a calling convention without security guarantees. However, most of 
our security measures require only a few atomic checks or register clearings on 
boundary crossings between trusted code and adversary, which should produce 
an acceptable performance overhead. The only exceptions are the requirements
for stack clearing that we have in two situations: when returning to the
adversary and when invoking an adversary callback. As we have explained, we need
to clear all of the stack that we are not using ourselves, not just the part that
we have actually used. In other words, on every boundary crossing between trusted
code and adversary code, a potentially large region of memory must be cleared.
We believe this is actually a common requirement for typical usage scenarios of
local capabilities, and capability machines like CHERI should consider providing
special support for this requirement, in the form of a highly-optimized
instruction for erasing a large block of memory. Nevertheless, from a discussion with the
designers of the CHERI capability machine, we gather that it is not immediately 
clear whether and how such a primitive could be implemented efficiently in the 
CHERI context. 
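
To make the cost concrete, the following Python sketch models the clearing step on a flat list of memory words. It is an illustration only: the layout (a stack growing towards higher addresses, with the trusted frame below `stack_pointer`) and the names `clear_unused_stack`, `stack_pointer` and `stack_end` are assumptions made for this example and are not taken from the calling convention's definition.

```python
def clear_unused_stack(memory, stack_pointer, stack_end):
    """Zero every stack word in [stack_pointer, stack_end).

    The whole unused region is cleared, not only the words the trusted
    code has touched, because stale capabilities may sit anywhere in it.
    """
    for address in range(stack_pointer, stack_end):
        memory[address] = 0
    return stack_end - stack_pointer  # number of words written


# Hypothetical layout: a 1024-word stack in which trusted code uses 3 words.
memory = [0] * 1024
memory[0:3] = [11, 22, 33]   # the trusted frame, which must survive
memory[900] = 0xBEEF         # residue left behind by an earlier adversary call
words_cleared = clear_unused_stack(memory, stack_pointer=3, stack_end=1024)
```

The number of writes depends on the size of the unused region rather than on the size of the trusted frame, which is why a dedicated block-erase instruction would help.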
Modularity. It is important that our calling convention is modular, i.e. we do
not assume that our code is specially privileged w.r.t. the adversary, and they 
can apply the same measures to protect themselves from us as we do to protect 
ourselves from them. More concretely, the requirements we have on callbacks 
and return pointers received from the adversary are also satisfied by callbacks 
and return pointers that we pass to them. For example, our return pointers are 
local capabilities because they must point to memory where we can store the old 
stack pointer, but the adversary’s return pointers are also allowed to be local. 
Adversary callbacks are required to be global but the callbacks we construct are 
allocated on the heap and also global. 


Arguments and Local Capabilities. Local capabilities are a central part of the 
calling convention as they are used to construct stack and return pointers. The 
use of local capabilities for the calling convention unfortunately limits the extent 
to which local capabilities can be used for other things. Say we are using the 
calling convention and receive a local capability other than the stack and return 
pointer, then we need to be careful if we want to use it because it may be an 
alias to the stack pointer. That is, if we first push something to the stack and 
then write to the local capability, then we may be (tricked into) overwriting our 
own local state. The logical relation helps by telling us what we need to ascertain 
or check in such scenarios to guarantee safety and preserve our invariants, but 
such checks may be costly and it is not clear to us whether there are practical 
scenarios where this might be realistic. 

We also need to be careful when we receive a capability from an adversary 
that we want to pass on to a different (instance of the) adversary. It turns out that 
the logical relation again tells us when this is safe. Namely, the logical relation 
says that we can only pass on safe arguments. For instance, when we receive a 
stack pointer from an adversary, then we may at some point want to pass on 
part of this stack pointer to, say, a callback. In order to do so, we need to make 
sure the stack pointer is safe which means that, if we have revoked temporary 
invariants, the stack must not directly or indirectly allow access to local values 
that we cannot guarantee safety of. When received from an adversary, we have 
to consider the contents of the stack unsafe, so before we pass it on, we have to 
clear it, or perform a dynamic safety analysis of the stack contents and anything 
it points to. Clearing everything is not always desirable and a dynamic safety 
analysis is hard to get right and potentially expensive. 

In summary, the use of local capabilities for other things than stack and 
return pointers is likely only possible in very specific scenarios when using our 
calling convention. While this is unfortunate, it is not unheard of that processors 
have built-in constructs that are exclusively used for handling control flow, such 
as, for example, the call and return instructions that exist in some instruction 
sets. 


Single Stack. A single stack is a good choice for the simple capability machine 
presented here, because it works well with higher-order functions. An alternative 
to a single stack would be to have a separate stack per component. The trouble 
with this approach is that, with multiple stacks and local stack pointers, it is 
not clear how components would retrieve their stack pointer upon invocation
without compromising safety. A safe approach could be to have stack pointers 
stored by a central, trusted stack management component, but it is not clear 
how that could scale to large numbers of separate components. Handling large 
numbers of components is a requirement if we want to use capability machines to 
enforce encapsulation of, for example, every object in an object-oriented program 
or every closure in a functional program. 


Logical Relation 


Single Orthogonal Closure. The definitions of $\mathcal{E}$ and $\mathcal{V}$ in Fig. 6 apply a single
orthogonal closure, a new variant of an existing pattern called biorthogonality. 
Biorthogonality is a pattern for defining logical relations [20,22] in terms of an 
observation relation of safe configurations (like we do). The idea is to define 
safe evaluation contexts as the set of contexts that produce safe observations 
when plugging safe values and define safe terms as the set of terms that can be 
plugged into safe evaluation contexts to produce safe observations. This is an 
alternative to more direct definitions where safe terms are defined as terms that 
evaluate to safe values. An advantage of biorthogonality is that it scales better 
to languages with control effects like call/cc. Our definitions can be seen as a 
variant of biorthogonality, where we take only a single orthogonal closure: we do 
not define safe evaluation contexts but immediately define safe terms as those 
that produce safe observations when plugged with safe values. This is natural 
because we model arbitrary assembly code that does not necessarily respect a 
particular calling convention: return pointers are in principle values like all others 
and there is no reason to treat them specially in the logical relation. 

Interestingly, Hur and Dreyer [23] also use a step-indexed, Kripke logical 
relation for an assembly language (for reasoning about correct compilation from 
ML to assembly), but because they only model non-adversarial code that treats 
return pointers according to a particular calling convention, they can use stan- 
dard biorthogonality rather than a single orthogonal closure like us. 


Public/Private Future Worlds. A novel aspect of our logical relation is how we 
model the temporary, revokable nature of local capabilities using public/private 
future worlds. The main insight is that this special nature generalizes that of 
the syntactically-enforced unstorable status of evaluation contexts in lambda 
calculi without control effects (of which well-bracketed control flow is a con- 
sequence). To reason about code that relies on this (particularly, the original 
awkward example), Dreyer et al. [13] (DNB) formally capture the special sta- 
tus of evaluation contexts using Kripke worlds with public and private future 
world relations. Essentially, they allow relatedness of evaluation contexts to be 
monotone with respect to a weaker future world relation (public) than related- 
ness of values, formalizing the idea that it is safe to make temporary internal 
state modifications (private world transitions, which invalidate the continuation, 
but not other values) while an expression is performing internal steps, as long 
as the code returns to a stable state (i.e. transitions to a public future world 
of the original) before returning. We generalize this idea to reason about local
capabilities: validity of local capabilities is allowed to be monotone with respect 
to a weaker future-world relation than other values, which we can exploit to 
distinguish between state changes that are always safe (public future worlds) 
and changes that are only valid if we clear all local capabilities (private future 
worlds). Our future world relations are similar to DNB’s (for example, our proof 
of the awkward example uses exactly the same state transition system), but they 
turn up in an entirely different place in the logical relation: rather than using 
public future worlds for the special syntactic category of evaluation contexts, 
they are used in the value relation depending on the locality of the capability at 
hand. Additionally, our worlds are a bit more complex because, to allow local 
memory capabilities and write-local capabilities, they can contain (revokable) 
temporary regions that are only monotone w.r.t. public future worlds, while
DNB’s worlds are entirely permanent. 


Local Capabilities in High-Level Languages. We point out that local capabilities 
are quite similar to a feature proposed for the high-level language Scala: the
second-class or local values of Osvald et al. [24]. They are a kind of value that can be
provided to other code for immediate use without allowing them to be stored in 
a closure or reference for later use. We believe reasoning about such values will 
require techniques similar to what we provide for local capabilities. 


7 Related Work 


Finally, we summarize how our work relates to previous work. We do not repeat 
the work we discussed in Sect. 6. 

Capability machines originate with Dennis and Van Horn [7] and we refer to 
Levy [25] and Watson et al. [9] for an overview of previous work. The capabil- 
ity machine formalized in Sect. 2 is a simple but representative model, modeled 
mainly after the M-Machine [6] (the enter pointers resemble the M-Machine’s) 
and CHERI [9,10] (the memory and local capabilities resemble CHERI’s). The 
latter is a recent and relatively mature capability machine, which combines 
capabilities with a virtual memory approach, in the interest of backwards com- 
patibility and gradual adoption. As discussed, our local capabilities can cross 
module boundaries, contrary to what is enforced by CHERI’s default CCall 
implementation. 

Plenty of other papers enforce well-bracketed control flow at a low level, but 
most are restricted to preventing particular types of attacks and enforce only 
partial correctness of control flow. This includes particularly the line of work 
on control-flow integrity [2]. Those use a quite different attacker model than us: 
they assume an attacker that is not able to execute code, but can overwrite 
arbitrary data at any time during execution (to model buffer overflows). By 
checking the address of every indirect jump and using memory access control to 
prevent overwriting code, this work enforces what they call control-flow integrity, 
formalized as the property that every jump will follow a legal path in the control- 
flow graph. As discussed in Sect. 6, such a property ignores temporal properties
and seems hard to use for reasoning. 
More closely related to our work are papers that use a trusted stack manager
and some form of memory isolation to enforce control-flow correctness as part of 
a secure compilation result [26,27]. Our work differs from theirs in that we use 
a different form of low-level security primitive (a capability machine with local 
capabilities rather than a machine with a primitive notion of compartments) and 
we do not use a trusted stack manager, but a decentralized calling convention 
based on local capabilities. Also, both prove a secure compilation result from a 
high-level language, which clearly implies a general form of control-flow correct- 
ness, while we define a logical relation that can be used to reason about specific 
programs that rely on well-bracketed control flow. 

Our logical relation is a unary, step-indexed Kripke logical relation with 
recursive worlds [16,18,20,28], closely related to the one used by Devriese et 
al. [11] to formulate capability safety in a high-level JavaScript-like lambda cal- 
culus. Our Fundamental Theorem is similar to theirs and expresses capability 
safety of the capability machine. Because we are not interested in externally 
observable side-effects (like console output or memory access traces), we do not 
require their notion of effect parametricity. Our logical relation uses several ideas 
from previous work, like Kripke worlds with regions containing state transition 
systems [15], public/private future worlds [13] (see Sect. 6 for a discussion), and 
biorthogonality [20,23,29]. 

Swasey et al. [30] have recently developed a logic, OCPL, for verification of 
object capability patterns. The logic is based on Iris [31-33], a state of the art 
higher-order concurrent separation logic and is formalized in Coq, building on 
the Iris Proof Mode for Coq [34]. OCPL gives a more abstract and modular way 
of proving capability safety for a lambda-calculus (with concurrency) compared 
to the earlier work by Devriese et al. [11]. 

El-Korashy also defines a formal model of a capability machine, namely
CHERI, and uses it to prove a compartmentalization result [35] (not implying 
control-flow correctness). He also adapts control-flow integrity (see above) to the 
machine and shows soundness, seemingly without relying on capabilities. 
Abstract. Many interesting program properties like determinism or
information flow security are hyperproperties, that is, they relate mul- 
tiple executions of the same program. Hyperproperties can be verified 
using relational logics, but these logics require dedicated tool support 
and are difficult to automate. Alternatively, constructions such as self- 
composition represent multiple executions of a program by one product 
program, thereby reducing hyperproperties of the original program to 
trace properties of the product. However, existing constructions do not 
fully support procedure specifications, for instance, to derive the deter- 
minism of a caller from the determinism of a callee, making verification 
non-modular. 

We present modular product programs, a novel kind of product pro- 
gram that permits hyperproperties in procedure specifications and, thus, 
can reason about calls modularly. We demonstrate its expressiveness by 
applying it to information flow security with advanced features such as 
declassification and termination-sensitivity. Modular product programs 
can be verified using off-the-shelf verifiers; we have implemented our 
approach to secure information flow using the Viper verification infras- 
tructure. 


1 Introduction 


The past decades have seen significant progress in automated reasoning about 
program behavior. In the most common scenario, the goal is to prove trace 
properties of programs such as functional correctness or termination. However, 
important program properties such as information flow security, injectivity, and 
determinism cannot be expressed as properties of individual traces; these so- 
called hyperproperties relate different executions of the same program. For exam- 
ple, proving determinism of a program requires showing that any two executions 
from identical initial states will result in identical final states. 

An important attribute of reasoning techniques about programs is modular- 
ity. A technique is modular if it allows reasoning about parts of a program in 
isolation, e.g., verifying each procedure separately and using only the specifica- 
tions of other procedures. Modularity is vital for scalability and to verify libraries 
without knowing all of their clients. Fully modular reasoning about hyperprop- 
erties thus requires the ability to formulate relational specifications, which relate 
different executions of a procedure, and to apply those specifications where the
procedure is called. As an example, the statement 


if (x) then {y:=x} else {y:= call f(x)} 


can be proved to be deterministic if f’s relational specification guarantees that 
its result deterministically depends on its input. 

Relational program logics [11,27,29] allow directly proving general hyperproperties;
however, automating relational logics is difficult and requires building
dedicated tools. Alternatively, self-composition [9] and product programs [6,7] 
reduce a hyperproperty to an ordinary trace property, thus making it possible to 
use off-the-shelf program verifiers for proving hyperproperties. Both approaches 
construct a new program that combines the behaviors of multiple runs of the 
original program. However, by the nature of their construction, neither approach 
supports modular verification based on relational specifications: Procedure calls 
in the original program will be duplicated, which means that there is no sin- 
gle program point at which a relational specification can be applied. For the 
aforementioned example, self-composition yields the following program: 


if (x) then {y:=x} else {y:= call f(x)};
if (x') then {y':=x'} else {y':= call f(x')}


Determinism can now be verified by proving the trace property that identical
values for x and x' in the initial state imply identical values for y and y' in the
final state. However, such a proof cannot make use of a relational specification 
for procedure f (expressing that f is deterministic). Such a specification relates 
several executions of f, whereas each call in the self-composition belongs to a 
single execution. Instead, verification requires a precise functional specification 
of f, which exactly determines its result value in terms of the input. Verifying 
such precise functional specifications increases the verification effort and is at 
odds with data abstraction (for instance, a collection might not want to promise 
the exact iteration order); inferring them is beyond the state of the art for most 
procedures [28]. Existing product programs allow aligning or combining some 
statements and can thereby lift this requirement in some cases, but this requires 
manual effort during the construction, depends on the used specifications, and 
does not solve the problem in general. 
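
The limitation of self-composition can be seen in a direct Python transcription of the example. This is a sketch for illustration only; the function names `statement` and `self_composition`, the recording list `calls`, and the concrete procedure `f` are assumptions introduced here.

```python
def statement(x, f):
    """The original statement: if (x) then {y := x} else {y := call f(x)}."""
    return x if x else f(x)


def self_composition(x1, x2, f):
    """Two renamed copies of the statement, executed one after the other."""
    y1 = x1 if x1 else f(x1)   # call site belonging to execution 1
    y2 = x2 if x2 else f(x2)   # call site belonging to execution 2
    return y1, y2


calls = []


def f(x):
    # Deterministic, but a modular proof should not need to know the value 7.
    calls.append(x)
    return 7


y1, y2 = self_composition(0, 0, f)
```

After this run `calls` records two separate invocations of `f`, one per copy. A verifier that sees only a relational specification of `f` has no single call at which to apply it, so establishing `y1 == y2` here requires knowing that `f` returns exactly 7.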

In this paper, we present modular product programs, a novel kind of prod- 
uct programs that allows modular reasoning about hyperproperties. Modular 
product programs enable proving k-safety hyperproperties, i.e., hyperproperties 
that relate finite prefixes of k execution traces, for arbitrary values of k [12]. We 
achieve this via a transformation that, unlike existing products, does not dupli- 
cate loops or procedure calls, meaning that for any loop or call in the original 
program, there is exactly one statement in the k-product at which a relational 
specification can be applied. Like existing product programs, modular products 
can be reasoned about using off-the-shelf program verifiers. 

We demonstrate the expressiveness of modular product programs by apply- 
ing them to prove secure information flow, a 2-safety hyperproperty. We show 
how modular products enable proving traditional non-interference using natural
and concise information flow specifications, and how to extend our approach for 
proving the absence of timing or termination channels, and supporting declassi- 
fication in an intuitive way. 

To summarize, we make the following contributions: 


— We introduce modular k-product programs, which enable modular proofs of 
arbitrary k-safety hyperproperties for sequential programs using off-the-shelf 
verifiers. 

— We demonstrate the usefulness of modular product programs by applying 
them to secure information flow, with support for declassification and pre- 
venting different kinds of side channels. 

— We implement our product-based approach for information flow verification 
in an automated verifier and show that our tool can automatically prove 
information flow security of challenging examples. 


After giving an informal overview of our approach in Sect. 2 and introducing 
our programming and assertion language in Sect. 3, we formally define modular 
product programs in Sect. 4. We sketch a soundness proof in Sect. 5. Section 6
demonstrates how to apply modular products for proving secure information
flow. We describe and evaluate our implementation in Sect. 7, discuss related
work in Sect. 8, and conclude in Sect. 9.


2 Overview 


In this section, we will illustrate the core concepts behind modular k-products on 
an example program. We will first show how modular products are constructed, 
and subsequently demonstrate how they allow using relational specifications to 
modularly prove hyperproperties. 


2.1 Relational Specifications 


Consider the example program in Fig. 1, which counts the number of female 
entries in a sequence of people. Now assume we want to prove that the program 
is deterministic, i.e., that its output state is completely determined by its input 
arguments. This can be expressed as a 2-safety hyperproperty which states that, 
for two terminating executions of the program with identical inputs, the outputs 
will be the same. This hyperproperty can be expressed by the relational (as
opposed to unary) specification $main : people^{(1)} = people^{(2)} \rightsquigarrow count^{(1)} = count^{(2)}$, where
$x^{(i)}$ refers to the value of the variable x in the ith execution.

Intuitively, it is possible to prove this specification by giving is_female a precise
functional specification like $is\_female : true \rightsquigarrow res = 1 - person \bmod 2$,
meaning that is_female can be invoked in any state and that $res = 1 - person \bmod 2$
will hold if it returns. From this specification and an appropriate loop
invariant, main can be shown to be deterministic. However, this specification 
procedure main(people)
  returns (count)
{
  i := 0;
  count := 0;
  while (i < |people|) {
    current := people[i];
    f := is_female(current);
    count := count + f;
    i := i + 1;
  }
}

procedure is_female(person)
  returns (res)
{
  // gender encoded in first bit
  gender := person mod 2;
  if (gender == 0) {
    res := 1;
  } else {
    res := 0;
  }
}


Fig. 1. Example program. The parameter people contains a sequence of integers that 
each encode attributes of a person; the main procedure counts the number of females 
in this sequence. 


is unnecessarily strong. For proving determinism, it is irrelevant what exactly 
the final value of count is; it is only important that it is uniquely determined 
by the procedure’s inputs. Proving hyperproperties using only unary specifica- 
tions, however, critically depends on having exact specifications for every value 
returned by a called procedure, as well as all heap locations modified by it. 
Not only are such specifications difficult to infer and cumbersome to provide 
manually; this requirement also fundamentally removes the option of underspec- 
ifying program behavior, which is often desirable in practice. Because of these 
limitations, verification techniques that require precise functional specifications 
for proving hyperproperties often do not work well in practice, as observed by 
Terauchi and Aiken for the case of self-composition [28]. 

Proving determinism of the example program becomes much simpler if we 
are able to reason about two program executions at once. If both runs start 
with identical values for people then they will have identical values for people, i, 
and count when they reach the loop. Since the loop guard only depends on i 
and people, it will either be true for both executions or false for both. Assuming 
that is_female behaves deterministically, all three variables will again be equal 
in both executions at the end of the loop body. This means that the program 
establishes and preserves the relational loop invariant that people, i, and count 
have identical values in both executions, from which we can deduce the desired 
relational postcondition. Our modular product programs enable this modular 
and intuitive reasoning, as we explain next. 
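
The relational argument above can be mimicked by running two executions of the example side by side in Python and checking the relational loop invariant at run time. This is only a sketch: run-time assertions on sample inputs are not a proof, and the name `main_lockstep` is introduced here for illustration. Note that the only fact used about `is_female` is that equal arguments give equal results.

```python
def is_female(person):
    """Deterministic; callers below never rely on how the result is computed."""
    return 1 if person % 2 == 0 else 0


def main_lockstep(people1, people2):
    """Run two executions of main side by side, checking the relational invariant."""
    assert people1 == people2, "relational precondition: identical inputs"
    i1 = i2 = 0
    count1 = count2 = 0
    while i1 < len(people1):
        # Relational loop invariant: both executions agree on i and count,
        # hence the loop guard has the same value in both executions.
        assert i1 == i2 and count1 == count2
        assert (i1 < len(people1)) == (i2 < len(people2))
        f1 = is_female(people1[i1])
        f2 = is_female(people2[i2])
        assert f1 == f2  # relational specification of is_female
        count1, count2 = count1 + f1, count2 + f2
        i1, i2 = i1 + 1, i2 + 1
    assert count1 == count2, "relational postcondition"
    return count1, count2
```

This lockstep version only works because both executions take the same path through the loop; handling executions that diverge is what the activation variables of modular product programs are for.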


2.2 Modular Product Programs 


Like other product programs, our modular k-product programs multiply the 
state space of the original program by creating k renamed versions of all original 
variables. However, unlike other product programs, they do not duplicate control 
structures like loops or procedure calls, while still allowing different executions 
to take different paths through the program. 

Modular product programs achieve this as follows: The set of transitions 
made by the execution of a product is the union of the transitions made by 
procedure main(p1, p2, people1, people2)
  returns (count1, count2)
{
  if (p1) { i1 := 0; }
  if (p2) { i2 := 0; }
  if (p1) { count1 := 0; }
  if (p2) { count2 := 0; }
  while ((p1 && i1 < |people1|) ||
         (p2 && i2 < |people2|)) {
    l1 := p1 && i1 < |people1|;
    l2 := p2 && i2 < |people2|;
    if (l1) { current1 := people1[i1]; }
    if (l2) { current2 := people2[i2]; }
    if (l1 || l2) {
      t1, t2 := is_female(l1, l2,
                          current1, current2);
    }
    if (l1) { f1 := t1; }
    if (l2) { f2 := t2; }
    if (l1) { count1 := count1 + f1; }
    if (l2) { count2 := count2 + f2; }
    if (l1) { i1 := i1 + 1; }
    if (l2) { i2 := i2 + 1; }
  }
}

procedure is_female(p1, p2, person1, person2)
  returns (res1, res2)
{
  if (p1) {
    gender1 := person1 mod 2;
  }
  if (p2) {
    gender2 := person2 mod 2;
  }
  t1 := p1 && gender1 == 0;
  t2 := p2 && gender2 == 0;
  f1 := p1 && !(gender1 == 0);
  f2 := p2 && !(gender2 == 0);
  if (t1) { res1 := 1; }
  if (t2) { res2 := 1; }
  if (f1) { res1 := 0; }
  if (f2) { res2 := 0; }
}


Fig. 2. Modular 2-product of the program in Fig. 1 (slightly simplified). Parameters 
and local variables have been duplicated, but control flow statements have not. All 
statements are parameterized by activation variables. 


the executions of the original program it represents. This means that if two 
executions of an if-then-else statement execute different branches, an execution 
of the product will execute the corresponding versions of both branches; however, 
it will be aware of the fact that each branch is taken by only one of the original 
executions, and the transformation of the statements inside each branch will 
ensure that the state of the other execution is not modified by executing it. 

For this purpose, modular product programs use boolean activation variables 
that store, for each execution, the condition under which it is currently active. All 
activation variables are initially true. For every statement that directly changes 
the program state, the product performs the state change for all active execu- 
tions. Control structures update which executions are active (for instance based 
on the loop condition) and pass this information down (into the branches of a 
conditional, the body of a loop, or the callee of a procedure call) to the level of 
atomic statements¹. This representation avoids duplicating these control 
structures. 

Figure 2 shows the modular 2-product of the program in Fig. 1. Consider 
first the main procedure. Its parameters have been duplicated; there are now 
two copies of all variables, one for each execution. This is analogous to 
self-composition or existing product programs. In addition, the transformed 
procedure has two boolean parameters p1 and p2; these variables are the initial 
activation variables of the procedure. Since main is the entry point of the 
program, the initial activation variables can be assumed to be true. 


1 The information stored in activation variables is similar to a path condition in 
symbolic execution, which is also updated every time a branch is taken. However, they 
differ for loops and calls. 

Consider what happens when the product is run with arbitrary input values 
for people1 and people2. The product will first initialize i1 and i2 to zero, like 
it does with i in the original program, and analogously for count1 and count2. 

The loop in the original program has been transformed to a single loop in the 
product. Its condition is true if the original loop condition is true for any active 
execution. This means that the loop will iterate as long as at least one execution 
of the original program would. Inside the loop body, the fresh activation variables 
l1 and l2 represent whether the corresponding executions would execute the loop 
body. That is, for each execution, the respective activation variable will be true if 
the previous activation variable (p1 or p2, respectively) is true, meaning that this 
execution actually reaches the loop, and the loop guard is true for that execution. 
All statements in the loop body are then transformed using these new activation 
variables. Consequently, the loop will keep iterating while at least one execution 
executes the loop, but as soon as the loop guard is false for any execution, its 
activation variable will be false and the loop body will have no effect. 

Conceptually, procedure calls are handled very similarly to loops. For the call 
to is_female in the original program, only a single call is created in the product. 
This call is executed if at least one activation variable is true, i.e., if at least 
one execution would perform the call in the original program. In addition to 
the (duplicated) arguments of the original call, the current activation variables 
are passed to the called procedure. In the transformed version of is_female, all 
statements are then made conditional on those activation variables. Therefore, 
like with loops, a call in the product will be performed if at least one execution 
would perform it in the original program, but it will have no effect on the state 
of the executions that are not active when the call is made. 

The transformed version of is_female shows how conditionals are handled. We 
introduce four fresh activation variables t1, t2, f1, and f2, two for each execution. 
The first pair encodes whether the then-branch should be executed by either of 
the two executions; the second encodes the same for the else-branch. These acti- 
vation variables are then used to transform the branches. Consequently, neither 
branch will have an effect for inactive executions, and exactly one branch has 
an effect for each active execution. 

To summarize, our activation variables ensure that the sequence of state- 
changing statements executed by each execution is the same in the product and 
the original program. We achieve this without duplicating control structures or 
imposing restrictions on the control flow. 
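
The behaviour described in this section can be tried out directly. The following Python sketch transcribes the 2-product of Fig. 2 by hand and compares it with two separate runs of the original program. The original program is not shown in this part of the text, so `main` and `is_female` below are reconstructed from the product and should be read as an assumption; using `None` for uninitialized variables is likewise only a modelling choice.

```python
def is_female(person):
    """Assumed original procedure: 1 if person is even, 0 otherwise."""
    return 1 if person % 2 == 0 else 0


def main(people):
    """Assumed original program: count the entries classified as female."""
    count = 0
    for current in people:
        count += is_female(current)
    return count


def is_female_product(p1, p2, person1, person2):
    """2-product of is_female; p1 and p2 are the activation variables."""
    gender1 = gender2 = res1 = res2 = None
    if p1:
        gender1 = person1 % 2
    if p2:
        gender2 = person2 % 2
    # Fresh activation variables for the then-branch (t) and else-branch (f).
    t1 = p1 and gender1 == 0
    t2 = p2 and gender2 == 0
    f1 = p1 and not (gender1 == 0)
    f2 = p2 and not (gender2 == 0)
    if t1:
        res1 = 1
    if t2:
        res2 = 1
    if f1:
        res1 = 0
    if f2:
        res2 = 0
    return res1, res2


def main_product(p1, p2, people1, people2):
    """2-product of main: a single loop and a single call serve both executions."""
    i1 = i2 = count1 = count2 = current1 = current2 = None
    if p1:
        i1 = 0
    if p2:
        i2 = 0
    if p1:
        count1 = 0
    if p2:
        count2 = 0
    while (p1 and i1 < len(people1)) or (p2 and i2 < len(people2)):
        # Loop activation variables: execution reaches the loop and its guard holds.
        l1 = p1 and i1 < len(people1)
        l2 = p2 and i2 < len(people2)
        if l1:
            current1 = people1[i1]
        if l2:
            current2 = people2[i2]
        if l1 or l2:
            t1, t2 = is_female_product(l1, l2, current1, current2)
        if l1:
            count1 = count1 + t1
        if l2:
            count2 = count2 + t2
        if l1:
            i1 = i1 + 1
        if l2:
            i2 = i2 + 1
    return count1, count2
```

With lists of different lengths the two executions leave the loop at different times, yet `main_product(True, True, [2, 3, 4], [1, 2])` agrees with `(main([2, 3, 4]), main([1, 2]))`. Passing `False` for one activation variable leaves that execution's result untouched.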


2.3 Interpretation of Relational Specifications 


Since modular product programs do not duplicate calls, they provide a simple 
way of interpreting relational procedure specifications: If all executions call a 
procedure, its relational precondition is required to hold before the call and the 
relational postcondition afterwards. If a call is performed by some executions 
but not all, the relational specifications are not meaningful, and thus cannot be 
required to hold. To encode this intuition, we transform every relational pre- 
or postcondition $\hat{Q}$ of the original program into an implication 
$(\bigwedge_{i=1}^{k} p^{(i)}) \Rightarrow \hat{Q}$. 
In the transformed version, both pre- and postconditions are made conditional 
on the conjunction of all activation parameters $p^{(i)}$ of the procedure. As a result, 
both will be trivially true if at least one execution is not active at the call site. 

In our example, we give is_female the relational specification is_female : 
$true \leadsto (\overset{(1)}{person} = \overset{(2)}{person} \Rightarrow \overset{(1)}{res} = \overset{(2)}{res})$, 
which expresses determinism. This specification will be transformed into a unary 
specification of the product program: 
is_female : $(p1 \wedge p2 \Rightarrow true) \leadsto (p1 \wedge p2 \Rightarrow (person1 = person2 \Rightarrow res1 = res2))$. 

Assume for the moment that is_female also has a unary precondition person > 
0. Such a specification should hold for every call, and therefore for every active 
execution, even if other executions are inactive. Therefore, its interpretation 
in the product program is $(p1 \Rightarrow person1 > 0) \wedge (p2 \Rightarrow person2 > 0)$. The 
translation of other unary assertions is analogous. 

Note that it is possible (and useful) to give a procedure both a relational and 
a unary specification; in the product this is encoded by simply conjoining the 
transformed versions of the unary and the relational assertions. 


2.4 Product Program Verification 


We can now prove determinism of our example using the product program. 
Verifying is_female is simple. For main, we want to prove the transformed 
specification main : $(p1 \wedge p2 \Rightarrow people1 = people2) \leadsto (p1 \wedge p2 \Rightarrow count1 = count2)$. 
We use the relational loop invariant 
$\overset{(1)}{i} = \overset{(2)}{i} \wedge \overset{(1)}{count} = \overset{(2)}{count} \wedge \overset{(1)}{people} = \overset{(2)}{people}$, 
encoded as $p1 \wedge p2 \Rightarrow i1 = i2 \wedge count1 = count2 \wedge people1 = people2$. The loop 
invariant holds trivially if either p1 or p2 is false. Otherwise, it ensures l1 = l2 
and current1 = current2. Using the specification of is_female, we obtain t1 = t2, 
which implies that the loop invariant is preserved. The loop invariant implies 
the postcondition. 
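
The preservation step can be spelled out line by line. The following derivation is a sketch that uses the variable names of Fig. 2 and assumes that both executions are active ($p1 \wedge p2$), that the encoded invariant holds at the start of an iteration, and that is_female satisfies its transformed relational postcondition.

```latex
\begin{align*}
&\text{Invariant:}      && i1 = i2,\quad count1 = count2,\quad people1 = people2 \\
&\text{Loop activation:} && l1 = (p1 \wedge i1 < |people1|) = (p2 \wedge i2 < |people2|) = l2 \\
&\text{Body runs, so } l1 \wedge l2\text{:} && current1 = people1[i1] = people2[i2] = current2 \\
&\text{Call postcondition:} && l1 \wedge l2 \Rightarrow (current1 = current2 \Rightarrow t1 = t2),
  \ \text{hence } t1 = t2 \\
&\text{Updates:}        && f1 = t1 = t2 = f2,\quad count1 + f1 = count2 + f2,\quad i1 + 1 = i2 + 1
\end{align*}
```

Since l1 = l2 under the invariant, the case in which only one execution executes the body does not arise here; people1 and people2 are not assigned in the loop, so their equality is preserved as well.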


3 Preliminaries 


We model our setting according to the relational logic by Banerjee, Naumann 
and Nikouei [5]² and, like them, use a standard Hoare logic [4] to reason about 
single program executions. Figure 3 shows the language we use to define 
modular product programs. x ranges over the set of local integer variable names 
Var. Note that this language is deterministic; non-determinism can for example 
be modelled via additional inputs, as is often done for modelling fairness in 
concurrent programs [16]. Program configurations have the form $\langle s, \sigma \rangle$, where 
$\sigma \in \Sigma$ maps variable names to values. The value of expression e in state $\sigma$ is 
denoted as $\sigma(e)$. The small-step transition relation for program configurations 
has the form $\langle s, \sigma \rangle \rightarrow \langle s', \sigma' \rangle$. A hypothesis context $\Phi$ maps procedure names to 
specifications. 


2 Our handling of procedure calls is slightly different, but amounts to restricting 
procedures to work only on local variables not used in the rest of the program (as opposed 
to having a global state on which all procedures work directly), and only interacting 
with the rest of the program via explicitly declared return parameters. 


$$
\begin{array}{lrcl}
\text{(Programs)}       & Prog      & ::= & \texttt{procedure } main(\bar{x}) \texttt{ returns } (\bar{y})\{s\} :: Nil \mid Proc :: Prog \\
\text{(Procedures)}     & Proc      & ::= & \texttt{procedure } m(\bar{x}) \texttt{ returns } (\bar{y})\{s\} \\
\text{(Statements)}     & s         & ::= & x := e \mid s; s \mid \texttt{if } (e) \texttt{ then } \{s\} \texttt{ else } \{s\} \mid \texttt{while } (e) \texttt{ do } \{s\} \\
                        &           &     & \mid \bar{x} := \texttt{call } m(\bar{e}) \\
\text{(Expressions)}    & e         & ::= & c \mid x \mid e \oplus e \quad \text{where } c \in \mathbb{Z} \text{ and } \oplus \in \{+, -, \times, \ldots\} \\
\text{(Assertions)}     & P         & ::= & P \wedge P \mid P \Rightarrow P \mid \forall x.\, P \mid e \\
\text{(RelExpressions)} & \hat{e}   & ::= & c \mid \overset{(i)}{x} \mid \hat{e} \oplus \hat{e} \\
\text{(RelAssertions)}  & \hat{P}   & ::= & \hat{P} \wedge \hat{P} \mid \hat{P} \Rightarrow \hat{P} \mid \forall \overset{(1)}{x}, \ldots, \overset{(k)}{x}.\, \hat{P} \mid \hat{e} \\
\text{(MixAssertions)}  & \check{P} & ::= & P \mid \hat{P} \mid \check{P} \wedge \check{P}
\end{array}
$$


Fig. 3. Language. 

The judgment $\Phi \vdash s : P \leadsto Q$ denotes that statement s, when executed 
in a state fulfilling the unary assertion P, will not fault, and if the execution 
terminates, the resulting state will fulfill the unary assertion Q. For an extensive 
discussion of the language and its operational and axiomatic semantics, see [5]. 

In addition to standard unary expressions and assertions, we define relational 
expressions and assertions. They differ from normal expressions and assertions 
in that they contain parameterized variable references of the form $\overset{(i)}{x}$ and are 
evaluated over a tuple of states instead of a single one. A relational expression 
is k-relational if for all contained variable references $\overset{(i)}{x}$, $1 \le i \le k$, and analogous 
for relational assertions. The value of a variable reference $\overset{(i)}{x}$ with $1 \le i \le k$ 
in a tuple of states $(\sigma_1, \ldots, \sigma_k)$ is $\sigma_i(x)$; the evaluation of arbitrary relational 
expressions and the validity of relational assertions $(\sigma_1, \ldots, \sigma_k) \models \hat{P}$ are defined 
accordingly. 


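Definition 1. A k-relational specification $s : \hat{P} \leadsto_k \hat{Q}$ holds iff $\hat{P}$ and $\hat{Q}$ are 
k-relational assertions, and for all $\sigma_1, \ldots, \sigma_k, \sigma'_1, \ldots, \sigma'_k$, if $(\sigma_1, \ldots, \sigma_k) \models \hat{P}$ and 
$\forall i \in \{1, \ldots, k\}.\ \langle s, \sigma_i \rangle \rightarrow^* \langle \texttt{skip}, \sigma'_i \rangle$, then $(\sigma'_1, \ldots, \sigma'_k) \models \hat{Q}$. 


We write $s : \hat{P} \leadsto \hat{Q}$ for the most common case $s : \hat{P} \leadsto_2 \hat{Q}$. 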


4 Modular k-Product Programs 


In this section, we define the construction of modular products for arbitrary 
k. We will subsequently define the transformation of both relational and unary 
specifications to modular products. 


4.1 Product Construction 


Assume as given a function $(\mathrm{Var}, \mathbb{N}) \rightarrow \mathrm{Var}$ that renames variables for different 
executions. We write $e^{(i)}$ for the renaming of expression e for execution i and 
require that $\forall x, y, i, j.\ i \neq j \Rightarrow x^{(i)} \neq y^{(j)}$. We write $fresh(x_1, x_2, \ldots)$ to denote 
that the variable names $x_1, x_2, \ldots$ are fresh names that do not occur in the 
program and have not yet been used during the transformation. $\mathring{e}$ is used to 
abbreviate $e^{(1)}, \ldots, e^{(k)}$. 

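We denote the modular k-product of a statement s that is parameterized 
by the activation variables $p^{(1)}, \ldots, p^{(k)}$ as $[\![s]\!]_k^{\mathring{p}}$. The product construction for 
procedures is defined as 

$$
\begin{array}{l}
[\![\texttt{procedure } m(x_1, \ldots, x_m) \texttt{ returns } (y_1, \ldots, y_n)\{s\}]\!]_k \\
\quad = \texttt{procedure } m(p^{(1)}, \ldots, p^{(k)}, args) \texttt{ returns } (rets)\{[\![s]\!]_k^{\mathring{p}}\}
\end{array}
$$

where 

$$
\begin{array}{rcl}
args &=& x_1^{(1)}, \ldots, x_1^{(k)}, \ldots, x_m^{(1)}, \ldots, x_m^{(k)} \\
rets &=& y_1^{(1)}, \ldots, y_1^{(k)}, \ldots, y_n^{(1)}, \ldots, y_n^{(k)}
\end{array}
$$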

Figure 4 shows the product construction rules for statements, which 
generalize the transformation explained in Sect. 2. We write if (e) then {s} as a 
shorthand for if (e) then {s} else {skip}, and $\bigodot_{i=1}^{k} s_i$ for the sequential 
composition of k statements $s_1; \ldots; s_k$. 

The core principle behind our encoding is that statements that directly 
change the state are duplicated for each execution and made conditional under 
the respective activation variables, whereas control statements are not dupli- 
cated and instead manipulate the activation variables to pass activation infor- 
mation to their sub-statements. This enables us to assert or assume relational 
assertions before and after any statement from the original program. The only 
state-changing statements in our language, variable assignments, are therefore 
transformed to a sequence of conditional assignments, one for each execution. 
Each assignment is executed only if the respective execution is currently active. 

Duplicating conditionals would also duplicate the calls and loops in their 
branches. To avoid that, modular products eliminate top-level conditionals; 
instead, new activation variables are created and assigned the values of the cur- 
rent activation variables conjoined with the guard for each branch. The branches 
are then sequentially executed based on their respective activation variables. 

A while loop is transformed to a single while loop in the product program 
that iterates as long as the loop guard is true for any active execution. Inside 
the loop, fresh activation variables indicate whether an execution reaches the 
loop and its loop condition is true. The loop body will then modify the state 
of an execution only if its activation variable is true. The resulting construct 
affects the program state in the same way as a self-composition of the original 
loop would, but the fact that our product contains only a single loop enables us 
to use relational loop invariants instead of full functional specifications. 
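
The rules for assignments, conditionals, and loops described above can be prototyped in a few dozen lines. The following Python sketch is an illustration under several assumptions that are not part of the construction itself: statements and expressions are encoded as nested tuples, the renaming of `x` for execution `i` is the name `x_i`, fresh activation variables contain a `$` so that they cannot clash with program variables, and procedure calls are left out. The interpreter evaluates `and` and `or` with short-circuiting, so expressions of inactive executions are never evaluated.

```python
import itertools

_ids = itertools.count()

def fresh(k):
    """Return k fresh activation-variable names (hypothetical naming scheme)."""
    n = next(_ids)
    return [f"p${n}_{i}" for i in range(1, k + 1)]

def rename(e, i):
    """e^(i): expression e with every variable renamed for execution i."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return f"{e}_{i}"
    return (e[0],) + tuple(rename(a, i) for a in e[1:])

def seq(stmts):
    """Sequential composition s1; ...; sn built from binary 'seq' nodes."""
    out = ("skip",)
    for s in reversed(stmts):
        out = ("seq", s, out)
    return out

def product(s, ps):
    """Modular k-product of statement s under activation variables ps."""
    k, tag = len(ps), s[0]
    if tag == "skip":
        return s
    if tag == "seq":
        return ("seq", product(s[1], ps), product(s[2], ps))
    if tag == "assign":  # one guarded assignment per execution
        return seq([("if", ps[i], ("assign", f"{s[1]}_{i + 1}", rename(s[2], i + 1)),
                     ("skip",)) for i in range(k)])
    guards = [("and", ps[i], rename(s[1], i + 1)) for i in range(k)]
    if tag == "if":  # the conditional disappears; only activation variables change
        then_ps, else_ps = fresh(k), fresh(k)
        init = [("assign", then_ps[i], guards[i]) for i in range(k)]
        init += [("assign", else_ps[i], ("and", ps[i], ("not", rename(s[1], i + 1))))
                 for i in range(k)]
        return seq(init + [product(s[2], then_ps), product(s[3], else_ps)])
    if tag == "while":  # a single loop that runs while any active execution would
        loop_ps = fresh(k)
        cond = guards[0]
        for g in guards[1:]:
            cond = ("or", cond, g)
        init = [("assign", loop_ps[i], guards[i]) for i in range(k)]
        return ("while", cond, seq(init + [product(s[2], loop_ps)]))
    raise ValueError(f"unsupported statement: {tag}")

def ev(e, st):
    """Evaluate an expression in state st; 'and' and 'or' short-circuit."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return st[e]
    op, args = e[0], e[1:]
    if op == "not":
        return not ev(args[0], st)
    if op == "and":
        return bool(ev(args[0], st)) and bool(ev(args[1], st))
    if op == "or":
        return bool(ev(args[0], st)) or bool(ev(args[1], st))
    x, y = ev(args[0], st), ev(args[1], st)
    return {"+": x + y, "-": x - y, "<": x < y}[op]

def run(s, st):
    """Big-step interpreter used for both original programs and products."""
    tag = s[0]
    if tag == "assign":
        st[s[1]] = ev(s[2], st)
    elif tag == "seq":
        run(s[1], st)
        run(s[2], st)
    elif tag == "if":
        run(s[2] if ev(s[1], st) else s[3], st)
    elif tag == "while":
        while ev(s[1], st):
            run(s[2], st)
    return st

# acc := 0; while (0 < n) { if (n < 3) {acc := acc + 1} else {acc := acc + n}; n := n - 1 }
prog = seq([
    ("assign", "acc", 0),
    ("while", ("<", 0, "n"), seq([
        ("if", ("<", "n", 3),
         ("assign", "acc", ("+", "acc", 1)),
         ("assign", "acc", ("+", "acc", "n"))),
        ("assign", "n", ("-", "n", 1)),
    ])),
])

def agrees(n1, n2):
    """Check that one run of the 2-product matches two separate runs of prog."""
    final = run(product(prog, ["p$in_1", "p$in_2"]),
                {"p$in_1": True, "p$in_2": True, "n_1": n1, "n_2": n2})
    separate = (run(prog, {"n": n1})["acc"], run(prog, {"n": n2})["acc"])
    return (final["acc_1"], final["acc_2"]) == separate
```

Calling `agrees(5, 2)` exercises a case in which the two executions take different branches and leave the loop at different iterations. This is a testing aid only; it says nothing about verification, which is where the single loop and the relational invariant pay off.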

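For procedure calls, it is crucial that the product contains a single call for 
every call in the original program, in order to be able to apply relational 
specifications at the call site. As explained before, initial activation parameters are 
added to every procedure declaration, and all parameters are duplicated k times. 
Procedure calls are therefore transformed such that the values of the current 
activation variables are passed, and all arguments are passed once for each execution. 
The return values are stored in temporary variables and subsequently assigned 
to the actual target variables only for those executions that actually execute the 
call, so that for all other executions, the target variables are not affected. 


$$
\begin{array}{rcl}
[\![s_1; s_2]\!]_k^{\mathring{p}} &=& [\![s_1]\!]_k^{\mathring{p}};\ [\![s_2]\!]_k^{\mathring{p}} \\[4pt]
[\![\texttt{skip}]\!]_k^{\mathring{p}} &=& \texttt{skip} \\[4pt]
[\![x := e]\!]_k^{\mathring{p}} &=& \bigodot_{i=1}^{k} \texttt{if } (p^{(i)}) \texttt{ then } \{x^{(i)} := e^{(i)}\} \\[4pt]
[\![\texttt{if } (e) \texttt{ then } \{s_1\} \texttt{ else } \{s_2\}]\!]_k^{\mathring{p}} &=& \bigodot_{i=1}^{k} (p_1^{(i)} := p^{(i)} \wedge e^{(i)}); \\
 && \bigodot_{i=1}^{k} (p_2^{(i)} := p^{(i)} \wedge \neg e^{(i)}); \\
 && [\![s_1]\!]_k^{\mathring{p}_1};\ [\![s_2]\!]_k^{\mathring{p}_2} \\
 && \text{where } fresh(\mathring{p}_1) \wedge fresh(\mathring{p}_2) \\[4pt]
[\![\texttt{while } (e) \texttt{ do } \{s\}]\!]_k^{\mathring{p}} &=& \texttt{while } (\bigvee_{i=1}^{k} (p^{(i)} \wedge e^{(i)})) \texttt{ do } \{ \\
 && \quad \bigodot_{i=1}^{k} (p_1^{(i)} := p^{(i)} \wedge e^{(i)}); \\
 && \quad [\![s]\!]_k^{\mathring{p}_1} \\
 && \} \\
 && \text{where } fresh(\mathring{p}_1) \\[4pt]
[\![x_1, \ldots, x_n := \texttt{call } m(e_1, \ldots, e_m)]\!]_k^{\mathring{p}} &=& \texttt{if } (\bigvee_{i=1}^{k} p^{(i)}) \texttt{ then } \{ \\
 && \quad \bigodot_{i=1}^{k} \texttt{if } (p^{(i)}) \texttt{ then } \{\bigodot_{j=1}^{m} (a_j^{(i)} := e_j^{(i)})\}; \\
 && \quad ts := \texttt{call } m(p^{(1)}, \ldots, p^{(k)}, as); \\
 && \quad \bigodot_{i=1}^{k} \texttt{if } (p^{(i)}) \texttt{ then } \{\bigodot_{j=1}^{n} (x_j^{(i)} := t_j^{(i)})\} \\
 && \} \\
 && \text{where } fresh(\mathring{a}_1, \ldots, \mathring{a}_m) \wedge fresh(\mathring{t}_1, \ldots, \mathring{t}_n) \\
 && as = [a_1^{(1)}, \ldots, a_1^{(k)}, \ldots, a_m^{(1)}, \ldots, a_m^{(k)}] \\
 && ts = [t_1^{(1)}, \ldots, t_1^{(k)}, \ldots, t_n^{(1)}, \ldots, t_n^{(k)}]
\end{array}
$$


Fig. 4. Construction rules for statement products. 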

The transformation wraps the call in a conditional so that the call is per- 
formed only if at least one execution is active. This prevents the transformation 
from introducing infinite recursion that is not present in the original program. 

Note that for an inactive execution i, arbitrary argument values are passed 
in procedure calls, since the passed variables $a_j^{(i)}$ are not initialized. This is 
unproblematic because these values will not be used by the procedure. It is 
important to not evaluate $e_j^{(i)}$ for inactive executions, since this could lead to 
false alarms for languages where expression evaluation can fail. 


4.2 Transformation of Assertions 


We now define how to transform unary and relational assertions for use in a 
modular product. 

Unary assertions such as ordinary procedure preconditions describe state 
properties that should hold for every single execution. When checking or assum- 
ing that a unary assertion holds at a specific point in the program, we need to 
take into account that it only makes sense to do so for executions that actually 
reach that program point. We can express this by making the assertion con- 
ditional on the activation variable of the respective execution; as a result, any 
unary assertion is trivially valid for all inactive executions. 

A k-relational assertion, on the other hand, describes the relation between 
the states of all k executions. Checking or assuming a relational assertion at 
some point is meaningful only if all executions actually reach that point. This 
can be expressed by making relational assertions conditional on the conjunction 
of all current activation variables. If at least one execution does not reach the 
assertion, it holds trivially. 

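We formalize this idea by defining a function $\alpha$ that maps relational 
assertions $\hat{P}$ to unary assertions $P$ of the product program such that 
$\alpha(\hat{P}) = \hat{P}[\mathrm{Var}^{(1)} / \overset{(1)}{\mathrm{Var}}] \ldots [\mathrm{Var}^{(k)} / \overset{(k)}{\mathrm{Var}}]$. 
Assertions can then be transformed for use in a k-product as follows: 


- The transformation $\lfloor \hat{P} \rfloor_k^{\mathring{p}}$ of a k-relational assertion $\hat{P}$ with the activation 
  variables $p^{(1)}, \ldots, p^{(k)}$ is $(\bigwedge_{i=1}^{k} p^{(i)}) \Rightarrow \alpha(\hat{P})$. 
- The transformation $\lfloor P \rfloor_k^{\mathring{p}}$ of a unary assertion $P$ is $\bigwedge_{i=1}^{k} (p^{(i)} \Rightarrow P^{(i)})$. 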


Importantly, our approach allows using mixed assertions and specifications, 
which represent conjunctions of unary and relational assertions. For example, it is 
common to combine a unary precondition that ensures that a procedure will not 
raise an error with a relational postcondition that states that it is deterministic. 

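A mixed assertion $\check{R}$ of the form $P \wedge \hat{Q}$ means that the unary assertion $P$ 
holds for every single execution, and if all executions are currently active, the 
relational assertion $\hat{Q}$ holds as well. The transformation of mixed assertions is 
straightforward: $\lfloor \check{R} \rfloor_k^{\mathring{p}} = \lfloor P \rfloor_k^{\mathring{p}} \wedge \lfloor \hat{Q} \rfloor_k^{\mathring{p}}$. 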
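
To make the three transformations concrete, the following Python sketch evaluates transformed assertions over a product state. The representation is an assumption made for illustration: a product state is a dictionary, the renaming of `x` for execution `i` is the key `x` followed by `i` (as in Fig. 2), a unary assertion is a function of a one-argument lookup, and a relational assertion is a function of a two-argument lookup `get(x, i)`.

```python
def unary(P, ps):
    """Transformed unary assertion: for every i, p^(i) implies P renamed for i."""
    def holds(st):
        return all((not st[p]) or P(lambda x, i=i: st[f"{x}{i}"])
                   for i, p in enumerate(ps, start=1))
    return holds


def relational(Q, ps):
    """Transformed relational assertion: (all p^(i) hold) implies alpha(Q)."""
    def holds(st):
        if not all(st[p] for p in ps):
            return True  # at least one execution is inactive: trivially valid
        return Q(lambda x, i: st[f"{x}{i}"])
    return holds


def mixed(P, Q, ps):
    """Transformed mixed assertion: conjunction of the two transformations."""
    u, r = unary(P, ps), relational(Q, ps)
    return lambda st: u(st) and r(st)


# Example: unary "person > 0" combined with relational determinism of is_female.
positive = lambda get: get("person") > 0
deterministic = lambda get: (get("person", 1) != get("person", 2)
                             or get("res", 1) == get("res", 2))
spec = mixed(positive, deterministic, ["p1", "p2"])
```

Because the unary part is guarded per execution, `spec` can be evaluated on a state in which the variables of an inactive execution are missing altogether; the lookup for that execution is never performed.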


4.3 Heap-Manipulating Programs 


The approach outlined so far can easily be extended to programs that work on 
a mutable heap, assuming that object references are opaque, i.e., they cannot 
be inspected or used in arithmetic. In order to create a distinct state space for 
each execution represented in the product, allocation statements are duplicated 
and made conditional like assignments, and therefore create a different object 
for each active execution. The renaming $(e.f)^{(i)}$ of a field dereference e.f is then defined 
as $e^{(i)}.f$. As a result, the heap of a k-product will consist of k partitions that 
do not contain references to each other, and execution i will only ever interact 
with objects from its partition of the heap. 

The verification of modular products of heap-manipulating programs does 
not depend on any specific way of achieving framing. Our implementation is 
based on implicit dynamic frames [25], but other approaches are feasible as well, 
provided that procedures can be specified in such a way that the caller knows 
the heap stays unmodified for all executions whose activation variables are false. 

Since the handling of the heap is largely orthogonal to our main technique, 
we will not go into further detail here, but we do support heap-manipulating 
programs in our implementation. 


5 Soundness and Completeness 


A product construction is sound if an execution of a k-product mirrors k sep- 
arate executions of the original program such that properties proved about the 
product entail hyperproperties of the original program. In this section, we sketch 
a soundness proof of our k-product construction in the presence of only unary 
procedure specifications. We also sketch a proof for relational specifications for 
the case k = 2, making use of the relational logic presented by Banerjee et al. [5]. 
Finally, we informally discuss the completeness of modular products. 


5.1 Soundness with Unary Specifications 


A modular k-product must soundly encode k executions of the original program. 
That is, if an encoded unary specification holds for a product program then the 
original specification holds for the original program. 

We define a relation $\sigma \simeq_i \sigma'$ that denotes that $\sigma$ contains a renamed version 
of all variables in $\sigma'$, i.e., $\forall v \in dom(\sigma').\ \sigma(v^{(i)}) = \sigma'(v)$. Without the index i, $\simeq$ 
denotes the same but without renaming, and is used to express equality modulo 
newly introduced activation variables. 


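Theorem 1. Assume that for all procedures m in a hypothesis context $\Phi$ we 
have that $m : S \leadsto T \in dom(\Phi)$ if and only if $m : \lfloor S \rfloor_k^{\mathring{p}} \leadsto \lfloor T \rfloor_k^{\mathring{p}} \in dom(\Phi')$. 
Then $\Phi' \vdash [\![s]\!]_k^{\mathring{p}} : \lfloor P \rfloor_k^{\mathring{p}} \leadsto \lfloor Q \rfloor_k^{\mathring{p}}$ implies that $\Phi \vdash s : P \leadsto Q$. 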


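Proof (Sketch). We sketch a proof based on the operational semantics of our 
language. We show that the execution of the product program with exactly one 
active execution corresponds to a single execution of the original program. 

Assume that $\Phi' \vdash [\![s]\!]_k^{\mathring{p}} : \lfloor P \rfloor_k^{\mathring{p}} \leadsto \lfloor Q \rfloor_k^{\mathring{p}}$, and that $\sigma \models \lfloor P \rfloor_k^{\mathring{p}}$. If $[\![s]\!]_k^{\mathring{p}}$ does not 
diverge when executed from $\sigma$ we have that $\langle [\![s]\!]_k^{\mathring{p}}, \sigma \rangle \rightarrow^* \langle \texttt{skip}, \sigma' \rangle$ and 
$\sigma' \models \lfloor Q \rfloor_k^{\mathring{p}}$. We now prove that a run of the product with all but one execution being 
inactive reflects the states that occur in a run of the original program. Assume 
that $\sigma \models p^{(1)} \wedge \bigwedge_{i=2}^{k} (\neg p^{(i)})$ and $\langle s, \sigma_1 \rangle \rightarrow^* \langle \texttt{skip}, \sigma'_1 \rangle$ and initially $\sigma \simeq_1 \sigma_1$, which 
implies $\sigma_1 \models P$. We prove by induction on the derivation of $\langle s, \sigma_1 \rangle \rightarrow^* \langle \texttt{skip}, \sigma'_1 \rangle$ 
that $\langle [\![s]\!]_k^{\mathring{p}}, \sigma \rangle \rightarrow^* \langle \texttt{skip}, \sigma' \rangle$ and $\sigma' \simeq_1 \sigma'_1$, meaning that the product execution 
terminates, and subsequently by induction on the derivation of 
$\langle [\![s]\!]_k^{\mathring{p}}, \sigma \rangle \rightarrow^* \langle \texttt{skip}, \sigma' \rangle$ that $\sigma' \simeq_1 \sigma'_1$, from which we can derive that $\sigma'_1 \models Q$. 


5.2 Soundness for Relational Specifications 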


The main advantage of modular product programs over other kinds of product 
programs is that they allow reasoning about procedure calls in terms of relational 
specifications. We therefore need to show the soundness of our approach in the 
presence of procedures with such specifications. In particular, we must establish 
that if a transformed relational specification holds for a modular product then 
the original relational specification will hold for a set of k executions of the 
original program. 

Our proof sketch is phrased in terms of biprograms as introduced by Banerjee 
et al. [5]. Biprogram executions correspond to two partly aligned executions of 
their two underlying programs. A biprogram ss can have the form $(s_1 | s_2)$ or 
$\|s\|$; the former represents the two executions of $s_1$ and $s_2$, whereas the latter 
represents an aligned execution of s by both executions, which enables using 
relational specifications for procedure calls³. We denote the small-step transition 
relation between biprogram configurations as $\langle ss, \sigma_1 | \sigma_2 \rangle \rightarrow^* \langle ss', \sigma'_1 | \sigma'_2 \rangle$. We 
make use of a relation $\sigma \cong \sigma_1 | \sigma_2$ that denotes that $\sigma$ contains renamed versions 
of all variables in both $\sigma_1$ and $\sigma_2$ with the same values. 

Biprograms do not allow mixed procedure specifications, meaning that a 
procedure can either have only a unary specification, or it can have only a 
relational specification, in which case it can only be invoked by both executions 
simultaneously. As mentioned before, our approach does not have this limitation, 
but we can artificially enforce it for the purposes of the soundness proof. 

We can now state our theorem. Since biprograms represent the execution of 
two programs, we formulate soundness for k = 2 here. 


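Theorem 2. Assume that hypothesis context $\Phi$ maps procedure names to 
relational specifications if all calls to the procedure in s can be aligned from any pair 
of states satisfying $\hat{P}$, and to unary specifications otherwise. Assume further that 
hypothesis context $\Phi'$ maps the same procedure names to their transformed 
specifications. Finally, assume that $\Phi' \vdash [\![s]\!]_2^{\mathring{p}} : \lfloor \hat{P} \rfloor_2^{\mathring{p}} \leadsto \lfloor \hat{Q} \rfloor_2^{\mathring{p}}$ and $(\sigma_1, \sigma_2) \models \hat{P}$. If 
$\langle s, \sigma_1 \rangle \rightarrow^* \langle \texttt{skip}, \sigma'_1 \rangle$ and $\langle s, \sigma_2 \rangle \rightarrow^* \langle \texttt{skip}, \sigma'_2 \rangle$, then $(\sigma'_1, \sigma'_2) \models \hat{Q}$. 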


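Proof (Sketch). The proof follows the same basic outline as the one for Theorem 1 
but reasons about the operational semantics of biprograms representing two 
executions of s. 

Assume that $\Phi' \vdash [\![s]\!]_2^{\mathring{p}} : \lfloor \hat{P} \rfloor_2^{\mathring{p}} \leadsto \lfloor \hat{Q} \rfloor_2^{\mathring{p}}$ and $\sigma \models \lfloor \hat{P} \rfloor_2^{\mathring{p}}$. If $[\![s]\!]_2^{\mathring{p}}$ does not diverge 
when executed from $\sigma$ we get that $\langle [\![s]\!]_2^{\mathring{p}}, \sigma \rangle \rightarrow^* \langle \texttt{skip}, \sigma' \rangle$ and $\sigma' \models \lfloor \hat{Q} \rfloor_2^{\mathring{p}}$. Assume 
that initially $\sigma \cong \sigma_1 | \sigma_2$, which implies that $(\sigma_1, \sigma_2) \models \hat{P}$. We prove by induction 
on the derivation of $\langle [\![s]\!]_2^{\mathring{p}}, \sigma \rangle \rightarrow^* \langle \texttt{skip}, \sigma' \rangle$ that 
(1) if $\sigma \models p^{(1)} \wedge p^{(2)}$, then there exists ss that represents two executions of s s.t. 
$\langle ss, \sigma_1 | \sigma_2 \rangle \rightarrow^* \langle \|\texttt{skip}\|, \sigma'_1 | \sigma'_2 \rangle$ and $\sigma' \cong \sigma'_1 | \sigma'_2$; 
(2) if $\sigma \models p^{(1)} \wedge \neg p^{(2)}$, then $\langle s, \sigma_1 \rangle \rightarrow^* \langle \texttt{skip}, \sigma'_1 \rangle$ and $\sigma' \cong \sigma'_1 | \sigma_2$; 
(3) if $\sigma \models \neg p^{(1)} \wedge p^{(2)}$, then $\langle s, \sigma_2 \rangle \rightarrow^* \langle \texttt{skip}, \sigma'_2 \rangle$ and $\sigma' \cong \sigma_1 | \sigma'_2$; 
(4) if $\sigma \models \neg p^{(1)} \wedge \neg p^{(2)}$, then $\sigma \simeq \sigma'$. From the first point and semantic consistency 
of the relational logic, we can conclude that $(\sigma'_1, \sigma'_2) \models \hat{Q}$. Finally, we prove that 
$\langle [\![s]\!]_2^{\mathring{p}}, \sigma \rangle \rightarrow^* \langle \texttt{skip}, \sigma' \rangle$ by showing that non-termination of the product implies 
the non-termination of at least one of the two original program runs. If the 
condition of a loop in the product remains true forever, the loop condition of at 
least one encoded execution must be true after every iteration. We show that 
(1) this is not due to an interaction of multiple executions, since the condition 
for every execution will remain false if it becomes false once, and (2) since the 
encoded states of active executions progress as they do in the original program, 
the condition of a single execution in the product remains true forever only if it 
does in the original program. A similar argument shows that the product cannot 
diverge because of infinite recursive calls. 


3 We modified the original notation to avoid clashes with our own concepts introduced 
earlier. 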


5.3 Completeness 


We believe modular product programs to be complete, meaning that any hyper- 
property of multiple executions of a program can be proved about its modular 
product program. Since the product faithfully models the executions of the orig- 
inal program, the completeness of modular products is potentially limited only 
by the underlying verification logic and the assertion language, but not by the 
product construction itself. 


6 Modular Verification of Secure Information Flow 


In this section, we demonstrate the expressiveness of modular product programs 
by showing how they can be used to verify an important hyperproperty, informa- 
tion flow security. We first concentrate on secure information flow in the classical 
sense [9], and later demonstrate how the ability to check relational assertions at 
any point in the program can be exploited to prove advanced properties like the 
absence of timing and termination channels, and to encode declassification. 


6.1 Non-interference 


Secure information flow, i.e., the property that secret information is not leaked 
to the public outputs of a program, can be expressed as a relational 2-safety 
property of a program called non-interference. Non-interference states that, if 
a program is run twice, with the public (often called low) inputs being equal 
in both runs but the secret (or high) inputs possibly being different, the public 
outputs of the program must be equal in both runs [8]. This property guarantees 
that the high inputs do not influence the low outputs. 
We can formalize non-interference as follows: 


Definition 2. A statement s that operates on a set of variables $X = 
\{x_1, \ldots, x_n\}$, of which some subset $X_l \subseteq X$ is low, satisfies non-interference 
iff for all $\sigma_1, \sigma_2$ and $\sigma'_1, \sigma'_2$, if $\forall x \in X_l.\ \sigma_1(x) = \sigma_2(x)$ and $\langle s, \sigma_1 \rangle \rightarrow^* \langle \texttt{skip}, \sigma'_1 \rangle$ 
and $\langle s, \sigma_2 \rangle \rightarrow^* \langle \texttt{skip}, \sigma'_2 \rangle$ then $\forall x \in X_l.\ \sigma'_1(x) = \sigma'_2(x)$. 


Since our definition of non-interference describes a hyperproperty, we can 
verify it using modular product programs: 


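Theorem 3. A statement s that operates on a set of variables $X = 
\{x_1, \ldots, x_n\}$, of which some subset $X_l \subseteq X$ is low, satisfies non-interference 
under a unary precondition P if 
$\emptyset \vdash [\![s]\!]_2^{\mathring{p}} : \lfloor P \rfloor_2^{\mathring{p}} \wedge (\forall x \in X_l.\ x^{(1)} = x^{(2)}) \leadsto \forall x \in X_l.\ x^{(1)} = x^{(2)}$. 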


Proof (Sketch). Since non-interference can be expressed using a 2-relational spec- 
ification, the theorem follows directly from Theorem 2. 
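
Definition 2 can also be checked by brute force on small finite domains, which is useful for building intuition before attempting a proof. The following Python sketch is bounded testing, not the verification technique of this paper: it runs a program from every pair of initial states that agree on the low variables and compares the low parts of the final states. Modelling a program as a Python function on a dictionary state, and the names `secure` and `leaky`, are assumptions made for the example.

```python
from itertools import product as cartesian


def satisfies_noninterference(program, variables, low, domain):
    """Exhaustively test Definition 2 for all initial states over a finite domain."""
    high = [x for x in variables if x not in low]
    for lows in cartesian(domain, repeat=len(low)):
        observed = None
        for highs in cartesian(domain, repeat=len(high)):
            state = {**dict(zip(low, lows)), **dict(zip(high, highs))}
            final = program(dict(state))
            low_part = tuple(final[x] for x in low)
            if observed is None:
                observed = low_part
            elif low_part != observed:
                return False  # two runs agree on low inputs but differ on low outputs
    return True


def secure(st):
    """The low variable 'out' depends only on the low variable 'pub'."""
    st["out"] = st["pub"] + 1
    st["secret"] = st["secret"] * 2
    return st


def leaky(st):
    """The low variable 'out' reveals the sign of the high variable 'secret'."""
    st["out"] = 1 if st["secret"] > 0 else 0
    return st


VARIABLES = ["pub", "out", "secret"]
LOW = ["pub", "out"]
```

Here `satisfies_noninterference(secure, VARIABLES, LOW, range(-1, 2))` holds, while the same call with `leaky` fails because runs with equal low inputs but different secrets produce different values of `out`.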


For non-deterministic programs whose behavior can be modelled by adding 
input parameters representing the non-deterministic choices, those parameters 
can be considered low if the choice is not influenced in any way by secret data. 

An expanded notion of secure information flow considers observable events 
in addition to regular program outputs [17]. An event is a statement that has an 
effect that is visible to an outside observer, but may not necessarily affect the 
program state. The most important examples of events are output operations like 
printing a string to the console or sending a message over a network. Programs 
that cause events can be considered information flow secure only if the sequence 
of produced events is not influenced by high data. One way to verify this using 
our approach is to track the sequence of produced events in a ghost variable 
and verify that its value never depends on high data. This approach requires 
substantial amounts of additional specifications. 

Modular product programs offer an alternative approach for preventing leaks 
via events, since they allow formulating assertions about the relation between the 
activation variables of different executions. In particular, if a given event has the 
precondition that all activation variables are equal when the event statement is 
reached then this event will either be executed by both executions or be skipped 
by both executions. As a result, the sequence of events produced by a program 
will be equal in all executions. 


6.2 Information Flow Specifications 


The relational specifications required for modularly proving non-interference 
with the previously described approach have a specific pattern: they can contain 
functional specifications meant to be valid for both executions (e.g., to make 
sure both executions run without errors), they may require that some informa- 
tion is low, which is equivalent to the two renamings of the same expression 
being equal, and, in addition, they may assert that the control flow at a specific 
program point is low. 

We therefore introduce modular information flow specifications, which can 
express all properties required for proving secure information flow but are trans- 
parent w.r.t. the encoding or the verification methodology, i.e., they allow 
expressing that a given operation or value must not be secret without knowl- 
edge of the encoding of this fact into an assertion about two different program 
executions. We define information flow specifications as follows: 

(SIF Assertions) $\tilde{P} ::= \tilde{P} \wedge \tilde{P} \mid e \mid low(e) \mid lowEvent \mid \tilde{P} \Rightarrow \tilde{P} \mid \forall x.\, \tilde{P}$ 


low(e) and lowEvent may be used on the left side of an implication only if the 
right side has the same form. low(e) specifies that the value of the expression e is 
not influenced by high data. Note that e can be any expression and is not limited 
to variable references; this reflects the fact that our approach can label secrecy in 
a more fine-grained way than, e.g., a type system. One can, for example, declare 
to be public whether a number is odd while keeping its value secret. 


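$$
\begin{array}{rcl}
\lfloor e \rfloor_2^{\mathring{p}} &=& (p^{(1)} \Rightarrow e^{(1)}) \wedge (p^{(2)} \Rightarrow e^{(2)}) \\
\lfloor low(e) \rfloor_2^{\mathring{p}} &=& p^{(1)} \wedge p^{(2)} \Rightarrow e^{(1)} = e^{(2)} \\
\lfloor lowEvent \rfloor_2^{\mathring{p}} &=& p^{(1)} = p^{(2)} \\
\lfloor \tilde{P}_1 \wedge \tilde{P}_2 \rfloor_2^{\mathring{p}} &=& \lfloor \tilde{P}_1 \rfloor_2^{\mathring{p}} \wedge \lfloor \tilde{P}_2 \rfloor_2^{\mathring{p}} \\
\lfloor \tilde{P}_1 \Rightarrow \tilde{P}_2 \rfloor_2^{\mathring{p}} &=& \lfloor \tilde{P}_1 \rfloor_2^{\mathring{p}} \Rightarrow \lfloor \tilde{P}_2 \rfloor_2^{\mathring{p}} \\
\lfloor \forall x.\, \tilde{P} \rfloor_2^{\mathring{p}} &=& \forall x^{(1)}, x^{(2)}.\ x^{(1)} = x^{(2)} \Rightarrow \lfloor \tilde{P} \rfloor_2^{\mathring{p}}
\end{array}
$$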


Fig. 5. Translation of information flow specifications. 


lowEvent specifies that high data must not influence if and how often the 
current program point is reached by an execution, which is a sufficient precondition 
of any statement that causes an observable event. In particular, if a procedure 
outputs an expression e, the precondition $lowEvent \wedge low(e)$ guarantees that no 
high information will be leaked via this procedure. 

Information flow specifications can express complex properties. $e_1 \Rightarrow low(e_2)$, 
for example, expresses that if $e_1$ is true, $e_2$ must not depend on high data; 
$e_1 \Rightarrow lowEvent$ says the same about the current control flow. A possible use case 
for these assertions is the precondition of a library function that prints $e_2$ to a 
low-observable channel if $e_1$ is true, and to a secure channel otherwise. 

The encoding $\lfloor \tilde{P} \rfloor^{\mathring{p}}$ of an information flow assertion $\tilde{P}$ under the activation
variables $p^{(1)}$ and $p^{(2)}$ is defined in Fig. 5. Note that high-ness of some expression
is not modelled by its renamings being definitely unequal, but by leaving
underspecified whether they are equal or not, meaning that high-ness is simply
the absence of the knowledge of low-ness. As a result, it is never necessary to
specify explicitly that an expression is high. This approach (which is also used
in self-composition) is analogous to the way type systems encode security levels,
where low is typically a subtype of high. For the example in Fig. 1, a possible,
very precise information flow specification could say that the results of main are
low if the first bit of all entries in people is low. We can write this as
$\mathit{main} : \mathit{low}(|\mathit{people}|) \wedge \forall i \in \{0, \ldots, |\mathit{people}| - 1\}.\ \mathit{low}(\mathit{people}[i] \bmod 2) \leadsto \mathit{low}(\mathit{count})$.
In the product, this will be translated to
$\mathit{main} : p_1 \wedge p_2 \Rightarrow |\mathit{people}_1| = |\mathit{people}_2| \wedge \forall i \in \{0, \ldots, |\mathit{people}_1| - 1\}.\ (\mathit{people}_1[i] \bmod 2) = (\mathit{people}_2[i] \bmod 2) \Rightarrow \mathit{count}_1 = \mathit{count}_2$.
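
The relational specification of main can be tested on small inputs by comparing pairs of runs directly. The sketch below is a bounded, exhaustive test and not a proof; since Fig. 1 is not reproduced here, the bodies of `main` and `is_female` are hypothetical stand-ins that only assume the lowest bit of each entry carries the public flag.

```python
from itertools import product


def is_female(person: int) -> bool:
    # Hypothetical encoding: the lowest bit of an entry is the public flag.
    return person % 2 == 1


def main(people):
    # Stand-in for main: counts the entries whose lowest bit is set.
    count = 0
    for person in people:
        if is_female(person):
            count += 1
    return count


def precondition(p1, p2) -> bool:
    """low(|people|) and low(people[i] mod 2) for every index i."""
    return len(p1) == len(p2) and all(a % 2 == b % 2 for a, b in zip(p1, p2))


def check_all_pairs(values, length) -> bool:
    """Compare every pair of inputs of the given length over `values`."""
    inputs = list(product(values, repeat=length))
    for p1 in inputs:
        for p2 in inputs:
            if precondition(p1, p2) and main(p1) != main(p2):
                return False
    return True


holds = check_all_pairs(values=range(4), length=3)
```

Every pair of inputs that satisfies the relational precondition yields the same count, even though the entries themselves may differ in all higher bits.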

In this scenario, the loop in main could have the simple invariant low(i) ∧
low(count), and the procedure is_female could have the contract is_female :
true ⇝ (low(person mod 2) ⇒ low(res)). This contract follows a useful pattern


procedure check(password, input)
  returns (result)
{
  result := |password| == |input|;
  i := 0;
  while (i < min(|password|, |input|)) {
    result := result && password[i] == input[i];
    i := i + 1;
  }
}


Fig. 6. Password check example: leaking secret data is desired. 


where, instead of requiring an input to be low and promising that an output will 
be low for all calls, the output is described as conditionally low based on the level
of the input, which is more permissive for callers. 

The example shows that the information relevant for proving secure informa- 
tion flow can be expressed concisely, without requiring any knowledge about the 
methodology used for verification. Modular product programs therefore enable 
the verification of the information flow security of main based solely on modular, 
relational specifications, and without depending on functional specifications. 


6.3 Secure Information Flow with Arbitrary Security Lattices 


The definition of secure information flow used in Definition 2 is a special case
in which there are exactly two possible classifications of data, high and low. In
the more general case, classifications come from an arbitrary lattice $(\mathcal{L}, \sqsubseteq)$ of
security levels s.t. for any $l_1, l_2 \in \mathcal{L}$, information from an input with level $l_1$
may influence an output with level $l_2$ only if $l_1 \sqsubseteq l_2$. Instead of the specification
low(e), information flow assertions can therefore have the form levelBelow(e, l),
meaning that the security level of expression e is at most l.

It is well-known that techniques for verifying information flow security with
two levels can conceptually be used to verify programs with arbitrary finite
security lattices [23] by splitting the verification task into $|\mathcal{L}|$ different verification
tasks, one for each element of $\mathcal{L}$. Instead, we propose to combine all these
verification tasks into a single task by using a symbolic value for l, i.e., declaring
an unconstrained global constant representing l. Specifications can then be
translated as follows:


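$$\lfloor \mathit{levelBelow}(e, l') \rfloor^{\mathring{p}} = (p^{(1)} \wedge p^{(2)} \wedge l' \sqsubseteq l \Rightarrow e^{(1)} = e^{(2)})$$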


Since no information about l is known, verification will only succeed if all 
assertions can be proven for all possible values of l, which is equivalent to proving 
them separately for each possible value of l. 
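
To make the role of the symbolic level concrete, the following Python sketch evaluates one reading of the levelBelow translation for every level of a small lattice. The three-point chain, the names `level_below` and `holds_for_every_attacker`, and the example states are assumptions chosen for illustration; a verifier would treat the level as an unconstrained constant instead of enumerating it.

```python
# Hypothetical three-point chain: public <= internal <= secret.
LEVELS = ["public", "internal", "secret"]
RANK = {name: i for i, name in enumerate(LEVELS)}


def leq(a: str, b: str) -> bool:
    """Lattice order of the chain."""
    return RANK[a] <= RANK[b]


def level_below(e, e_level, attacker, p1, p2, s1, s2) -> bool:
    """If both executions are active and the attacker level may observe
    e_level, then e must have the same value in both states."""
    if p1 and p2 and leq(e_level, attacker):
        return e(s1) == e(s2)
    return True


def holds_for_every_attacker(e, e_level, s1, s2) -> bool:
    """Leaving the attacker level unconstrained amounts to checking all levels."""
    return all(level_below(e, e_level, l, True, True, s1, s2) for l in LEVELS)


s1 = {"a": 1, "b": 5}
s2 = {"a": 1, "b": 9}
a = lambda s: s["a"]
b = lambda s: s["b"]

a_ok = holds_for_every_attacker(a, "public", s1, s2)    # a agrees in both states
b_ok = holds_for_every_attacker(b, "internal", s1, s2)  # b differs, visible at internal
b_public_view = level_below(b, "internal", "public", True, True, s1, s2)
```

The assertion about `b` fails only for attacker levels at or above `internal`; an attacker at level `public` imposes no constraint on it.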


6.4 Declassification 


In practice, non-interference is too strong a property for many use cases. Often, 
some leakage of secret data is required for a program to work correctly. Consider 


procedure main(h: Int)        procedure main(h: Int)
{                             {
  while (h != 0) {              i := 0;
    h := h - 1;                 while (i < h) {
  }                               i := i + 1
}                               }
                                print(0)
                              }

Fig. 7. Programs with a termination channel (left), and a timing channel (right). In 
both cases, h is high. 


the case of a password check (see Fig. 6): A secret internal password is compared 
to a non-secret user input. While the password itself must not be leaked, the 
information whether the user input matches the password should influence the 
public outcome of the program, which is forbidden by non-interference. 

To incorporate this intention, the relevant part of the secret information 
can be declassified [24], e.g., via a declassification statement declassify e that 
declares an arbitrary expression e to be low. With modular products, declassifi- 
cation can be encoded via a simple assumption stating that, if the declassification 
is executed in both executions, the expression is equal in both executions: 


$$\llbracket \texttt{declassify } e \rrbracket_2^{\mathring{p}} = \texttt{assume } (p^{(1)} \wedge p^{(2)}) \Rightarrow e^{(1)} = e^{(2)}$$


Introducing an assumption of this form is sound if the information flow
specifications from Sect. 6.2 are used to specify the program. Since high-ness
is encoded as the absence of the knowledge that an expression is equal in both 
executions, not by the knowledge that they are different, there is no danger that 
assuming equality will contradict current knowledge and thereby cause unsound- 
ness. As in the information flow specifications, the declassified expression can be 
arbitrarily complex, so that it is for example possible to declassify the sign of an 
integer while keeping all other information about it secret. 

The example in Fig. 6 becomes valid if we add declassify result at the
end of the procedure, or if we declassify a more complex expression by adding
declassify equal(password, input) at some earlier point. The latter would
arguably be safer because it specifies exactly the information that is intended to
be leaked, and would therefore prevent accidentally leaking more if the implementation
of the checking loop were faulty.
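
The effect of the declassification assumption can be illustrated by enumerating pairs of runs of the password check. In this Python sketch, `check` is a port of Fig. 6, while `violations` and the sample passwords are hypothetical helpers: a pair of runs on which the declassified expression differs is skipped, mirroring the assume statement in the product.

```python
from itertools import product


def check(password: str, guess: str) -> bool:
    """Port of the procedure in Fig. 6."""
    result = len(password) == len(guess)
    i = 0
    while i < min(len(password), len(guess)):
        result = result and password[i] == guess[i]
        i += 1
    return result


def violations(passwords, inputs, declassify=None):
    """Pairs of runs with the same public input but different public results.

    If `declassify` is given, pairs on which the declassified expression
    differs are excluded, as the assumption would exclude them.
    """
    bad = []
    for pw1, pw2, inp in product(passwords, passwords, inputs):
        if declassify is not None and declassify(pw1, inp) != declassify(pw2, inp):
            continue
        if check(pw1, inp) != check(pw2, inp):
            bad.append((pw1, pw2, inp))
    return bad


passwords = ["abc", "abd", "ab"]
inputs = ["abc", "ab", "xyz"]

without_decl = violations(passwords, inputs)
with_decl = violations(passwords, inputs, declassify=lambda pw, inp: pw == inp)
```

Without declassification, the pair of passwords `"abc"` and `"abd"` with input `"abc"` is a witness for the leak; once equality of password and input is declassified, no violating pair remains.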

This kind of declassification has the following interesting properties: First, it 
is imperative, meaning that the declassified information may be leaked (e.g., via 
a print statement) after the execution of the declassification statement, but not 
before. Second, it is semantic, meaning that the declassification affects the value 
of the declassified expression as opposed to, e.g., syntactically the declassified 
variable. As a result, it will be allowed to leak any expression whose value con- 
tains the same (or a part of the) secret information which was declassified, e.g., 
the expression f(e) if f is a deterministic function and e has been declassified. 


6.5 Preventing Termination Channels


In Definition 2, we have considered only terminating program executions. In 
practice, however, termination is a possible side-channel that can leak secret 
information to an outside observer. Figure 7 (left) shows an example of a program 
that verifies under the methodology presented so far, but leaks information about 
the secret input h to an observer: If h is initially negative, the program will 
enter an endless loop. Anyone who can observe the termination behavior of the 
program can therefore conclude if h was negative or not. 

To prevent leaking information via a termination side channel, it is necessary 
to verify that the termination of a program depends only on public data. We will 
show that modular product programs are expressive enough to encode and check 
this property. We will focus on preventing non-termination caused by infinite 
loops here; preventing infinite recursion works analogously. In particular, we want 
to prove that if a loop iterates forever in one execution, any other execution with 
the same low inputs will also reach this loop and iterate forever. More precisely, 
this means that 


(A) if a loop does not terminate, then whether or not an execution reaches that 
loop must not depend on high data. 

(B) whether a loop that is reached by both executions terminates must not 
depend on high data. 


We propose to verify these properties by requiring additional specifications 
that state, for every loop, an exact condition under which it terminates. This 
condition may neither over- nor underapproximate the termination behavior; 
the loop must terminate if and only if the condition is true. For Fig. 7 (left) the
condition is h ≥ 0 (for h = 0 the loop exits immediately). We also require a ranking function for the cases when the
termination condition is true. We can then prove the following: 


(a) If the termination condition of a loop evaluates to false, then any two exe- 
cutions with identical low inputs either both reach the loop or both do 
not reach the loop (i.e., reaching the loop is a low event). This guarantees 
property (A) above. 

(b) For loops executed by both executions, the loop’s termination condition is 
low. This guarantees property (B) under the assumption that the termina- 
tion condition is exact. 

(c) The termination condition is sound, i.e., every loop terminates if its termi- 
nation condition is true. We prove this by showing that if the termination 
condition is true, we can prove the termination of the loop using the supplied 
ranking function. 

(d) The termination condition is complete, i.e., every loop terminates only if its 
termination condition is true. We prove this by showing that if the condition 
is false, the loop condition will always remain true. This check, along with the 
previous proof obligation, ensures that the termination condition is exact. 

(e) Every statement in a loop body terminates if the loop’s termination con- 
dition is true, i.e., the loop’s termination condition implies the termination 
conditions of all statements in its body. 


term(w, c) = cond := e_c;
             assert ¬e_c ⇒ lowEvent;        // checks (a)
             assert low(e_c);               // checks (b)
             assert e_c ⇒ e_r ≥ 0;          // checks (c)
             assert c ⇒ e_c;                // checks (e)
             while (e)
               invariant ¬cond ⇒ e          // checks (d)
             do {
               if (cond) then {rank := e_r};
               term(s, cond);
               if (cond) then {             // checks (c)
                 assert 0 ≤ e_r ∧ e_r < rank
               }
             }


Fig. 8. Program instrumentation for termination leak prevention. We abbreviate
while (e) terminates(e_c, e_r) do {s} as w.


We introduce an annotated while loop while (e) terminates($e_c$, $e_r$) do {s},
where $e_c$ is the exact termination condition and $e_r$ is the ranking function, i.e.,
an integer expression whose value decreases with every loop iteration but never
becomes negative if the termination condition is true. Based on these annotations,
we present a program instrumentation term(s, c) that inserts the checks
outlined above for every while loop in s. c is the termination condition of the
outside scope, i.e., for the instrumentation of a nested loop, it is the termination
condition $e_c$ of the outer loop. The instrumentation is defined for annotated
while loops in Fig. 8; for all other statements, it does not make any changes
except instrumenting all substatements. The instrumentation uses information
flow assertions as defined in Sect. 6.2. Again, we make use of the fact that modular
products allow checking relational assertions at arbitrary program points
and formulating assertions about the control flow.
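
The unary checks of the instrumentation can be exercised at run time on the loop of Fig. 7 (left). The following Python sketch is a dynamic approximation under stated assumptions: the termination condition is taken to be h ≥ 0 and the ranking function to be h, the step budget `max_steps` stands in for non-termination, and the relational checks (a) and (b) are omitted because they need two executions.

```python
def run_instrumented(h, term_cond=lambda h: h >= 0, max_steps=1000):
    """Run `while (h != 0) { h := h - 1 }` with checks (c) and (d).

    Returns True if the loop terminated and False if the step budget ran out.
    """
    cond = term_cond(h)          # cond := e_c
    if cond:
        assert h >= 0            # (c): e_c implies e_r >= 0, with e_r = h
    steps = 0
    while True:
        # (d): loop invariant, "not cond" implies that the guard still holds.
        assert cond or h != 0
        if h == 0:
            return True
        if steps == max_steps:
            return False
        rank = h                 # rank := e_r (only relevant if cond holds)
        h = h - 1
        steps += 1
        if cond:
            assert 0 <= h < rank  # (c): rank decreases and stays non-negative


terminates_for_3 = run_instrumented(3)
diverges_for_minus_2 = not run_instrumented(-2)
```

Passing an inexact condition such as `lambda h: h > 0` makes check (d) fail for `h = 0`, because the loop exits although the claimed condition is false; this is the situation that the completeness obligation rules out.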

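We now prove that if an instrumented statement verifies under some 2-relational
precondition, then any two runs from a pair of states fulfilling that
precondition will either both terminate or both loop forever.


Theorem 4. If $s' = \mathit{term}(s, \mathit{false})$, and $\llbracket s' \rrbracket_2^{\mathring{p}}$ verifies under some precondition
$\hat{P} = \lfloor \tilde{P} \rfloor^{\mathring{p}}$, and for some $\sigma_1, \sigma_2, \sigma_1'$, $(\sigma_1, \sigma_2) \models \hat{P}$ and $\langle s, \sigma_1 \rangle \rightarrow^* \langle \mathit{skip}, \sigma_1' \rangle$, then
there exists some $\sigma_2'$ s.t. $\langle s, \sigma_2 \rangle \rightarrow^* \langle \mathit{skip}, \sigma_2' \rangle$.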


Proof (Sketch). We first establish that our instrumentation ensures that each
statement terminates (1) if and (2) only if its termination condition is true, (1)
by showing equivalence to a standard termination proof, and (2) by a contradiction
if a loop which should not terminate does. Since the execution from $\sigma_1$
terminates, by the second condition, its termination condition must have been
true before the loop. We case split on whether the other execution also reaches
the loop or not. If it does, then the termination condition before the loop is
identical in both executions, so by the first condition, the other execution also
terminates. If it does not, then the loop is not executed at all by the other execution,
and therefore cannot cause non-termination.


6.6 Preventing Timing Channels 


A program has a timing channel if high input data influences the program’s 
execution time, meaning that an attacker who can observe the time the program 
executes can gain information about those secrets. Timing channels can occur 
in combination with observable events; the time at which an event occurs may 
depend on a secret even if the overall execution time of a program does not. 

Consider the example in Fig. 7 (right). Assuming main receives a positive
secret h, both the print statement and the end of the program execution will be 
reached later for larger values of h. 

Using modular product programs, we can verify the absence of timing side 
channels by adding ghost state to the program that tracks the time passed since 
the program has started; this could, for example, be achieved via a simple step 
counting mechanism, or by tracking the sequence of previously executed bytecode 
statements. This ghost state is updated separately for both executions. We can 
then assert anywhere in the program that the passed time does not depend 
on high data in the same way we do for program variables. In particular, we 
can enforce that the passed time is equal whenever an observable event occurs, 
and we can enable users to write relational specifications that compare the time 
passed in both executions of a loop or a procedure. 
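
A step-counting ghost variable of this kind can be sketched in Python for the program in Fig. 7 (right). The cost model below (one step per assignment, loop iteration, and print) and the helper `low_time` are assumptions made for illustration; they are not the metric of any particular implementation.

```python
def main(h: int):
    """Fig. 7 (right) with a ghost step counter.

    Returns the observable events as (output, time) pairs together with the
    total number of steps.
    """
    time = 0
    events = []
    i = 0
    time += 1                 # i := 0
    while i < h:
        i = i + 1
        time += 1             # one step per loop iteration
    events.append((0, time))  # print(0), stamped with the ghost time
    time += 1
    return events, time


def low_time(h1: int, h2: int) -> bool:
    """Relational check: event times and total time agree in both runs."""
    events1, total1 = main(h1)
    events2, total2 = main(h2)
    times1 = [t for _, t in events1]
    times2 = [t for _, t in events2]
    return times1 == times2 and total1 == total2


leaks = not low_time(2, 5)
```

Both runs print the same value, so the output itself is low; only the time stamp of the event differs with the secret, which is the leak that the ghost state exposes.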


7 Implementation and Evaluation 


We have implemented our approach for secure information flow in the Viper verification
infrastructure [22] and applied it to a number of example programs from
the literature. Both the implementation and examples are available at
http://viper.ethz.ch/modularproducts/.


7.1 Implementation in Viper 


Our implementation supports a version of the Viper language that adds the 
following features: 


1. The assertions low(e) and lowEvent for information flow specifications 

2. A declassify statement 

3. Variations of the existing method declarations and while loops that include 
the termination annotations shown in Sect. 6.5 


The implementation transforms a program in this extended language into a 
modular 2-product in the original language, which can then be verified by the 
(unmodified) Viper back-end verifiers. All specifications are provided as infor- 
mation flow specifications (see Sect. 6.2) such that users require no knowledge 
about the transformation or the methodology behind information flow verification.
Error messages are automatically translated back to the original program.

Declassification is implemented as described in Sect. 6.4. Our implementation 
optionally verifies the absence of timing channels; the metric chosen for tracking 
execution time is simple step-counting. Viper uses implicit dynamic frames [25] to 
reason about heap-manipulating programs; our implementation uses quantified 
permissions [21] to support unbounded heap data structures. 

For languages with opaque object references, secure information flow can 
require that pointers are low, i.e., equal up to a consistent renaming of addresses. 
Therefore, our approach to duplicating the heap state space in the implementa- 
tion differs from that described in Sect. 4.3: Instead of duplicating objects, our 
implementation creates a single new statement for every new in the original pro- 
gram, but duplicates the fields each object has. As a result, if both executions 
execute the same new statement, the newly created object will be considered low 
afterwards (but the values of its fields might still be high). 


7.2 Qualitative Evaluation 


We have evaluated our implementation by verifying a number of examples in the 
extended Viper language. The examples are listed in Table 1 and include all code 
snippets shown in this paper as well as a number of examples from the literature
[2, 3, 6, 13, 14, 17, 18, 23, 26, 28]. They combine complex language features like
mutable state on the heap, arrays and procedure calls, as well as timing and ter- 
mination channels, declassification, and non-trivial information flows (e.g., flows 
whose legality depends on semantic information not available in a standard infor- 
mation flow type system). We manually added pre- and postconditions as well 
as loop invariants; for those that have forbidden flows and therefore should not 
verify, we also added a legal version that declassifies the leaked information. Our 
implementation returns the correct result for all examples. 

In all cases but one, our approach allows us to express all information flow 
related assertions, i.e., procedure specifications and loop invariants, purely as 
relational specifications in terms of low-assertions (see Table 1). For all these 
examples, we completely avoid the need to specify the functional behavior of the 
program. Unlike the original product program paper [6], we also do not inline 
any procedure calls; verification is completely modular. 

The only exception is an example that, depending on a high input, executes 
different loops with identical behavior, and for which we need to prove that 
the execution time is low. In this case we have to provide invariants for both 
loops that exactly specify their execution time in order to prove that the overall 
execution time after the conditional is low. Nevertheless, the specification of the 
procedure containing the loop is again expressed with a relational specification 
using only low. For all other examples, unary specifications were only needed to 
verify the absence of runtime errors (e.g., out-of-bounds array accesses), which 
Viper verifies by default. Consequently, a verified program cannot leak secret data
through such errors, which is typically not guaranteed by type systems or static 
analyses. 




File | Event | Heap | Array | Decl. | Term. | Time | Call | LOC | Ann/SF/NI/TM/F | T_VCG | T_SE
antopolous1 [2] x 25 7/3/3/0/2 0.78 | 1.10
antopolous2 [2] x x 61 14/0/14/0/0 0.72 | 0.91
banerjee [3] x x x 76 17/11/6/0/0 1.02 | 0.61
constanzo [13] x x 22 7/2/5/0/0 0.67 | 0.28
darvas [14] x x 33 12/8/4/0/0 0.67 | 0.35
example x x 31 7/1/6/0/0 0.73 | 0.59
example_decl x x 19 5/2/3/0/0 0.72 | 0.77
example_term x x 31 8/4/2/2/0 0.77 | 0.43
example_time x x x x 32 9/0/9/0/0 0.70 | 0.38
joana_1_tl [17] x x x 28 1/0/1/0/0 0.62 | 0.23
joana_2_bl [17] x x 18 2/0/2/0/0 0.63 | 0.25
joana_2_t [17] x 15 1/0/1/0/0 0.62 | 0.20
joana_3_bl [17] x x x x 47 5/1/2/2/0 0.77 | 0.47
joana_3_br [17] x x x x 43 8/0/2/6/0 0.83 | 0.60
joana_3_tl [17] x x x 33 8/2/2/4/0 0.75 | 0.53
joana_3_tr [17] x x x x 35 8/4/2/2/0 0.76 | 0.51
joana_13_1 [17] x 12 1/0/1/0/0 0.62 | 0.24
kusters [18] x x 29 9/6/3/0/0 0.64 | 0.44
naumann [23] x x 20 6/3/6/0/0 0.81 | 0.88
product [6] x x x 65 30/21/21/0/0 5.47 | 15.73
smith [26] x x 43 12/6/8/0/0 0.87 | 0.89
terauchi1 [28] 14 2/0/2/0/0 0.62 | 0.26
terauchi2 [28] x x 21 4/0/4/0/0 0.63 | 0.30
terauchi3 [28] 24 5/1/4/0/0 0.66 | 0.40


Table 1. Evaluated examples. We show the used language features, lines of code including
specifications, overall lines used for specifications (Ann), unary specifications for
safety (SF), relational specifications for non-interference (NI), specifications for
termination (TM), and functional specifications required for non-interference (F). Note
that some lines contain specifications belonging to multiple categories. Columns T_SE
and T_VCG show the running times of the verifiers for the SE backend and the VCG
backend, respectively, in seconds.


7.3 Performance 


For all but one example, the runtime (averaged over 10 runs on a Lenovo 
ThinkPad T450s running Ubuntu) with both the Symbolic Execution (SE) and 
the Verification Condition Generation (VCG) verifiers is under or around one 
second (see Table 1). The one exception, which makes extensive use of unbounded 
heap data structures, takes ca. five seconds when verified using VCG, and ca. 15 seconds in
the SE verifier. This is likely a result of inefficiencies in our encoding: The created
product has a high number of branching statements, and some properties have 
to be proved more than once, two issues which have a much larger performance 
impact for SE than for VCG. We believe that it is feasible to remove much of 
this overhead by optimizing the encoding; we leave this as future work. 


8 Related Work


The notion of k-safety hyperproperties was originally introduced by Clarkson 
and Schneider [12]. Here, we focus on statically proving hyperproperties for 
imperative and object-oriented programs; much more work exists for testing 
or monitoring hyperproperties like secure information flow at runtime, or for 
reasoning about hyperproperties in different programming paradigms. 

Relational logics such as Relational Hoare Logic [11], Relational Separation 
Logic [29] and others [1,10] allow reasoning directly about relational properties 
of two different program executions. Unlike our approach, they usually allow 
reasoning about the executions of two different programs; as a result, they do 
not give special support for two executions of the same program calling the same 
procedure with a relational specification. Recently, Banerjee et al. [5] introduced 
biprograms, which allow explicitly expressing alignment between executions and 
using relational specifications to reason about aligned calls; however, this approach
requires that procedures with relational specifications are always called
by both executions, which is for instance not the case if a call occurs under 
a high guard in secure information flow verification. We handle such cases by 
interpreting relational specifications as trivially true; one can then still resort to 
functional specifications to complete the proof. Their work also does not allow 
mixed specifications, which are easily supported in our product programs. Rela- 
tional program logics are generally difficult to automate. Recent work by Sousa 
and Dillig [27] presents a logic that can be applied automatically by an algorithm 
that implicitly constructs different product programs that align some identical 
statements, but does not fully support relational specifications. Moreover, their 
approach requires dedicated tool support, whereas our modular product pro- 
grams can be verified using off-the-shelf tools. 

The approach of reducing hyperproperties to ordinary trace properties was 
introduced by self-composition [9]. While self-composition is theoretically com- 
plete, it does not allow modular reasoning with relational specifications. The 
resulting problem of having to fully specify program behavior was pointed out 
by Terauchi and Aiken [28]; since then, there have been a number of different 
attempts to solve this problem by allowing (parts of) programs to execute in 
lock-step. Terauchi and Aiken [28] did this for secure information flow by relying 
on information from a type system; other similar approaches exist [23]. 

Product programs [6,7] allow different interleavings of program executions. 
The initial product program approach [6] would in principle allow the use of 
relational specifications for procedure calls, but only under the restriction that 
both program executions always follow the same control flow. The generalized 
approach [7] allows combining different programs and arbitrary numbers of exe- 
cutions. This product construction is non-deterministic and usually interactive. 
In some (but not all) cases, programmers can manually construct product pro- 
grams that avoid duplicated calls and loops and thereby allow using relational 
specifications. However, whether this is possible depends on the used specifica- 
tion, meaning that the product construction and verification are intertwined and 
a new product has to be constructed when specifications change. In contrast, our 
new product construction is fully deterministic and automatic, allows arbitrary
control flows while still being able to use relational specifications for all loops 
and calls, and therefore avoids the issue of requiring full functional specifications. 

Considerable work has been invested into proving specific hyperproperties 
like secure information flow. One popular approach is the use of type systems
[26]; while those are modular and offer good performance, they overapproximate
possible program behaviors and are therefore less precise than approaches using 
logics. In particular, they require labeling any single value as either high or 
low, and do not allow distinctions like the one we made for the example in 
Fig. 1, where only the first bits of a sequence of integers were low. In addition, 
type systems typically struggle to prevent information leaks via side channels 
like termination or program aborts. There have been attempts to create type 
systems that handle some of these limitations (e.g. [15]). 

Static analyses [2,17] enable fully automatic reasoning. They are typically 
not modular and, similarly to type systems, need to abstract semantic informa- 
tion, which can lead to false positives. They strike a trade-off different from our 
solution, which requires specifications, but enables precise, modular reasoning. 

A number of logic-based approaches to proving specific hyperproperties exist. 
As an example, Darvas et al. use dynamic logic for proving non-interference [14]; 
this approach offers some automation, but requires user interaction for most 
realistic programs. Leino et al. [19] verify determinism up to equivalence using 
self-composition, which suffers from the drawbacks explained above. 

Different kinds of declassification have been studied extensively; Sabelfeld
and Sands [24] provide a good overview. Li and Zdancewic [20] introduce downgrading
policies that describe which information can be declassified and, similar
to our approach, can do so for arbitrary expressions. 


9 Conclusion and Future Work 


We have presented modular product programs, a novel form of product programs 
that enable modular reasoning about k-safety hyperproperties using relational 
specifications with off-the-shelf verifiers. We showed that modular products are 
expressive enough to handle advanced aspects of secure information flow verifi- 
cation. They can prove the absence of termination and timing side channels and 
encode declassification. Our implementation shows that our technique works in 
practice on a number of challenging examples from the literature, and exhibits 
good performance even without optimizations. 

For future work, we plan to infer relational properties by using standard 
program analysis techniques on the products. We also plan to generalize our 
technique to prove probabilistic secure information flow for concurrent programs
by combining our encoding with ideas from concurrent separation logic. Finally, 
we plan to optimize our encoding to further improve performance. 


Abstract. We present a framework for simultaneously verifying the
functional correctness and the worst-case asymptotic time complexity 
of higher-order imperative programs. We build on top of Separation 
Logic with Time Credits, embedded in an interactive proof assistant. 
We formalize the O notation, which is key to enabling modular specifi- 
cations and proofs. We cover the subtleties of the multivariate case, where 
the complexity of a program fragment depends on multiple parameters. 
We propose a way of integrating complexity bounds into specifications, 
present lemmas and tactics that support a natural reasoning style, and 
illustrate their use with a collection of examples. 


1 Introduction 


A program or program component whose functional correctness has been verified 
might nevertheless still contain complexity bugs: that is, its performance, in some 
scenarios, could be much poorer than expected. 

Indeed, many program verification tools only guarantee partial correctness, 
that is, do not even guarantee termination, so a verified program could run 
forever. Some program verification tools do enforce termination, but usually 
do not allow establishing an explicit complexity bound. Tools for automatic 
complexity inference can produce complexity bounds, but usually have limited 
expressive power. 

In practice, many complexity bugs are revealed by testing. Some have also 
been detected during ordinary program verification, as shown by Filliâtre and
Letouzey [14], who find a violation of the balancing invariant in a widely- 
distributed implementation of binary search trees. Nevertheless, none of these 
techniques can guarantee, with a high degree of assurance, the absence of com- 
plexity bugs in software. 

To illustrate the issue, consider the binary search implementation in Fig. 1. 
Virtually every modern software verification tool allows proving that this OCaml 
code (or analogous code, expressed in another programming language) satisfies
the specification of a binary search and terminates on all valid inputs. This code 
might even pass a lightweight testing process, as some search queries will be 
answered very quickly, even if the array is very large. Yet, a more thorough 
testing process would reveal a serious issue: a search for a value that is stored 
in the second half of the range [i, j) takes linear time. It would be embarrassing 
if such faulty code was deployed, as it would aggravate benevolent users and 
possibly allow malicious users to mount denial-of-service attacks. 


(* Requires t to be a sorted array of integers.
   Returns k such that i <= k < j and t.(k) = v
   or -1 if there is no such k. *)

let rec bsearch t v i j =
  if j <= i then -1 else
  let k = i + (j - i) / 2 in
  if v = t.(k) then k
  else if v < t.(k) then bsearch t v i k
  else bsearch t v (i+1) j


Fig. 1. A flawed binary search. This code is provably correct and terminating, yet 
exhibits linear (instead of logarithmic) time complexity for some input parameters. 
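
The difference between the two behaviors is easy to observe by counting calls. The following Python port of Fig. 1 is a sketch for experimentation: the `counter` argument is an addition of this example, and the flawed recursive call on `i + 1` is kept on purpose.

```python
def bsearch(t, v, i, j, counter):
    """Port of the flawed search in Fig. 1; counter[0] counts the calls."""
    counter[0] += 1
    if j <= i:
        return -1
    k = i + (j - i) // 2
    if v == t[k]:
        return k
    elif v < t[k]:
        return bsearch(t, v, i, k, counter)
    else:
        # Flaw kept from Fig. 1: the lower bound should become k + 1.
        return bsearch(t, v, i + 1, j, counter)


n = 200
t = list(range(n))

fast = [0]
found_first = bsearch(t, 0, 0, n, fast)      # value in the first half

slow = [0]
found_last = bsearch(t, n - 1, 0, n, slow)   # value in the second half

fast_calls, slow_calls = fast[0], slow[0]
```

Both searches return the correct index, but the search for the last element needs a number of calls that grows linearly with `n`, whereas the search for the first element needs only logarithmically many.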


As illustrated above, complexity bugs can affect execution time, but could 
also concern space (including heap space, stack space, and disk space) or other 
resources, such as the network, energy, and so on. In this paper, for simplicity, 
we focus on execution time only. That said, much of our work is independent of 
which resource is considered. We expect that our techniques could be adapted 
to verify asymptotic bounds on the use of other non-renewable resources, such 
as the network. 

We work with a simple model of program execution, where certain opera- 
tions, such as calling a function or entering a loop body, cost one unit of time, 
and every other operation costs nothing. Although this model is very remote 
from physical running time, it is independent of the compiler, operating system, 
and hardware [18,24] and still allows establishing asymptotic time complexity 
bounds, and therefore, detecting complexity bugs—situations where a program 
is asymptotically slower than it should be. 

In prior work [11], the second and third authors present a method for ver- 
ifying that a program satisfies a specification that includes an explicit bound 
on the program’s worst-case, amortized time complexity. They use Separation 
Logic with Time Credits, a simple extension of Separation Logic [23] where the 
assertion $1 represents a permission to perform one step of computation, and is 
consumed when exercised. The assertion $n is a separating conjunction of n such 
time credits. Separation Logic with Time Credits is implemented in the second 
author’s interactive verification framework, CFML [9,10], which is embedded in 
the Coq proof assistant. 
Using CFML, the second and third authors verify the correctness and time
complexity of an OCaml implementation of the Union-Find data structure [11].
However, their specifications involve concrete cost functions: for instance, the
precondition of the function find indicates that calling find requires and
consumes $(2α(n) + 4), where n is the current number of elements in the data
structure, and where α denotes an inverse of Ackermann’s function. We would
prefer the specification to give the asymptotic complexity bound O(α(n)), which
means that, for some function f ∈ O(α(n)), calling find requires and consumes
$f(n). This is the purpose of this paper.

We argue that the use of asymptotic bounds, such as O(α(n)), is necessary
for (verified or unverified) complexity analysis to be applicable at scale. At a
superficial level, it reduces clutter in specifications and proofs: O(mn) is more
compact and readable than 3mn + 2n log n + 5n + 3m + 2. At a deeper level, it is
crucial for stating modular specifications, which hide the details of a particular
implementation. Exposing the fact that find costs 2α(n) + 4 is undesirable: if
a tiny modification of the Union-Find module changes this cost to 2α(n) + 5,
then all direct and indirect clients of the Union-Find module must be updated,
which is intolerable. Furthermore, sometimes, the constant factors are unknown
anyway. Applying the Master Theorem [12] to a recurrence equation only yields
an order of growth, not a concrete bound. Finally, for most practical purposes, no
critical information is lost when concrete bounds such as 2α(n) + 4 are replaced
with asymptotic bounds such as O(α(n)). Indeed, the number of computation
steps that take place at the source level is related to physical time only up to 
a hardware- and compiler-dependent constant factor. The use of asymptotic 
complexity in the analysis of algorithms, initially advocated by Hopcroft and by 
Tarjan, has been widely successful and is nowadays standard practice. 

One must be aware of several limitations of our approach. First, it is not a 
worst-case execution time (WCET) analysis: it does not yield bounds on actual 
physical execution time. Second, it is not fully automated. We place emphasis 
on expressiveness, as opposed to automation. Our vision is that verifying the 
functional correctness and time complexity of a program, at the same time, 
should not involve much more effort than verifying correctness alone. Third, 
we control only the growth of the cost as the parameters grow large. A loop 
that counts up from 0 to 2^60 has complexity O(1), even though it typically
won’t terminate in a lifetime. Although this is admittedly a potential problem, 
traditional program verification falls prey to analogous pitfalls: for instance, a 
program that attempts to allocate and initialize an array of size (say) 2^48 can be
proved correct, even though, on contemporary desktop hardware, it will typically 
fail by lack of memory. We believe that there is value in our approach in spite 
of these limitations. 

Reasoning and working with asymptotic complexity bounds is not as simple 
as one might hope. As demonstrated by several examples in Sect. 2, typical paper 
proofs using the O notation rely on informal reasoning principles which can easily 
be abused to prove a contradiction. Of course, using a proof assistant steers us 
clear of this danger, but implies that our proofs cannot be quite as simple and
perhaps cannot have quite the same structure as their paper counterparts. 

A key issue that we run against is the handling of existential quantifiers. 
According to what was said earlier, the specification of a sorting algorithm, say 
mergesort, should be, roughly: “there exists a cost function f ∈ O(λn. n log n)
such that mergesort is content with $f(n), where n is the length of the input
list.” Therefore, the very first step in a naive proof of mergesort must be to
exhibit a witness for f, that is, a concrete cost function. An appropriate witness
might be λn. 2n log n, or λn. n log n + 3, who knows? This information is not
available up front, at the very beginning of the proof; it becomes available only 
during the proof, as we examine the code of mergesort, step by step. It is not 
reasonable to expect the human user to guess such a witness. Instead, it seems 
desirable to delay the production of the witness and to gradually construct a cost 
expression as the proof progresses. In the case of a nonrecursive function, such 
as insertionsort, the cost expression, once fully synthesized, yields the desired 
witness. In the case of a recursive function, such as mergesort, the cost expression 
yields the body of a recurrence equation, whose solution is the desired witness. 
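
As a hedged illustration of the shape such a recurrence might take (the constants a, b and c are placeholders for whatever the proof accumulates; they are not values taken from this paper), stepping through a typical mergesort could leave the following constraints on the unknown cost function f:

```latex
\[
\begin{aligned}
f(n) &\ge c && \text{for } n \le 1, \\
f(n) &\ge f(\lfloor n/2 \rfloor) + f(\lceil n/2 \rceil) + a\,n + b && \text{for } n \ge 2.
\end{aligned}
\]
```

Any f that satisfies both inequalities and is dominated by λn. n log n is an acceptable witness. The point is that f is chosen only after the code has been examined, not before.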

We make the following contributions: 


1. We formalize O as a binary domination relation between functions of type
A → Z, where the type A is chosen by the user. Functions of several variables
are covered by instantiating A with a product type. We contend that, in order
to define what it means for a ∈ A to “grow large”, or “tend towards infinity”,
the type A must be equipped with a filter [6], that is, a quantifier Ua.P.
(Eberl [13] does so as well.) We propose a library of lemmas and tactics that 
can prove nonnegativeness, monotonicity, and domination assertions (Sect. 3). 

2. We propose a standard style of writing specifications, in the setting of the 
CFML program verification framework, so that they integrate asymptotic 
time complexity claims (Sect. 4). We define a predicate, specO, which imposes 
this style and incorporates a few important technical decisions, such as the 
fact that every cost function must be nonnegative and nondecreasing. 

3. We propose a methodology, supported by a collection of Coq tactics, to prove 
such specifications (Sect. 5). Our tactics, which heavily rely on Coq metavari- 
ables, help gradually synthesize cost expressions for straight-line code and 
conditionals, and help construct the recurrence equations involved in the anal- 
ysis of recursive functions, while delaying their resolution. 

4. We present several classic examples of complexity analyses (Sect. 6), including:
a simple loop in O(n.2^n), nested loops in O(n) and O(nm), binary search
in O(log n), and Union-Find in O(α(n)).


Our code can be found online in the form of two standalone Coq libraries 
and a self-contained archive [16]. 


2 Challenges in Reasoning with the O Notation 


When informally reasoning about the complexity of a function, or of a code 
block, it is customary to make assertions of the form “this code has asymptotic 
complexity O(1)”, “that code has asymptotic complexity O(n)”, and so on. Yet,
these assertions are too informal: they do not have sufficiently precise meaning, 
and can be easily abused to produce flawed paper proofs. 

A striking example appears in Fig. 2, which shows how one might “prove” 
that a recursive function has complexity O(1), whereas its actual cost is O(n). 
The flawed proof exploits the (valid) relation O(1) + O(1) = O(1), which means 
that a sequence of two constant-time code fragments is itself a constant-time 
code fragment. The flaw lies in the fact that the O notation hides an existential 
quantification, which is inadvertently swapped with the universal quantification 
over the parameter n. Indeed, the claim is that “there exists a constant c such 
that, for every n, waste(n) runs in at most c computation steps”. However, 
the proposed proof by induction establishes a much weaker result, to wit: “for 
every n, there exists a constant c such that waste(n) runs in at most c steps”. 
This result is certainly true, yet does not entail the claim. 
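
To make the gap between the two statements concrete, here is a small sketch that counts the calls performed by waste. The counter steps and the helper cost_of_waste are assumptions introduced for this illustration only.

```ocaml
(* Hypothetical step counter, bumped once per call to waste. *)
let steps = ref 0

let rec waste n =
  incr steps;
  if n > 0 then waste (n - 1)

(* Number of calls performed by [waste n]. *)
let cost_of_waste n =
  steps := 0;
  waste n;
  !steps
```

For every nonnegative n, cost_of_waste n is n + 1. Each individual n admits a bounding constant, but no single constant bounds the cost for all n, which is what a claim of O(1) would require.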

An example of a different nature appears in Fig. 3. There, the auxiliary func- 
tion g takes two integer arguments n and m and involves two nested loops, over 
the intervals [1,n] and [1, m]. Its asymptotic complexity is O(n + nm), which, 
under the hypothesis that m is large enough, can be simplified to O(nm). The 
reasoning, thus far, is correct. The flaw lies in our attempt to substitute 0 for m 


Incorrect claim: The OCaml function waste has asymptotic complexity O(1). 


let rec waste n =
  if n > 0 then waste (n-1)


Flawed proof:
Let us prove by induction on n that waste(n) costs O(1).
- Case n ≤ 0: waste(n) terminates immediately. Therefore, its cost is O(1).
- Case n > 0: A call to waste(n) involves constant-time processing, followed by a
  call to waste(n - 1). By the induction hypothesis, the cost of the recursive call is
  O(1). We conclude that the cost of waste(n) is O(1) + O(1), that is, O(1).


Fig. 2. A flawed proof that waste(n) costs O(1), when its actual cost is O(n). 


Incorrect claim: The OCaml function f has asymptotic complexity O(1). 


let g (n, m) =
  for i = 1 to n do
    for j = 1 to m do () done
  done
let f n = g (n, 0)


Flawed proof:
- g(n,m) involves nm inner loop iterations, thus costs O(nm).
- The cost of f(n) is the cost of g(n, 0), plus O(1). As the cost of g(n,m) is O(nm),
  we find, by substituting 0 for m, that the cost of g(n, 0) is O(0). Thus, f(n) is O(1).


Fig. 3. A flawed proof that f(n) costs O(1), when its actual cost is O(n). 
Incorrect claim: The OCaml function h has asymptotic complexity O(nm²).

1 let h (m, n) =
2   for i = 0 to m-1 do
3     let p = (if i = 0 then pow2 n else n*i) in
4     for j = 1 to p do () done
5   done


Flawed proof:
- The body of the outer loop (lines 3-4) has asymptotic cost O(ni). Indeed, as soon
  as i > 0 holds, the inner loop performs ni constant-time iterations. The case where
  i = 0 does not matter in an asymptotic analysis.
- The cost of h(m, n) is the sum of the costs of the iterations of the outer loop:

\[ \sum_{i=0}^{m-1} O(ni) = O\Bigl(n \cdot \sum_{i=0}^{m-1} i\Bigr) = O(nm^2). \]


Fig. 4. A flawed proof that h(m, n) costs O(nm²), when its actual cost is O(2^n + nm²).


in the bound O(nm). Because this bound is valid only for sufficiently large m, it 
does not make sense to substitute a specific value for m. In other words, from the 
fact that “g(n,m) costs O(nm) when n and m are sufficiently large”, one cannot 
deduce anything about the cost of g(n,0). To repair this proof, one must take 
a step back and prove that g(n,m) has asymptotic complexity O(n + nm) for 
sufficiently large n and for every m. This fact can be instantiated with m = 0, 
allowing one to correctly conclude that g(n,0) costs O(n). We come back to this 
example in Sect. 3.3. 
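
The cost of g can be checked on small inputs with a sketch that counts one tick per loop-body entry, in the spirit of the cost model used in this paper. The counter ticks and the helper cost_of_g are assumptions made for this illustration.

```ocaml
(* Hypothetical counter: one tick each time a loop body is entered. *)
let ticks = ref 0

let g (n, m) =
  for _i = 1 to n do
    incr ticks;              (* entering the outer loop body *)
    for _j = 1 to m do
      incr ticks             (* entering the inner loop body *)
    done
  done

(* Number of loop-body entries performed by [g (n, m)]. *)
let cost_of_g (n, m) =
  ticks := 0;
  g (n, m);
  !ticks
```

Under this counting, g (n, m) costs n + nm for nonnegative n and m. In particular g (n, 0) costs n, not 0: the outer loop still runs n times even though the inner loop never does.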

One last example of tempting yet invalid reasoning appears in Fig.4. We 
borrow it from Howell [19]. This flawed proof exploits the dubious idea that “the 
asymptotic cost of a loop is the sum of the asymptotic costs of its iterations”. In 
more precise terms, the proof relies on the following implication, where f(m, n, i) 
represents the true cost of the i-th loop iteration and g(m,n,i) represents an 
asymptotic bound on f(m,n,i):


\[ f(m,n,i) \in O(g(m,n,i)) \;\Rightarrow\; \sum_{i=0}^{m-1} f(m,n,i) \in O\Bigl(\sum_{i=0}^{m-1} g(m,n,i)\Bigr) \]


As pointed out by Howell, this implication is in fact invalid. Here, f(m, n, 0) is 2^n
and f(m,n,i) when i > 0 is ni, while g(m,n,i) is just ni. The left-hand side of
the above implication holds, but the right-hand side does not, as 2^n + Σ_{i=1}^{m-1} ni
is O(2^n + nm²), not O(nm²). The Summation lemma presented later on in
this paper (Lemma 8) rules out the problem by adding the requirement that f 
be a nondecreasing function of the loop index i. We discuss in depth later on 
(Sect. 4.5) why cost functions should and can be monotonic. 
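
Howell's counterexample can be reproduced numerically. The sketch below counts inner-loop iterations of h only; the counter ticks, the helper cost_of_h, and the definition of pow2 by a bit shift are assumptions made for this illustration.

```ocaml
(* Hypothetical counter: one tick per inner-loop iteration. *)
let ticks = ref 0

(* Assumes 0 <= n < 62 on a 64-bit system. *)
let pow2 n = 1 lsl n

let h (m, n) =
  for i = 0 to m - 1 do
    let p = if i = 0 then pow2 n else n * i in
    for _j = 1 to p do incr ticks done
  done

(* Number of inner-loop iterations performed by [h (m, n)]. *)
let cost_of_h (m, n) =
  ticks := 0;
  h (m, n);
  !ticks
```

With m fixed to 1, the claimed bound nm² is just n, yet cost_of_h (1, n) is 2^n. The first iteration alone defeats the bound, and the cost of the i-th iteration is not nondecreasing in i, which is precisely the hypothesis that the Summation lemma adds.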

The examples that we have presented show that the informal reasoning style 
of paper proofs, where the O notation is used in a loose manner, is unsound. 
One cannot hope, in a formal setting, to faithfully mimic this reasoning style. In 
this paper, we do assign O specifications to functions, because we believe that 
this style is elegant, modular and scalable. However, during the analysis of a
function body, we abandon the O notation. We first synthesize a cost expression 
for the function body, then check that this expression is indeed dominated by 
the asymptotic bound that appears in the specification. 


3 Formalizing the O Notation 


3.1 Domination 


In many textbooks, the fact that f is bounded above by g asymptotically, up to
a constant factor, is written “f = O(g)” or “f ∈ O(g)”. However, the former
notation is quite inappropriate, as it is clear that “f = O(g)” cannot be literally
understood as an equality. Indeed, if it truly were an equality, then, by symmetry
and transitivity, f₁ = O(g) and f₂ = O(g) would imply f₁ = f₂. The latter
notation makes much better sense: O(g) is then understood as a set of functions.
This approach has in fact been used in formalizations of the O notation [3].
Yet, in this paper, we prefer to think directly in terms of a domination preorder
between functions. Thus, instead of “f ∈ O(g)”, we write f ≼ g.

Although the O notation is often defined in the literature only in the special 
case of functions whose domain is N, Z or R, we must define domination in 
the general case of functions whose domain is an arbitrary type A. By later 
instantiating A with a product type, such as Z², we get a definition of domination
that covers the multivariate case. Thus, let us fix a type A, and let f and g inhabit
the function type A → Z.¹

Fixing the type A, it turns out, is not quite enough. In addition, the type A 
must be equipped with a filter [6]. To see why that is the case, let us work 
towards the definition of domination. As is standard, we wish to build a notion 
of “growing large enough” into the definition of domination. That is, instead of 
requiring a relation of the form |f(x)| ≤ c |g(x)| to be “everywhere true”, we
require it to be “ultimately true”, that is, “true when x is large enough”.² Thus,
f ≼ g should mean, roughly:


“up to a constant factor, ultimately, |f| is bounded above by |g|.”

That is, somewhat more formally:


“for some c, for every sufficiently large x, |f(x)| ≤ c |g(x)|”


In mathematical notation, we would like to write: ∃c. Ux. |f(x)| ≤ c |g(x)|.
For such a formula to make sense, we must define the meaning of the formula 
Ux.P, where x inhabits the type A. This is the reason why the type A must be 


1 At this time, we require the codomain of f and g to be Z. Following Avigad and 
Donnelly [3], we could allow it to be an arbitrary nondegenerate ordered ring. We 
have not yet needed this generalization. 

2 When A is N, provided g(x) is never zero, requiring the inequality to be “everywhere 
true” is in fact the same as requiring it to be “ultimately true”. Outside of this special 
case, however, requiring the inequality to hold everywhere is usually too strong. 
equipped with a filter U, which intuitively should be thought of as a quantifier,
whose meaning is “ultimately”. Let us briefly defer the definition of a filter 
(Sect. 3.2) and sum up what has been explained so far: 


Definition 1 (Domination). Let A be a filtered type, that is, a type A equipped
with a filter U_A.
The relation ≼_A on A → Z is defined as follows:


\[ f \preceq_A g \;\triangleq\; \exists c.\ U_A x.\ |f(x)| \le c\,|g(x)| \]


3.2 Filters 


Whereas ∀x.P means that P holds of every x, and ∃x.P means that P holds
of some x, the formula Ux.P should be taken to mean that P holds of every
sufficiently large x, that is, P ultimately holds.

The formula Ux.P is short for U (λx.P). If x ranges over some type A, then
U must have type P(P(A)), where P(A) is short for A → Prop. To stress this
better, although Bourbaki [6] states that a filter is “a set of subsets of A”, it is 
crucial to note that P(P(A)) is the type of a quantifier in higher-order logic. 


Definition 2 (Filter). A filter [6] on a type A is an object U of type P(P(A)) 
that enjoys the following four properties, where Ux.P is short for U (λx.P):


(1)  (P₁ ⇒ P₂) ⇒ Ux.P₁ ⇒ Ux.P₂         (covariance)
(2a) Ux.P₁ ∧ Ux.P₂ ⇒ Ux.(P₁ ∧ P₂)       (stability under binary intersection)
(2b) Ux.True                             (stability under 0-ary intersection)
(3)  Ux.P ⇒ ∃x.P                         (nonemptiness)


Properties (1)-(3) are intended to ensure that the intuitive reading of Ux.P
as: “for sufficiently large x, P holds” makes sense. Property (1) states that if
P₁ implies P₂ and if P₁ holds when x is large enough, then P₂, too, should
hold when x is large enough. Properties (2a) and (2b), together, state that if
each of P₁, ..., P_k independently holds when x is large enough, then P₁, ..., P_k
should simultaneously hold when x is large enough. Properties (1) and (2b)
together imply ∀x.P ⇒ Ux.P. Property (3) states that if P holds when x is large
enough, then P should hold of some x. In classical logic, it would be equivalent
to ¬(Ux.False).

In the following, we let the metavariable A stand for a filtered type, that is, a 
pair of a carrier type and a filter on this type. By abuse of notation, we also write 
A for the carrier type. (In Coq, this is permitted by an implicit projection.) We 
write U_A for the filter.


3.3 Examples of Filters 


When U is a universal filter, Ux.Q(x) is (by definition) equivalent to ∀x.Q(x).
Thus, a predicate Q is “ultimately true” if and only if it is “everywhere true”.
In other words, the universal quantifier is a filter.

Definition 3 (Universal filter). Let T be a nonempty type. Then λQ.∀x.Q(x)
is a filter on T.


When U is the order filter associated with the ordering ≤, the formula
Ux.Q(x) means that, when x becomes sufficiently large with respect to ≤, the
property Q(x) becomes true.


Definition 4 (Order filter). Let (T, ≤) be a nonempty ordered type, such that
every two elements have an upper bound. Then λQ.∃x₀.∀x ≥ x₀. Q(x) is a filter
on T.
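
The upper-bound hypothesis is what makes property (2a) of Definition 2 hold for the order filter. A short derivation follows, written as a sketch rather than taken from the Coq development, and assuming that the ordering is reflexive and transitive:

```latex
\[
\begin{aligned}
&\text{Assume } \forall x \ge x_1.\ P_1(x) \text{ and } \forall x \ge x_2.\ P_2(x). \\
&\text{Pick an upper bound } x_0 \text{ of } x_1 \text{ and } x_2, \text{ that is, } x_1 \le x_0 \text{ and } x_2 \le x_0. \\
&\text{By transitivity, every } x \ge x_0 \text{ satisfies } x \ge x_1 \text{ and } x \ge x_2, \\
&\text{hence } \forall x \ge x_0.\ P_1(x) \land P_2(x). \\
&\text{For nonemptiness (3): from } \forall x \ge x_0.\ P(x) \text{ and } x_0 \le x_0 \text{ we obtain } P(x_0).
\end{aligned}
\]
```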


The order filter associated with the ordered type (Z, ≤) is the most natural
filter on the type Z. Equipping the type Z with this filter yields a filtered type,
which, by abuse of notation, we also write Z. Thus, the formula U_Z x.Q(x) means
that Q(x) becomes true “as x tends towards infinity”.

By instantiating Definition 1 with the filtered type Z, we recover the classic 
definition of domination between functions from Z to Z:


\[ f \preceq_Z g \iff \exists c.\ \exists n_0.\ \forall n \ge n_0.\ |f(n)| \le c\,|g(n)| \]
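
A candidate witness (c, n₀) for this definition can be sanity-checked on a finite range. The sketch below is only a test aid, since a finite check proves nothing about all n; the helper holds_on and the example functions f and g are assumptions introduced for this illustration.

```ocaml
(* [holds_on ~c ~n0 ~upto f g] checks |f n| <= c * |g n| for n0 <= n <= upto. *)
let holds_on ~c ~n0 ~upto f g =
  let rec go n =
    n > upto || (abs (f n) <= c * abs (g n) && go (n + 1))
  in
  go n0

(* Example: f is dominated by g with witness c = 4, n0 = 5. *)
let f n = 3 * n + 5
let g n = n
```

Here 3n + 5 ≤ 4n holds exactly when n ≥ 5, so the pair (4, 5) passes the check while (4, 0) fails at n = 0. This illustrates why the definition asks for the inequality only ultimately and not everywhere.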


We now turn to the definition of a filter on a product type A₁ × A₂, where A₁
and A₂ are filtered types. Such a filter plays a key role in defining domination
between functions of several variables. The following product filter is the most
natural construction, although there are others:


Definition 5 (Product filter). Let A₁ and A₂ be filtered types. Then


\[ \lambda Q.\ \exists Q_1, Q_2.\ \left\{ \begin{array}{l} U_{A_1} x_1.\ Q_1 \\ \land\ U_{A_2} x_2.\ Q_2 \\ \land\ \forall x_1, x_2.\ Q_1(x_1) \land Q_2(x_2) \Rightarrow Q(x_1, x_2) \end{array} \right. \]


is a filter on the product type A₁ × A₂.


To understand this definition, it is useful to consider the special case where
A₁ and A₂ are both Z. Then, for i ∈ {1,2}, the formula U_Z x_i. Q_i means
that the predicate Q_i contains an infinite interval of the form [a_i, ∞). Thus,
the formula ∀x₁, x₂. Q₁(x₁) ∧ Q₂(x₂) ⇒ Q(x₁, x₂) requires the predicate Q to
contain the infinite rectangle [a₁, ∞) × [a₂, ∞). Thus, a predicate Q on Z² is
“ultimately true” w.r.t. the product filter if and only if it is “true on some
infinite rectangle”. In Bourbaki’s terminology [6, Chap. 1, Sect. 6.7], the infinite
rectangles form a basis of the product filter.

We view the product filter as the default filter on the product type A₁ × A₂.
Whenever we refer to A₁ × A₂ in a setting where a filtered type is expected, the
product filter is intended.

We stress that there are several filters on Z, including the universal filter 
and the order filter, and therefore several filters on Z². Therefore, it does not
make sense to use the O notation without specifying which filter one considers.
Consider again the function g(n,m) in Fig. 3 (Sect. 2). One can prove that
g(n,m) has complexity O(nm + n) with respect to the standard filter on Z².
With respect to this filter, this complexity bound is equivalent to O(mn), as the
functions λ(m,n).mn + n and λ(m,n).mn dominate each other. Unfortunately,
this does not allow deducing anything about the complexity of g(n, 0), since the
bound O(mn) holds only when n and m grow large. An alternate approach is to
prove that g(n,m) has complexity O(nm + n) with respect to a stronger filter,
namely the product of the standard filter on Z and the universal filter on Z.
With respect to that filter, the functions λ(m,n).mn + n and λ(m,n).mn are
not equivalent. This bound does allow instantiating m with 0 and deducing that 
g(n,0) has complexity O(n). 
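
The difference between the two filters can be made explicit with a short calculation. This is a sketch with witnesses chosen for illustration, where n is filtered by the standard (order) filter in both cases and only the filter on m changes:

```latex
\[
\begin{aligned}
&\textbf{Order filter on } m\text{: for } m \ge 1 \text{ and } n \ge 0, \\
&\qquad |mn| \le |mn + n| \quad\text{and}\quad |mn + n| = mn + n \le 2mn = 2\,|mn|, \\
&\qquad \text{so the two functions dominate each other, with constants } 1 \text{ and } 2, \\
&\qquad \text{on the rectangle } [1,\infty) \times [0,\infty). \\[4pt]
&\textbf{Universal filter on } m\text{: the bound } |mn + n| \le c\,|mn| \text{ must hold for every } m. \\
&\qquad \text{At } m = 0 \text{ it reads } |n| \le 0, \text{ which fails for every } n \ne 0, \\
&\qquad \text{so no constant } c \text{ works and } \lambda(m,n).\,mn + n \text{ is not dominated by } \lambda(m,n).\,mn.
\end{aligned}
\]
```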


3.4 Properties of Domination 


Many properties of the domination relation can be established with respect to an 
arbitrary filtered type A. Here are two example lemmas; there are many more. 
As before, f and g range over A → Z. The operators f + g, max(f,g) and f.g
denote pointwise sum, maximum, and product, respectively.


Lemma 6 (Sum and Max Are Alike). Assume f and g are ultimately nonnegative,
that is, U_A x. f(x) ≥ 0 and U_A x. g(x) ≥ 0 hold. Then, we have
max(f,g) ≼_A f + g and f + g ≼_A max(f,g).
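
The argument behind Lemma 6 is short. The following is a paper sketch, not the Coq proof: by property (2a) of Definition 2, ultimately both f(x) ≥ 0 and g(x) ≥ 0 hold, and for every such x:

```latex
\[
\max(f(x), g(x)) \;\le\; f(x) + g(x) \;\le\; 2 \cdot \max(f(x), g(x)).
\]
```

Since all quantities involved are nonnegative, the absolute values in Definition 1 can be dropped, and the two domination claims follow by covariance with the constants c = 1 and c = 2, respectively.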


Lemma 7 (Multiplication). f₁ ≼_A g₁ and f₂ ≼_A g₂ imply f₁.f₂ ≼_A g₁.g₂.


Lemma 7 corresponds to Howell’s Property 5 [19]. Whereas Howell states this
property on N^k, our lemma is polymorphic in the type A. As noted by Howell,
this lemma is useful when the cost of a loop body is independent of the loop 
index. In the case where the cost of the i-th iteration may depend on the loop 
index i, the following, more complex lemma is typically used instead: 


Lemma 8 (Summation). Let f, g range over A → Z → Z. Let i₀ ∈ Z.
Assume the following three properties:


1. U_A a. ∀i ≥ i₀. f(a)(i) ≥ 0.
2. U_A a. ∀i ≥ i₀. g(a)(i) ≥ 0.
3. for every a, the function λi. f(a)(i) is nondecreasing on the interval [i₀, ∞).


Then,
\[ \lambda(a,i).\ f(a)(i) \;\preceq_{A \times Z}\; \lambda(a,i).\ g(a)(i) \]
implies
\[ \lambda(a,n).\ \sum_{i=i_0}^{n} f(a)(i) \;\preceq_{A \times Z}\; \lambda(a,n).\ \sum_{i=i_0}^{n} g(a)(i). \]

Lemma 8 uses the product filter on A x Z in its hypothesis and conclusion. It 
corresponds to Howell’s property 2 [19]. The variable i represents the loop index, 
while the variable a collectively represents all other variables in scope, so the 
type A is usually instantiated with a tuple type (an example appears in Sect. 6). 

An important property is the fact that function composition is compatible, 
in a certain sense, with domination. This allows transforming the parameters 
under which an asymptotic analysis is carried out (examples appear in Sect. 6). 
Due to space limitations, we refer the reader to the Coq library for details [16]. 
3.5 Tactics


Our formalization of filters and domination forms a stand-alone Coq library [16]. 
In addition to many lemmas about these notions, the library proposes automated 
tactics that can prove nonnegativeness, monotonicity, and domination goals. 
These tactics currently support functions built out of variables, constants, sums 
and maxima, products, powers, logarithms. Extending their coverage is ongoing 
work. This library is not tied to our application to the complexity analysis of 
programs. It could have other applications in mathematics. 


4 Specifications with Asymptotic Complexity Claims 


In this section, we first present our existing approach to verified time complexity 
analysis. This approach, proposed by the second and third authors [11], does not 
use the O notation: instead, it involves explicit cost functions. We then discuss 
how to extend this approach with support for asymptotic complexity claims. 
We find that, even once domination (Sect. 3) is well-understood, there remain 
nontrivial questions as to the style in which program specifications should be 
written. We propose one style which works well on small examples and which 
we believe should scale well to larger ones. 


4.1 CFML with Time Credits for Cost Analysis 


CFML [9,10] is a system that supports the interactive verification of OCaml 
programs, using higher-order Separation Logic, inside Coq. It is composed of a 
trusted standalone tool and a Coq library. The CFML tool transforms a piece 
of OCaml code into a characteristic formula, a Coq formula that describes the 
semantics of the code. The characteristic formula is then exploited, inside Coq, 
to state that the code satisfies a certain specification (a Separation Logic triple) 
and to interactively prove this statement. The CFML library provides a set of 
Coq tactics that implement the reasoning rules of Separation Logic. 

In prior work [11], the second and third authors extend CFML with time 
credits [2,22] and use it to simultaneously verify the functional correctness and 
the (amortized) time complexity of OCaml code. To illustrate the style in which 
they write specifications, consider a function that computes the length of a list: 


let rec length l =
  match l with
  | [] -> 0
  | _ :: l -> 1 + length l


About this function, one can prove the following statement: 


\[ \forall (A : \mathit{Type})\ (l : \mathit{list}\ A).\quad \{\, \$(|l| + 1) \,\}\ (\mathit{length}\ l)\ \{\, \lambda y.\ [\, y = |l| \,] \,\} \]

This is a Separation Logic triple {H}(t){Q}. The postcondition λy. [y = |l|]
asserts that the call length l returns the length of the list l.³ The precondition
$(|l| + 1) asserts that this call requires |l| + 1 credits. This triple is proved in a
variant of Separation Logic where every function call and every loop iteration
consumes one credit. Thus, the above specification guarantees that the execution
of length l involves no more than |l| + 1 function calls or loop iterations. Our
previous paper [11, Definition 2] gives a precise definition of the meaning of 
triples. 

As argued in prior work [11, Sect.2.7], bounding the number of function 
calls and loop iterations is equivalent, up to a constant factor, to bounding the 
number of reduction steps of the program. Assuming that the OCaml compiler 
is complexity-preserving, this is equivalent, up to a constant factor, to bounding 
the number of instructions executed by the compiled code. Finally, assuming 
that the machine executes one instruction in bounded time, this is equivalent, 
up to a constant factor, to bounding the execution time of the compiled code. 
Thus, the above specification guarantees that length runs in linear time. 

Instead of understanding Separation Logic with Time Credits as a variant 
of Separation Logic, one can equivalently view it as standard Separation Logic, 
applied to an instrumented program, where a pay () instruction has been inserted 
at the beginning of every function body and loop body. The proof of the program
is carried out under the axiom {$1} (pay()) {λ_.⊤}, which imposes the
consumption of one time credit at every pay () instruction. This instruction has 
no runtime effect: it is just a way of marking where credits must be consumed. 

For example, the OCaml function length is instrumented as follows: 


let rec length l =
  pay ();
  match l with [] -> 0 | _ :: l -> 1 + length l


Executing “length l” involves executing pay() exactly |l| + 1 times. For this
reason, a valid specification of this instrumented code in ordinary Separation
Logic must require at least |l| + 1 credits in its precondition.
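
This count can be observed outside of any proof by giving pay a concrete body that bumps a counter. In the setting of this paper pay() has no runtime effect; the counter credits_spent and the helper credits_for below are assumptions introduced for this sketch only.

```ocaml
(* Hypothetical runtime counter standing in for consumed time credits. *)
let credits_spent = ref 0
let pay () = incr credits_spent

(* The instrumented length function, as above. *)
let rec length l =
  pay ();
  match l with [] -> 0 | _ :: l -> 1 + length l

(* Returns the result of [length l] and the number of pay() calls it made. *)
let credits_for l =
  credits_spent := 0;
  let n = length l in
  (n, !credits_spent)
```

On a list of three elements, credits_for returns (3, 4): one pay() for each of the three cons cells and one for the final call on the empty list.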


4.2 A Modularity Challenge 


The above specification of length guarantees that length runs in linear time, 
but does not allow predicting how much real time is consumed by a call to 
length. Thus, this specification is already rather abstract. Yet, it is still too 
precise. Indeed, we believe that it would not be wise for a list library to publish 
a specification of length whose precondition requires exactly |l| + 1 credits. 
Indeed, there are implementations of length that do not meet this specification. 
For example, the tail-recursive implementation found in the OCaml standard 
library, which in practice is more efficient than the naive implementation shown 


3 The square brackets denote a pure Separation Logic assertion. |l| denotes the length
of the Coq list l. CFML transparently reflects OCaml integers as Coq relative integers 
and OCaml lists as Coq lists. 
above, involves exactly |l| + 2 function calls, therefore requires |l| + 2 credits. By
advertising a specification where |l| + 1 credits suffice, one makes too strong a
guarantee, and rules out the more efficient implementation.
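
The extra credit comes from the wrapper function. The sketch below mirrors the usual shape of a tail-recursive length (an accumulator-passing helper plus a wrapper), instrumented with one pay() per function body. The runtime counter credits_spent and the helper credits_for are assumptions made for this illustration.

```ocaml
(* Hypothetical runtime counter standing in for consumed time credits. *)
let credits_spent = ref 0
let pay () = incr credits_spent

(* Accumulator-passing helper: one call per list cell, plus one for []. *)
let rec length_aux len l =
  pay ();
  match l with [] -> len | _ :: l -> length_aux (len + 1) l

(* Wrapper: accounts for one additional call. *)
let length l =
  pay ();
  length_aux 0 l

(* Returns the result of [length l] and the number of pay() calls it made. *)
let credits_for l =
  credits_spent := 0;
  let n = length l in
  (n, !credits_spent)
```

On a list of three elements, credits_for returns (3, 5), that is, |l| + 2 credits. A client proof that hard-codes |l| + 1 would therefore break if the library switched to this implementation.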

After initially publishing a specification that requires $(|l| + 1), one could of 
course still switch to the more efficient implementation and update the published 
specification so as to require $(|l| + 2) instead of $(|l| + 1). However, that would
in turn require updating the specification and proof of every (direct and indirect) 
client of the list library, which is intolerable. 

To leave some slack, one should publish a more abstract specification. For 
example, one could advertise that the cost of length l is an affine function of
the length of the list l, that is, the cost is a·|l| + b, for some constants a and b:


\[ \exists (a, b : Z).\ \forall (A : \mathit{Type})\ (l : \mathit{list}\ A).\quad \{\, \$(a \cdot |l| + b) \,\}\ (\mathit{length}\ l)\ \{\, \lambda y.\ [\, y = |l| \,] \,\} \]


This is a better specification, in the sense that it is more modular. The naïve 
implementation of length shown earlier and the efficient implementation in 
OCaml’s standard library both satisfy this specification, so one is free to choose 
one or the other, without any impact on the clients of the list library. In fact, 
any reasonable implementation of length should have linear time complexity 
and therefore should satisfy this specification. 

That said, the style in which the above specification is written is arguably 
slightly too low-level. Instead of directly expressing the idea that the cost of 
length l is O(|l|), we have written this cost under the form a·|l| + b. It is
preferable to state at a more abstract level that cost is dominated by λn.n: such
a style is more readable and scales to situations where multiple parameters and 
nonstandard filters are involved. Thus, we propose the following statement: 


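\[ \exists \mathit{cost} : Z \to Z.\ \left\{ \begin{array}{l} \mathit{cost} \preceq_Z \lambda n.\, n \\ \forall (A : \mathit{Type})\ (l : \mathit{list}\ A).\quad \{\, \$\mathit{cost}(|l|) \,\}\ (\mathit{length}\ l)\ \{\, \lambda y.\ [\, y = |l| \,] \,\} \end{array} \right. \]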


Thereafter, we refer to the function cost as the concrete cost of length, as
opposed to the asymptotic bound, represented here by the function λn.n. This
specification asserts that there exists a concrete cost function cost, which is
dominated by λn.n, such that cost(|l|) credits suffice to justify the execution
of length l. Thus, cost(|l|) is an upper bound on the actual number of pay()
instructions that are executed at runtime.

The above specification informally means that length l has time complexity
O(n) where the parameter n represents |l|, that is, the length of the list l. The
fact that n represents |l| is expressed by applying cost to |l| in the precondition.
The fact that this analysis is valid when n grows large enough is expressed by
using the standard filter on Z in the assertion cost ≼_Z λn.n.

In general, it is up to the user to choose what the parameters of the cost 
analysis should be, what these parameters represent, and which filter on these 
parameters should be used. The example of the Bellman-Ford algorithm (Sect. 6) 
illustrates this. 
Record specO (A : filterType) (le : A → A → Prop)
             (bound : A → Z) (P : (A → Z) → Prop)
  := { cost : A → Z;
       cost_spec : P cost;
       cost_dominated : dominated A cost bound;
       cost_nonneg : ∀x, 0 ≤ cost x;
       cost_monotonic : monotonic le Z.le cost; }.


Fig. 5. Definition of specO. 


4.3 A Record for Specifications 


The specifications presented in the previous section share a common structure. 
We define a record type that captures this common structure, so as to make 
specifications more concise and more recognizable, and so as to help users adhere 
to this specification pattern. 

This type, specO, is defined in Fig. 5. The first three fields in this record type
correspond to what has been explained so far. The first field asserts the existence
of a function cost from A to Z, where A is a user-specified filtered type. The second
field asserts that a certain property P cost is satisfied; it is typically a Separation 
Logic triple whose precondition refers to cost. The third field asserts that cost 
is dominated by the user-specified function bound. The need for the last two 
fields is explained further on (Sects. 4.4 and 4.5). 

Using this definition, our proposed specification of length (Sect. 4.2) is stated 
in concrete Coq syntax as follows: 


Theorem length_spec:
  specO Z_filterType Z.le (fun n ⇒ n) (fun cost ⇒
    ∀A (l:list A), triple (length l)
      PRE ($ (cost |l|))
      POST (fun y ⇒ [ y = |l| ])).


The key elements of this specification are Z_filterType, which is Z, equipped 
with its standard filter; the asymptotic bound fun n ⇒ n, which means that
the time complexity of length is O(n); and the Separation Logic triple, which 
describes the behavior of length, and refers to the concrete cost function cost. 

One key technical point is that specO is a strong existential, whose witness 
can be referred to via the first projection, cost. For instance, the concrete
cost function associated with length can be referred to as cost length_spec. 
Thus, at a call site of the form length xs, the number of required credits is 
cost length_spec |xs|. 

In the next subsections, we explain why, in the definition of specO, we require 
the concrete cost function to be nonnegative and monotonic. These are design 
decisions; although these properties may not be strictly necessary, we find that 
enforcing them greatly simplifies things in practice.


4.4 Why Cost Functions Must Be Nonnegative


There are several common occasions where one is faced with the obligation of 
proving that a cost expression is nonnegative. These proof obligations arise from 
several sources. 

One source is the Separation Logic axiom for splitting credits, whose statement
is $(m + n) = $m ⋆ $n, subject to the side conditions m ≥ 0 and n ≥ 0.
Without these side conditions, out of $0, one would be able to create $1 ⋆ $(−1).
Because our logic is affine, one could then discard $(−1), keeping just $1. In
short, an unrestricted splitting axiom would allow creating credits out of thin
air.¹ Another source of proof obligations is the Summation lemma (Lemma 8),
which requires the functions at hand to be (ultimately) nonnegative.

Now, suppose one is faced with the obligation of proving that the expression 
cost length_spec |xs| is nonnegative. Because length_spec is an existential 
package (a specO record), this is impossible, unless this information has been 
recorded up front within the record. This is the reason why the field cost_nonneg 
in Fig. 5 is needed. 

For simplicity, we require cost functions to be nonnegative everywhere, as 
opposed to within a certain domain. This requirement is stronger than necessary,
but simplifies things, and can easily be met in practice by wrapping cost
functions within “max(0, −)”. Our Coq tactics automatically insert “max(0, −)”
wrappers where necessary, making this issue mostly transparent to the user. In
the following, for brevity, we write c⁺ for max(0, c), where c ∈ Z.
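
The role of the side conditions can be checked on a toy model. The sketch below is only an illustration, not CFML's definition of credits: it treats a credit count as a plain OCaml integer, and the names pos and split_is_sound are hypothetical.

```ocaml
(* Illustration only: pos c plays the role of max(0, c). *)
let pos (c : int) : int = max 0 c

(* Toy reading of the splitting axiom $(m + n) = $m * $n.
   In an affine logic a negative part can be discarded, so a credit
   assertion $c is worth pos c credits. Splitting is sound in this model
   when the two parts are together worth no more than the whole. *)
let split_is_sound (m : int) (n : int) : bool =
  pos m + pos n <= pos (m + n)

let () =
  (* Both parts nonnegative: nothing is created. *)
  assert (split_is_sound 3 4);
  (* Splitting $0 into $1 and $(-1), then discarding $(-1), would yield
     one credit out of thin air: the check fails. *)
  assert (not (split_is_sound 1 (-1)));
  (* Wrapping in max(0, -) makes any cost expression nonnegative. *)
  assert (pos (-5) = 0 && pos 7 = 7)
```

In this model, wrapping every cost expression in pos is what makes the side conditions of the splitting axiom hold trivially.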


4.5 Why Cost Functions Must Be Monotonic 


One key reason why cost functions should be monotonic has to do with the 
“avoidance problem”. When the cost of a code fragment depends on a local 
variable x, can this cost be reformulated (and possibly approximated) in such 
a way that the dependency is removed? Indeed, a cost expression that makes 
sense outside the scope of x is ultimately required. 

The problematic cost expression is typically of the form E[|x|], where |·|
represents some notion of the “size” of the data structure denoted by x, and E is
an arithmetic context, that is, an arithmetic expression with a hole. Furthermore,
an upper bound on |x| is typically available. This upper bound can be exploited
if the context E is monotonic, i.e., if x ≤ y implies E[x] ≤ E[y]. Because the
hole in E can appear as an actual argument to an abstract cost function, we 
must record the fact that this cost function is monotonic. 

To illustrate the problem, consider the following OCaml function, which 
counts the positive elements in a list of integers. It does so, in linear time, 
by first building a sublist of the positive elements, then computing the length of 
this sublist. 


¹ Another approach would be to define $n only for n ∈ N, in which case an unrestricted
axiom would be sound. However, as we use Z everywhere, that would be inconvenient.
A more promising idea is to view $n as linear (as opposed to affine) when n is
negative. Then, $(−1) cannot be discarded, so unrestricted splitting is sound.


let count_pos l =
  let l' = List.filter (fun x -> x > 0) l in
  List.length l'


How would one go about proving that this code actually has linear time complexity?
On paper, one would informally argue that the cost of the sequence pay ();
filter; length is O(1) + O(|l|) + O(|l'|), then exploit the inequality |l'| ≤ |l|,
which follows from the semantics of filter, and deduce that the cost is O(|l|).

In a formal setting, though, the problem is not so simple. Assume that we 
have two specification lemmas length_spec and filter_spec for List.length 
and List.filter, which describe the behavior of these OCaml functions and 
guarantee that they have linear-time complexity. For brevity, let us write just 
g and f for the functions cost length_spec and cost filter_spec. Also, at 
the mathematical level, let us write l↓ for the sublist of the positive elements
of the list l. It is easy enough to check that the cost of the expression “pay ();
let l' = ... in List.length l'” is 1 + f(|l|) + g(|l'|). The problem, now, is
to find an upper bound for this cost that does not depend on l', a local variable,
and to verify that this upper bound, expressed as a function of |l|, is dominated
by λn.n. Indeed, this is required in order to establish a specO statement about
count_pos. 

What might this upper bound be? That is, which functions cost from Z to Z
are such that (A) 1 + f(|l|) + g(|l'|) ≤ cost(|l|) can be proved (in the scope of the
local variable l') and (B) cost ⪯_Z λn.n holds? Three potential answers come to
mind: 


1. Within the scope of l', the equality l' = l↓ is available, as it follows from
the postcondition of filter. Thus, within this scope, 1 + f(|l|) + g(|l'|) is
provably equal to let l' = l↓ in 1 + f(|l|) + g(|l'|), that is, 1 + f(|l|) + g(|l↓|).
This remark may seem promising, as this cost expression does not depend
on l'. Unfortunately, this approach falls short, because this cost expression
cannot be expressed as the application of a closed function cost to |l|. Indeed,
the length of the filtered list, |l↓|, is not a function of the length of l. In short,
substituting local variables away in a cost expression does not always lead to 
a usable cost function. 

2. Within the scope of l', the inequality |l'| ≤ |l| is available, as it follows from
l' = l↓. Thus, inequality (A) can be proved, provided we take:

\[
\mathit{cost} = \lambda n.\ \max_{0 \le n' \le n}\ \bigl(1 + f(n) + g(n')\bigr)
\]

Furthermore, for this definition of cost, the domination assertion (B) holds
as well. The proof relies on the fact that the functions g and ĝ, where ĝ is
λn. max_{0≤n'≤n} g(n') [19], dominate each other. Although this approach
seems viable, and does not require the function g to be monotonic, it is a
bit more complicated than we would like.




3. Let us now assume that the function g is monotonic, that is, nondecreasing.
As before, within the scope of l', the inequality |l'| ≤ |l| is available. Thus, the
cost expression 1 + f(|l|) + g(|l'|) is bounded by 1 + f(|l|) + g(|l|). Therefore,
inequalities (A) and (B) are satisfied, provided we take:

\[
\mathit{cost} = \lambda n.\ 1 + f(n) + g(n)
\]


We believe that approach 3 is the simplest and most intuitive, because it 
allows us to easily eliminate l’, without giving rise to a complicated cost function, 
and without the need for a running maximum. 
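
Approach 3 can be tried out on an instrumented version of count_pos. The sketch below is an assumption-laden illustration, not the CFML development: a global counter stands in for time credits, the list functions are re-implemented locally with one pay () per call, the predicate passed to filter is not charged, and f and g are the concrete costs of this particular instrumentation.

```ocaml
(* Global tick counter standing in for time credits: each pay () spends one. *)
let ticks = ref 0
let pay () = incr ticks

(* Instrumented list functions, with one pay () at the start of each call. *)
let rec length l =
  pay ();
  match l with [] -> 0 | _ :: t -> 1 + length t

let rec filter p l =
  pay ();
  match l with
  | [] -> []
  | x :: t -> if p x then x :: filter p t else filter p t

let count_pos l =
  pay ();
  let l' = filter (fun x -> x > 0) l in
  length l'

(* Concrete costs of this instrumentation (assumptions of the sketch):
   a list of length n triggers n + 1 calls. Both functions are monotonic. *)
let f n = n + 1
let g n = n + 1

(* Approach 3: the bound no longer mentions the local variable l'. *)
let cost n = 1 + f n + g n

(* Run count_pos on l and return the result with the ticks spent. *)
let measure l =
  ticks := 0;
  let r = count_pos l in
  (r, !ticks)
```

For instance, measure [3; -1; 4; 0; 5] returns (3, 11), which is below cost 5 = 13; the bound is reached exactly when every element is positive.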

However, this approach requires that the cost function g, which is short for 
cost length_spec, be monotonic. This explains why we build a monotonicity 
condition in the definition of specO (Fig.5, last line). Another motivation for 
doing so is the fact that some lemmas (such as Lemma 8, which allows reasoning 
about the asymptotic cost of an inner loop) also have monotonicity hypotheses. 

The reader may be worried that, in practice, there might exist concrete cost 
functions that are not monotonic. This may be the case, in particular, of a cost 
function f that is obtained as the solution of a recurrence equation. Fortunately, 
in the common case of functions from Z to Z, the “running maximum” function f̂
can always be used in place of f: indeed, it is monotonic and has the same
asymptotic behavior as f. Thus, we see that both approaches 2 and 3 above 
involve running maxima in some places, but their use seems less frequent with 
approach 3. 
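
As a small illustration of the running maximum, the following OCaml sketch computes it over the integers 0..n. The names running_max and monotonic_upto are hypothetical, and returning f 0 for negative arguments is a convention of this sketch rather than something taken from the paper.

```ocaml
(* Running maximum of f on [0, n]: the max of f i for 0 <= i <= n.
   For n < 0 the loop does not run and the result is f 0. *)
let running_max (f : int -> int) (n : int) : int =
  let best = ref (f 0) in
  for i = 1 to n do
    best := max !best (f i)
  done;
  !best

(* A non-monotonic cost function: cheaper on even inputs. *)
let f n = if n mod 2 = 0 then n / 2 else n

(* Check that h is nondecreasing on 0..bound. *)
let monotonic_upto bound h =
  let ok = ref true in
  for i = 0 to bound - 1 do
    if h i > h (i + 1) then ok := false
  done;
  !ok
```

Here f 3 = 3 but f 4 = 2, so f is not monotonic, whereas running_max f is nondecreasing and still bounded by n.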


5 Interactive Proofs of Asymptotic Complexity Claims 


To prove a specification lemma, such as length_spec (Sect.4.3) or loop_spec 
(Sect. 4.4), one must construct a specO record. By definition of specO (Fig. 5), 
this means that one must exhibit a concrete cost function cost and prove a 
number of properties of this function, including the fact that, when supplied 
with $(cost ...), the code runs correctly (cost_spec) and the fact that cost is 
dominated by the desired asymptotic bound (cost_dominated). 

Thus, the very first step in a naïve proof attempt would be to guess an 
appropriate cost function for the code at hand. However, such an approach would 
be painful, error-prone, and brittle. It seems much preferable, if possible, to enlist 
the machine’s help in synthesizing a cost function at the same time as we step 
through the code—which we have to do anyway, as we must build a Separation 
Logic proof of the correctness of this code. 

To illustrate the problem, consider the recursive function p, whose integer 
argument n is expected to satisfy n ≥ 0. For the sake of this example, p calls an
auxiliary function g, which we assume runs in constant time. 


let rec p n =
  if n <= 1 then () else begin g (); p (n-1) end


Suppose we wish to establish that p runs in linear time. As argued at the 
beginning of the paper (Sect. 2, Fig. 2), it does not make sense to attempt a proof
by induction on n that “p n runs in time O(n)”. Instead, in a formal framework,
we must exhibit a concrete cost function cost such that cost(n) credits justify
the call p n and cost grows linearly, that is, cost ⪯_Z λn.n.

Let us assume that a specification lemma g_spec for the function g has 
been established already, so the number of credits required by a call to g is 
cost g_spec (). In the following, we write G as a shorthand for this constant. 

Because this example is very simple, it is reasonably easy to manually come 
up with an appropriate cost function for p. One valid guess is λn. 1 + Σ_{i=2}^{n} (1 + G).
Another valid guess, obtained via a simplification step, is λn. 1 + (1 + G)(n − 1)⁺.
Another witness, obtained via an approximation step, is λn. 1 + (1 + G)n⁺.
As the reader can see, there is in fact a spectrum of valid witnesses, ranging 
from verbose, low-level to compact, high-level mathematical expressions. Also, 
it should be evident that, as the code grows larger, it can become very difficult 
to guess a valid concrete cost function. 
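
The three witnesses can be compared against an instrumented run of p. This is a sketch under stated assumptions: a tick counter stands in for credits, the auxiliary function g is taken to pay exactly once (so G = 1 here), and the names cost_sum, cost_simplified, cost_approx, and measure are hypothetical.

```ocaml
let ticks = ref 0
let pay () = incr ticks

(* Hypothetical constant-time auxiliary function: it pays once, so its
   cost G is 1 in this sketch. *)
let g () = pay ()
let cost_g = 1

let rec p n =
  pay ();
  if n <= 1 then () else begin g (); p (n - 1) end

let pos c = max 0 c

(* Low-level witness: 1 plus the sum over i = 2..n of (1 + G). *)
let cost_sum n =
  let s = ref 1 in
  for _i = 2 to n do s := !s + (1 + cost_g) done;
  !s

(* Simplified witness: 1 + (1 + G) * max(0, n - 1). *)
let cost_simplified n = 1 + (1 + cost_g) * pos (n - 1)

(* Approximated witness: 1 + (1 + G) * max(0, n). *)
let cost_approx n = 1 + (1 + cost_g) * pos n

(* Number of ticks actually spent by a call to p n. *)
let measure n = ticks := 0; p n; !ticks
```

With this instrumentation, the first two witnesses coincide with the measured cost, while the third is a valid over-approximation.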

This gives rise to two questions. Among the valid cost functions, which one 
is preferable? Which ones can be systematically constructed, without guessing? 

Among the valid cost functions, there is a tradeoff. At one extreme, a low-level 
cost function has exactly the same syntactic structure as the code, so it is easy to 
prove that it is an upper bound for the actual cost of the code, but a lot of work 
may be involved in proving that it is dominated by the desired asymptotic bound. 
At the other extreme, a high-level cost function can be essentially identical to the 
desired asymptotic bound, up to explicit multiplicative and additive constants, 
so the desired domination assertion is trivial, but a lot of accounting work may 
be involved in proving that this function represents enough credits to execute 
the code. Thus, by choosing a cost function, we shift some of the burden of the 
proof from one subgoal to another. From this point of view, no cost function 
seems inherently preferable to another. 

From the point of view of systematic construction, however, the answer is 
more clear-cut. It seems fairly clear that it is possible to systematically build a 
cost function whose syntactic structure is the same as the syntactic structure of 
the code. This idea goes at least as far back as Wegbreit’s work [26]. Coming up 
with a compact, high-level expression of the cost, on the other hand, seems to 
require human insight. 

To provide as much machine assistance as possible, our system mechanically 
synthesizes a low-level cost expression for a piece of OCaml code. This is done 
transparently, at the same time as the user constructs a proof of the code in 
Separation Logic. Furthermore, we take advantage of the fact that we are using 
an interactive proof assistant: we allow the user to guide the synthesis process. 
For instance, the user controls how a local variable should be eliminated, how the 
cost of a conditional construct should be approximated (i.e., by a conditional or 
by a maximum), and how recurrence equations should be solved. In the following, 
we present this semi-interactive synthesis process. We first consider straight-line 
(nonrecursive) code (Sect. 5.1), then recursive functions (Sect. 5.2).


5.1 Synthesizing Cost Expressions for Straight-Line Code


The CFML library provides the user with interactive tactics that implement the 
reasoning rules of Separation Logic. We set things up in such a way that, as 
these rules are applied, a cost expression is automatically synthesized. 


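\[
\textsc{WeakenCost}\quad
\frac{\{\$c_2^{+} \star H\}\ (e)\ \{Q\} \qquad c_2^{+} \le c_1}
     {\{\$c_1 \star H\}\ (e)\ \{Q\}}
\]

\[
\textsc{Seq}\quad
\frac{\{\$c_1^{+} \star H\}\ (e_1)\ \{Q'\} \qquad \{\$c_2^{+} \star Q'()\}\ (e_2)\ \{Q\}}
     {\{\$(c_1^{+} + c_2^{+})^{+} \star H\}\ (e_1; e_2)\ \{Q\}}
\]

\[
\textsc{Let}\quad
\frac{\{\$c_1^{+} \star H\}\ (e_1)\ \{Q'\} \qquad \forall x.\ \{\$c_2^{+} \star Q'(x)\}\ (e_2)\ \{Q\}}
     {\{\$(c_1^{+} + c_2^{+})^{+} \star H\}\ (\mathtt{let}\ x = e_1\ \mathtt{in}\ e_2)\ \{Q\}}
\]

\[
\textsc{Val}\quad
\frac{H \Vdash Q(v)}
     {\{\$0^{+} \star H\}\ (v)\ \{Q\}}
\qquad
\textsc{Pay}\quad
\frac{H \Vdash Q()}
     {\{\$1^{+} \star H\}\ (\mathtt{pay}())\ \{Q\}}
\]

\[
\textsc{If}\quad
\frac{b = \mathrm{true} \Rightarrow \{\$c_1^{+} \star H\}\ (e_1)\ \{Q\} \qquad
      b = \mathrm{false} \Rightarrow \{\$c_2^{+} \star H\}\ (e_2)\ \{Q\}}
     {\{\$(\mathrm{if}\ b\ \mathrm{then}\ c_1\ \mathrm{else}\ c_2)^{+} \star H\}\ (\mathtt{if}\ b\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2)\ \{Q\}}
\]

\[
\textsc{For}\quad
\frac{\forall i.\ a \le i < b \Rightarrow \{\$c(i)^{+} \star I(i)\}\ (e)\ \{I(i+1)\} \qquad H \Vdash I(a) \star Q}
     {\{\$(\textstyle\sum_{a \le i < b} c(i)^{+})^{+} \star H\}\ (\mathtt{for}\ i = a\ \mathtt{to}\ b - 1\ \mathtt{do}\ e\ \mathtt{done})\ \{I(b) \star Q\}}
\]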


Fig. 6. The reasoning rules of Separation Logic, specialized for cost synthesis. 


To this end, we use specialized variants of the reasoning rules, whose premises 
and conclusions take the form {$n ⋆ H} (e) {Q}. Furthermore, to simplify the
nonnegativeness side conditions that must be proved while reasoning, we make all
cost expressions obviously nonnegative by wrapping them in max(0, −). Recall
that c⁺ stands for max(0, c), where c ∈ Z. Our reasoning rules work with triples
of the form {$c⁺ ⋆ H} (e) {Q}. They are shown in Fig. 6.

Because we wish to synthesize a cost expression, our Coq tactics maintain 
the following invariant: whenever the goal is {$c⁺ ⋆ H} (e) {Q}, the cost c is
uninstantiated, that is, it is represented in Coq by a metavariable, a placeholder. 
This metavariable is instantiated when the goal is proved by applying one of 
the reasoning rules. Such an application produces new subgoals, whose precon- 
ditions contain new metavariables. As this process is repeated, a cost expression 
is incrementally constructed. 

The rule WEAKENCOST is a special case of the consequence rule of Separation
Logic. It is typically used once at the root of the proof: even though the initial
goal {$c₁ ⋆ H} (e) {Q} may not satisfy our invariant, because it lacks a −⁺
wrapper and because c₁ is not necessarily a metavariable, WEAKENCOST gives
rise to a subgoal {$c₂⁺ ⋆ H} (e) {Q} that satisfies it. Indeed, when this rule is
applied, a fresh metavariable c₂ is generated. WEAKENCOST can also be explicitly
applied by the user when desired. It is typically used just before leaving the scope
of a local variable x to approximate a cost expression c₂⁺ that depends on x with
an expression c₁ that does not refer to x.

The SEQ rule is a special case of the LET rule. It states that the cost of a
sequence is the sum of the costs of its subexpressions. When this rule is applied
to a goal of the form {$c⁺ ⋆ H} (e) {Q}, where c is a metavariable, two new
metavariables c₁ and c₂ are introduced, and c is instantiated with c₁⁺ + c₂⁺.

The LET rule is similar to SEQ, but involves an additional subtlety: the cost c₂
must not refer to the local variable x. Naturally, Coq enforces this condition: any
attempt to instantiate the metavariable c₂ with an expression where x occurs
fails. In such a situation, it is up to the user to use WEAKENCOST so as to avoid 
this dependency. The example of count_pos (Sect. 4.5) illustrates this issue. 

The VAL rule handles values, which in our model have zero cost. The symbol
⊩ denotes entailment between Separation Logic assertions.

The IF rule states that the cost of an OCaml conditional expression is a
mathematical conditional expression. Although this may seem obvious, one subtlety
lurks here. Using WEAKENCOST, the cost expression if b then c₁ else c₂ can
be approximated by max(c₁, c₂). Such an approximation can be beneficial, as
it leads to a simpler cost expression, or harmful, as it causes a loss of information.
In particular, when carried out in the body of a recursive function, it can
lead to an unsatisfiable recurrence equation. We let the user decide whether this
approximation should be performed.

The PAY rule handles the pay () instruction, which is inserted by the CFML
tool at the beginning of every function and loop body (Sect. 4.1). This instruction 
costs one credit. 

The FOR rule states that the cost of a for loop is the sum, over all values of
the index i, of the cost of the i-th iteration of the body. In practice, it is typically
used in conjunction with WEAKENCOST, which allows the user to simplify and
approximate the iterated sum Σ_{a≤i<b} c(i)⁺. In particular, if the synthesized
cost c(i) happens to not depend on i, or can be approximated so as to not
depend on i, then this iterated sum can be expressed under the form c(b − a)⁺.
A variant of the FOR rule, not shown, covers this common case. There is in
principle no need for a primitive treatment of loops, as loops can be encoded in 
terms of higher-order recursive functions, and our program logic can express the 
specifications of these combinators. Nevertheless, in practice, primitive support 
for loops is convenient. 
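
The shape of the cost expressions produced by these rules can be mirrored by a small evaluator. The sketch below is a toy model, not CFML's tactics: it works on a hypothetical mini-language (type expr), it computes the numeric value of the cost rather than a symbolic expression built from metavariables, and the helper body_of_p, which encodes the body of the function p of Sect. 5, is introduced here only for illustration.

```ocaml
(* A toy expression language, just rich enough to mirror the rules. *)
type expr =
  | Val                               (* a value: costs nothing *)
  | Pay                               (* pay (): costs one credit *)
  | Call of int                       (* call to a function of known cost *)
  | Seq of expr * expr
  | If of bool * expr * expr          (* condition already evaluated *)
  | For of int * int * (int -> expr)  (* for i = a to b - 1 do body i done *)

let pos c = max 0 c

(* Cost of an expression, following the shape of the rules: every
   intermediate cost is wrapped in pos, that is, max(0, -). *)
let rec cost = function
  | Val -> pos 0
  | Pay -> pos 1
  | Call c -> pos c
  | Seq (e1, e2) -> pos (cost e1 + cost e2)
  | If (b, e1, e2) -> pos (if b then cost e1 else cost e2)
  | For (a, b, body) ->
      let s = ref 0 in
      for i = a to b - 1 do s := !s + cost (body i) done;
      pos !s

(* Body of p: pay (); if n <= 1 then () else (g (); p (n - 1)),
   where g costs g credits and the recursive call costs f (n - 1). *)
let body_of_p ~g ~f n =
  Seq (Pay, If (n <= 1, Val, Seq (Call g, Call (f (n - 1)))))
```

With g = 1 and f k = 1 + 2 * pos (k - 1), the cost of body_of_p at n coincides with f n, which is the kind of inequality a recurrence equation records.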

This concludes our exposition of the reasoning rules of Fig. 6. Coming back 
to the example of the OCaml function p (Sect. 5), under the assumption that the 
cost of the recursive call p(n-1) is f(n − 1), we are able, by repeated application of
the reasoning rules, to automatically find that the cost of the OCaml expression: 


if n <= 1 then () else begin g(); p(n-1) end 


is: 1 + if n ≤ 1 then 0 else (G + f(n − 1)). The initial 1 accounts for the implicit
pay(). This may seem obvious, and it is. The point is that this cost expression
is automatically constructed: its synthesis adds no overhead to an interactive
proof of functional correctness of the function p.


5.2 Synthesizing and Solving Recurrence Equations


It remains to explain how to deal with recursive functions. Suppose
S(f) is the Separation Logic triple that we wish to establish, where f stands for 
an as-yet-unknown cost function. Following common informal practice, we would 
like to do this in two steps. First, from the code, derive a “recurrence equation” 
E(f), which in fact is usually not an equation, but a constraint (or a conjunction 
of constraints) bearing on f. Second, prove that this recurrence equation admits 
a solution that is dominated by the desired asymptotic cost function g. This 
approach can be formally viewed as an application of the following tautology: 


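\[
\forall E.\ \bigl(\forall f.\ E(f) \to S(f)\bigr) \to \bigl(\exists f.\ E(f) \wedge f \preceq g\bigr) \to \bigl(\exists f.\ S(f) \wedge f \preceq g\bigr)
\]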


The conclusion S(f) ∧ f ⪯ g states that the code is correct and has asymptotic
cost g. In Coq, applying this tautology gives rise to a new metavariable E, as
the recurrence equation is initially unknown, and two subgoals. 

During the proof of the first subgoal, ∀f. E(f) → S(f), the cost function f
is abstract (universally quantified), but we are allowed to assume E(f), where
E is initially a metavariable. So, should the need arise to prove that f satisfies
a certain property, this can be done just by instantiating E. In the example of
the OCaml function p (Sect. 5), we prove S(f) by induction over n, under the
hypothesis n ≥ 0. Thus, we assume that the cost of the recursive call p(n-1)
is f(n − 1), and must prove that the cost of p n is f(n). We synthesize the
cost of p n as explained earlier (Sect. 5.1) and find that this cost is 1 + if n ≤
1 then 0 else (G + f(n − 1)). We apply WEAKENCOST and find that our proof is
complete, provided we are able to prove the following inequation:


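\[
1 + \mathrm{if}\ n \le 1\ \mathrm{then}\ 0\ \mathrm{else}\ (G + f(n-1)) \;\le\; f(n)
\]

We achieve that simply by instantiating E as follows:

\[
E := \lambda f.\ \forall n.\ n \ge 0 \to 1 + \mathrm{if}\ n \le 1\ \mathrm{then}\ 0\ \mathrm{else}\ (G + f(n-1)) \le f(n)
\]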


This is our “recurrence equation” —in fact, a universally quantified, conditional 
inequation. We are done with the first subgoal. 

We then turn to the second subgoal, ∃f. E(f) ∧ f ⪯ g. The metavariable E is
now instantiated. The goal is to solve the recurrence and analyze the asymptotic 
growth of the chosen solution. There are at least three approaches to solving 
such a recurrence. 

First, one can guess a closed form that satisfies the recurrence. For example, 
the function f := λn. 1 + (1 + G)n⁺ satisfies E(f) above. But, as argued earlier,
guessing is in general difficult and tedious. 

Second, one can invoke Cormen et al.’s Master Theorem [12] or the more 
general Akra-Bazzi theorem [1,21]. Unfortunately, at present, these theorems 
are not available in Coq, although an Isabelle/HOL formalization exists [13]. 

The last approach is Cormen et al.’s substitution method [12, Sect. 4]. The 
idea is to guess a parameterized shape for the solution; substitute this shape into
the goal; gather a set of constraints that the parameters must satisfy for the goal
to hold; finally, show that these constraints are indeed satisfiable. In the above
example, as we expect the code to have linear time complexity, we propose that
the solution f should have the shape λn. (a n⁺ + b), where a and b are parameters,
about which we wish to gradually accumulate a set of constraints. From a formal 
point of view, this amounts to applying the following tautology: 


\[
\forall P.\ \forall C.\ \bigl(\forall a\, b.\ C(a,b) \to P(\lambda n.\,(a\,n^{+} + b))\bigr) \to \bigl(\exists a\, b.\ C(a,b)\bigr) \to \exists f.\ P(f)
\]


This application again yields two subgoals. During the proof of the first subgoal, 
C is a metavariable and can be instantiated as desired (possibly in several steps), 
allowing us to gather a conjunction of constraints bearing on a and b. During the 
proof of the second subgoal, C is fixed and we must check that it is satisfiable. 
In our example, the first subgoal is: 


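\[
E(\lambda n.\,(a\,n^{+} + b)) \;\wedge\; \lambda n.\,(a\,n^{+} + b) \preceq_{\mathbb{Z}} \lambda n.\, n
\]

The second conjunct is trivial. The first conjunct simplifies to:

\[
\forall n.\ n \ge 0 \to 1 + \mathrm{if}\ n \le 1\ \mathrm{then}\ 0\ \mathrm{else}\ (G + a\,(n-1)^{+} + b) \le a\,n^{+} + b
\]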


By distinguishing the cases n = 0, n = 1, and n > 1, we find that this property 
holds provided we have 1 ≤ b and 1 ≤ a + b and 1 + G ≤ a. Thus, we prove this
subgoal by instantiating C with λ(a, b). (1 ≤ b ∧ 1 ≤ a + b ∧ 1 + G ≤ a).

It remains to check the second subgoal, that is, ∃a b. C(a, b). This is easy;
we pick, for instance, a := 1 + G and b := 1. This concludes our use of Cormen
et al.’s substitution method. 
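
For readers who wish to retrace the case analysis, it can be spelled out as follows. This is a worked restatement of the step above, using only the shape λn. (a n⁺ + b) and the recurrence E; the use of G ≥ 0 in the final check relies on the nonnegativity of cost functions (Sect. 4.4).

```latex
\begin{align*}
n = 0:&\quad 1 + 0 \le a \cdot 0 + b
      &&\Longleftrightarrow\quad 1 \le b \\
n = 1:&\quad 1 + 0 \le a \cdot 1 + b
      &&\Longleftrightarrow\quad 1 \le a + b \\
n > 1:&\quad 1 + G + a\,(n-1) + b \le a\,n + b
      &&\Longleftrightarrow\quad 1 + G \le a
\end{align*}
% With a := 1 + G and b := 1, the three constraints become
%   1 <= 1,   1 <= 2 + G,   1 + G <= 1 + G,
% where the middle one holds because G >= 0.
```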

In summary, by exploiting Coq’s metavariables, we are able to set up our 
proofs in a style that closely follows the traditional paper style. During a first 
phase, as we analyze the code, we synthesize a cost function and (if the code 
is recursive) a recurrence equation. During a second phase, we guess the shape 
of a solution, and, as we analyze the recurrence equation, we synthesize a con- 
straint on the parameters of the shape. During a last phase, we check that this 
constraint is satisfiable. In practice, instead of explicitly building and applying 
tautologies as above, we use the first author’s procrastination library [16], 
which provides facilities for introducing new parameters, gradually gathering 
constraints on these parameters, and eventually checking that these constraints 
are satisfiable. 


6 Examples 


Binary Search. We prove that binary search has time complexity O(log n), 
where n = j — i denotes the width of the search interval [i, j). The code is as in 
Fig. 1, except that the flaw is fixed by replacing i+1 with k+1 on the last line. 
As outlined earlier (Sect.5), we synthesize the following recurrence equation on 
the cost function f: 


\[
f(0) + 3 \le f(1) \;\wedge\; \forall n \ge 0.\ 1 \le f(n) \;\wedge\; \forall n \ge 2.\ f(n/2) + 3 \le f(n)
\]

We apply the substitution method and search for a solution of the form
λn. if n ≤ 0 then 1 else a log n + b, which is dominated by λn. log n. Substituting
this shape into the above constraints, we find that they boil down to (4 ≤ b) ∧ (0 ≤
a ∧ 1 ≤ b) ∧ (3 ≤ a). Finally, we guess a solution, namely a := 3 and b := 4.
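
The chosen parameters can be sanity-checked numerically. The sketch below is not part of the Coq development: it assumes that log denotes the floor of the base-2 logarithm on integers, and the names log2 and satisfies_recurrence are hypothetical. A finite check of this kind is of course no substitute for the proof.

```ocaml
(* Floor of the base-2 logarithm for n >= 1 (assumed reading of log). *)
let rec log2 n = if n <= 1 then 0 else 1 + log2 (n / 2)

(* Candidate solution: fun n -> if n <= 0 then 1 else a * log n + b. *)
let f ~a ~b n = if n <= 0 then 1 else a * log2 n + b

(* Check the three constraints of the recurrence for all n in 0..upto:
   f 0 + 3 <= f 1,  1 <= f n for n >= 0,  f (n / 2) + 3 <= f n for n >= 2. *)
let satisfies_recurrence ~a ~b ~upto =
  let f = f ~a ~b in
  let ok = ref (f 0 + 3 <= f 1) in
  for n = 0 to upto do
    if not (1 <= f n) then ok := false;
    if n >= 2 && not (f (n / 2) + 3 <= f n) then ok := false
  done;
  !ok
```

Under these assumptions, a = 3 and b = 4 pass the check, while lowering either parameter by one makes a constraint fail, in line with 3 ≤ a and 4 ≤ b.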


Dependent Nested Loops. Many algorithms involve dependent nested for 
loops, that is, nested loops, where the bounds of the inner loop depend on the 
outer loop index, as in the following simplified example: 


for i = 1 to n do
  for j = 1 to i do () done
done


For this code, the cost function λn. Σ_{i=1}^{n} (1 + Σ_{j=1}^{i} 1) is synthesized. There
remains to prove that it is dominated by λn. n². We could recognize and prove
that this function is equal to λn. n(n+3)/2, which clearly is dominated by λn. n².
This works because this example is trivial, but, in general, computing explicit 
closed forms for summations is challenging, if at all feasible. 

A higher-level approach is to exploit the fact that, if f is monotonic, then 
Σ_{i=1}^{n} f(i) is less than n·f(n). Applying this lemma twice, we find that the above
cost function is less than λn. Σ_{i=1}^{n} (1 + i), which is less than λn. n(1 + n), which is
dominated by λn. n². This simple-minded approach, which does not require the
Summation lemma (Lemma 8), is often applicable. The next example illustrates 
a situation where the Summation lemma is required. 
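
The two applications of the monotonic-sum bound can be written out as a chain of inequalities. This is a worked restatement of the argument above; the final constant 2 is one possible choice for witnessing domination by n² and is not taken from the paper.

```latex
\begin{align*}
\sum_{i=1}^{n} \Bigl(1 + \sum_{j=1}^{i} 1\Bigr)
  &\le \sum_{i=1}^{n} (1 + i)
     && \text{bound applied to the constant function } \lambda j.\,1 \\
  &\le n\,(1 + n)
     && \text{bound applied to } \lambda i.\,1 + i \text{, which is monotonic} \\
  &\le 2\,n^{2}
     && \text{for } n \ge 1
\end{align*}
```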


A Loop Whose Body Has Exponential Cost. In the following simple exam- 
ple, the loop body is just a function call: 


for i = 0 to n-1 do b(i) done 


Thus, the cost of the loop body is not known exactly. Instead, let us assume 
that a specification for the auxiliary function b has been proved and that its cost 
is O(2^n), that is, cost b ⪯_Z λi. 2^i holds. We then wish to prove that the cost
of the whole loop is also O(2^n).

For this loop, the cost function λn. Σ_{i=0}^{n} (1 + cost b (i)) is automatically
synthesized. We have an asymptotic bound for the cost of the loop body, namely:
λi. 1 + cost b (i) ⪯_Z λi. 2^i. The side conditions of the Summation lemma
(Lemma 8) are met: in particular, the function λi. 1 + cost b (i) is monotonic.
The lemma yields λn. Σ_{i=0}^{n} (1 + cost b (i)) ⪯_Z λn. Σ_{i=0}^{n} 2^i. Finally, we have
λn. Σ_{i=0}^{n} 2^i = λn. 2^{n+1} − 1 ⪯_Z λn. 2^n.
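
The last domination step rests on the closed form of a geometric sum. As a brief worked check, with the constant 2 chosen here as one possible witness:

```latex
\[
\sum_{i=0}^{n} 2^{i} \;=\; 2^{n+1} - 1 \;\le\; 2 \cdot 2^{n}
\qquad \text{for all } n \ge 0 .
\]
```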


The Bellman-Ford Algorithm. We verify the asymptotic complexity of an 
implementation of the Bellman-Ford algorithm, which computes shortest paths in a
weighted graph with n vertices and m edges. The algorithm involves an outer
loop that is repeated n − 1 times and an inner loop that iterates over all m edges.
The specification asserts that the asymptotic complexity is O(nm): 


\[
\exists\, \mathit{cost} : \mathbb{Z}^2 \to \mathbb{Z}. \;
\left\{
\begin{array}{l}
\mathit{cost} \preceq_{\mathbb{Z}^2} \lambda (m, n).\, nm \\
\{ \$\mathit{cost}(\#\mathit{edges}(g), \#\mathit{vertices}(g)) \}\ (\mathit{bellmanford}\ g)\ \{ \ldots \}
\end{array}
\right.
\]

By exploiting the fact that a graph without duplicate edges must satisfy
m ≤ n², we prove that the complexity of the algorithm, viewed as a function of
n, is O(n³).


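\[
\exists\, \mathit{cost} : \mathbb{Z} \to \mathbb{Z}. \;
\left\{
\begin{array}{l}
\mathit{cost} \preceq_{\mathbb{Z}} \lambda n.\, n^3 \\
\{ \$\mathit{cost}(\#\mathit{vertices}(g)) \}\ (\mathit{bellmanford}\ g)\ \{ \ldots \}
\end{array}
\right.
\]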


To prove that the former specification implies the latter, one instantiates m 
with n², that is, one exploits a composition lemma (Sect. 3.4). In practice, we
publish both specifications and let clients use whichever one is more convenient. 
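
One plausible way to read this step, assuming that the two-parameter cost function is monotonic in its first argument (as the cost_monotonic field of specO would provide for a suitable order), is sketched below. The name cost′ is introduced here only for illustration.

```latex
\begin{align*}
\mathit{cost}'(n) &:= \mathit{cost}(n^{2}, n) \\
\mathit{cost}(m, n) &\le \mathit{cost}(n^{2}, n) = \mathit{cost}'(n)
   && \text{since } m \le n^{2} \text{ and } \mathit{cost} \text{ is monotonic} \\
\mathit{cost}' &\preceq_{\mathbb{Z}} \lambda n.\; n \cdot n^{2} = \lambda n.\; n^{3}
   && \text{by composition with } n \mapsto (n^{2}, n)
\end{align*}
```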


Union-Find. Charguéraud and Pottier [11] use Separation Logic with Time 
Credits to verify the correctness and time complexity of a Union-Find imple- 
mentation. For instance, they prove that the (amortized) concrete cost of find 
is 2α(n) + 4, where n is the number of elements. With a few lines of proof,
we derive a specification where the cost of find is expressed under the form
O(α(n)):

specO Z_filterType Z.le (fun n ⇒ alpha n) (fun cost ⇒
  ∀D R V x, x \in D → triple (UnionFind_ml.find x)
    PRE (UF D R V ⋆ $(cost (card D)))
    POST (fun y ⇒ UF D R V ⋆ [ R x = y ])).


Union-Find is a mutable data structure, whose state is described by the abstract 
predicate UF D R V. In particular, the parameter D represents the domain of the 
data structure, that is, the set of all elements created so far. Thus, its cardinal, 
card D, corresponds to n. This case study illustrates a situation where the cost 
of an operation depends on the current state of a mutable data structure. 


7 Related Work 


Our work builds on top of Separation Logic [23] with Time Credits [2], which 
was first implemented in a verification tool and exploited by the second
and third authors [11]. We refer the reader to their paper for a survey of the 
related work in the general area of formal reasoning about program complexity, 
including approaches based on deductive program verification and approaches 
based on automatic complexity analysis. In this section, we restrict our attention 
to informal and formal treatments of the O notation. 

The O notation and its siblings are documented in several textbooks [7, 15, 20]. 
Out of these, only Howell [19,20] draws attention to the subtleties of the multi- 
variate case. He shows that one cannot take for granted that the properties of the 
O notation, which in the univariate case are well-known, remain valid in the mul- 
tivariate case. He states several properties which, at first sight, seem natural and 
desirable, then proceeds to show that they are inconsistent, so no definition of the 
O notation can satisfy them all. He then proposes a candidate notion of domination
between functions whose domain is N^k. His notation, f ∈ Ô(g), is defined
as the conjunction of f ∈ O(g) and f̂ ∈ O(ĝ), where the function f̂ is a “running
maximum” of the function f, and is by construction monotonic. He shows that this
notion satisfies all the desired properties, provided some of them are restricted by 
additional side conditions, such as monotonicity requirements. 

In this work, we go slightly further than Howell, in that we consider functions 
whose domain is an arbitrary filtered type A, rather than necessarily ℕ^k. We give
a standard definition of O and verify all of Howell’s properties, again restricted
with certain side conditions. We find that we do not need Ô, which is fortunate, as
it seems difficult to define f̂ in the general case where f is a function of domain A.
The monotonicity requirements that we impose are not exactly the same as 
Howell’s, but we believe that the details of these administrative conditions do 
not matter much, as all of the functions that we manipulate in practice are 
everywhere nonnegative and monotonic. 

Avigad and Donnelly [3] formalize the O notation in Isabelle/HOL. They 
consider functions of type A → B, where A is arbitrary and B is an ordered
ring. Their definition of “f = O(g)” requires |f(x)| ≤ c|g(x)| for every x, as
opposed to “when x is large enough”. Thus, they get away without equipping
the type A with a filter. The price to pay is an overly restrictive notion of
domination, except in the case where A is ℕ, where both ∀x and 𝕌x yield the
same notion of domination; this is Brassard and Bratley’s “threshold rule” [7].
Avigad and Donnelly suggest defining “f = O(g) eventually” as an abbreviation
for ∃f′, (f′ = O(g) ∧ 𝕌x. f(x) = f′(x)). In our eyes, this is less elegant than
parameterizing O with a filter in the first place. 

Eberl [13] formalizes the Akra-Bazzi method [1,21], a generalization of the 
well-known Master Theorem [12], in Isabelle/HOL. He creates a library of Lan- 
dau symbols specifically for this purpose. Although his paper does not mention 
filters, his library in fact relies on filters, whose definition appears in Isabelle’s 
Complex library. Eberl’s definition of the O symbol is identical to ours. That 
said, because he is concerned with functions of type ℕ → ℝ or ℝ → ℝ, he does
not define product filters, and does not prove any lemmas about domination in 
the multivariate case. Eberl sets up a decision procedure for domination goals, 
like x ∈ O(x³), as well as a procedure that can simplify, say, O(x³ + x²) to O(x³).

TiML [25] is a functional programming language where types carry time 
complexity annotations. Its type-checker generates proof obligations that are 
discharged by an SMT solver. The core type system, whose metatheory is formal- 
ized in Coq, employs concrete cost functions. The TiML implementation allows 
associating an O specification with each toplevel function. An unverified component
recognizes certain classes of recurrence equations and automatically applies
the Master Theorem. For instance, mergesort is recognized to be O(mn log n), 
where n is the input size and m is the cost of a comparison. The meaning of the 
O notation in the multivariate case is not spelled out; in particular, which filter 
is meant is not specified. 

Boldo et al. [4] use Coq to verify the correctness of a C program which 
implements a numerical scheme for the resolution of the one-dimensional acoustic 
wave equation. They define an ad hoc notion of “uniform O” for functions of 
type ℝ² → ℝ, which we believe can in fact be viewed as an instance of our
generic definition of domination, at an appropriate product filter. Subsequent
work on the Coquelicot library for real analysis [5] includes general definitions of 
filters, limits, little-o and asymptotic equivalence. A few definitions and lemmas 
in Coquelicot are identical to ours, but the focus in Coquelicot is on various 
filters on ℝ, whereas we are more interested in filters on ℤ^k.

The tools RAML [17] and Pastis [8] perform fully automated amortized time 
complexity analysis of OCaml programs. They can be understood in terms of 
Separation Logic with Time Credits, under the constraint that the number of 
credits that exist at each program point must be expressed as a polynomial over 
the variables in scope at this point. The a priori unknown coefficients of this 
polynomial are determined by an LP solver. Pastis produces a proof certificate 
that can be checked by Coq, so the trusted computing base of this approach is 
about the same as ours. RAML and Pastis offer much stronger automation than 
our approach, but have weaker expressive power. It would be very interesting to 
offer access to a Pastis-like automated system within our interactive system. 
Abstract. Multiplicative Weights (MW) is a simple yet powerful algorithm
for learning linear classifiers, for ensemble learning à la boosting,
for approximately solving linear and semidefinite systems, for comput- 
ing approximate solutions to multicommodity flow problems, and for 
online convex optimization, among other applications. Recent work in 
algorithmic game theory, which applies a computational perspective to 
the design and analysis of systems with mutually competitive actors, has 
shown that no-regret algorithms like MW naturally drive games toward 
approximate Coarse Correlated Equilibria (CCEs), and that for certain 
games, approximate CCEs have bounded cost with respect to the opti- 
mal states of such systems. 

In this paper, we put such results to practice by building distributed 
systems such as routers and load balancers with performance and conver- 
gence guarantees mechanically verified in Coq. The main contributions 
on which our results rest are (1) the first mechanically verified implemen- 
tation of Multiplicative Weights (specifically, we show that our MW is 
no regret) and (2) a language-based formulation, in the form of a DSL, of 
the class of games satisfying Roughgarden smoothness, a broad charac- 
terization of those games whose approximate CCEs have cost bounded 
with respect to optimal. Composing (1) with (2) within Coq yields a 
new strategy for building distributed systems with mechanically veri- 
fied complexity guarantees on the time to convergence to near-optimal 
system configurations. 


Keywords: Multiplicative weights · Algorithmic game theory ·
Smooth games · Interactive theorem proving · Coq


1 Introduction 


The Multiplicative Weights algorithm (MW, [1,25]) solves the general problem of 
“combining expert advice”, in which an agent repeatedly chooses which action, 
or “expert”, to play against an adaptive environment. The agent, after playing 
an action, learns from the environment both the cost of that action and of other 
actions it could have played in that round. The environment, in turn, may adapt 
in order to minimize environment costs. MW works by maintaining a weighted
distribution over the action space, in which each action initially has equal weight, 
and by updating weights with a linear or exponential loss function to penalize 
poorly performing actions. 

MW is a no-regret algorithm: its expected cost approaches that of the best 
fixed action the agent could have chosen in hindsight (i.e., external regret tends to 
zero) as time t → ∞. Moreover, this simple algorithm performs remarkably well:
in a number of rounds logarithmic in the size of the action space, MW’s expected
regret can be bounded by a small constant ε (MW has bounded external regret).
In [1], Arora, Hazan, and Kale showed that MW has wide-ranging connections 
to numerous problems in computer science, including optimization, linear and 
semidefinite programming, and machine learning (cf. boosting [14]). 
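
As a rough illustration of the weight-update loop described above, the following Python sketch runs MW against a fixed sequence of cost vectors. It is not the verified Coq implementation: the function names, the linear update w ← w · (1 − η · c), the assumption that all costs lie in [0, 1], and the learning rate η are choices made only for this example. The helper regret_bound returns the textbook bound 2·sqrt(ln n / T) that this particular update satisfies when η = sqrt(ln n / T) ≤ 1/2.

```python
import math


def multiplicative_weights(costs, eta):
    """Run MW over a fixed sequence of cost vectors with entries in [0, 1].

    costs: list of rounds; each round lists the cost of every action.
    eta:   learning rate, assumed to lie in (0, 1/2].
    Returns (expected per-round cost, per-round external regret).
    """
    n_actions = len(costs[0])
    weights = [1.0] * n_actions  # every action starts with equal weight
    expected_total = 0.0
    for c in costs:
        total = sum(weights)
        dist = [w / total for w in weights]  # current mixed strategy
        expected_total += sum(p * ci for p, ci in zip(dist, c))
        # linear update: penalize each action in proportion to its cost
        weights = [w * (1.0 - eta * ci) for w, ci in zip(weights, c)]
    rounds = len(costs)
    # cost of the best single action in hindsight
    best_fixed = min(sum(c[a] for c in costs) for a in range(n_actions))
    return expected_total / rounds, (expected_total - best_fixed) / rounds


def regret_bound(n_actions, rounds):
    """Per-round regret bound for the linear update with eta = sqrt(ln n / T)."""
    return 2.0 * math.sqrt(math.log(n_actions) / rounds)
```

With four actions and 2000 rounds, for instance, the bound evaluates to roughly 0.05, so the average cost of the sketch is within about 0.05 of the best fixed action.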

Our work targets another important application of MW: the approximate 
solution of multi-agent games, especially as such games relate to the construc- 
tion of distributed systems. It is well known (cf. [30, Chapter 4]) that no-regret 
algorithms such as MW converge, when played by multiple independent agents, 
to a large equilibrium class known as Coarse Correlated Equilibria (CCEs). CCEs 
may not be socially optimal, but for some games, such as Roughgarden’s smooth 
games [35], the social cost of such equilibrium states can be bounded by a con- 
stant factor of the optimal cost of the game (the game has bounded Price of 
Anarchy, or POA). Therefore, to drive the social cost of a smooth game to near 
optimal, it suffices simply to let each agent play a no-regret algorithm such 
as MW. 

Moreover, a number of distributed systems can be encoded as games, espe- 
cially when the task being distributed is viewed as an optimization problem. 
Consider, for example, distributed balancing of network flows over a set of web 
servers, an application we return to in Sect. 3. Assuming the set of flows is fixed, 
and that the cost of (or latency incurred by) assigning a flow to a particular web 
server increases as a function of the number of flows already assigned to that 
server (the traffic), then the load balancing application is encodable as a game 
in which each flow is a “player” attempting to optimize its cost (latency). An 
optimal solution of this game minimizes the total latency across all flows. Since 
the game is Roughgarden smooth (assuming affine cost functions), the social 
cost of its CCEs as induced by letting each player independently run MW is 
bounded with respect to that of an optimal solution. 


1.1 Contributions 


In this paper, we put such results to work by building the first verified implementation
of the MW algorithm, which we use to drive all games to approximate
CCEs, and by defining a language-based characterization of a subclass of games
called Roughgarden smooth games that have robust Price of Anarchy guarantees 
extending even to approximate CCEs. Combining our verified MW with smooth 
games, we construct distributed systems for applications such as routing and 
load balancing that have verified convergence and correctness guarantees. 
Specifically, our main contributions are:


— a new architecture, as embodied in the CAGE system (https://github.com/ 
gstew5/cage), for the construction of distributed systems with verified com- 
plexity guarantees, by composition of verified Multiplicative Weights (MW) 
with robust Price of Anarchy bounds via Roughgarden smoothness; 

— the first formally verified implementation of the MW algorithm; 

— a language-based characterization of Roughgarden smooth games, in the 
form of a mechanized DSL for the construction of such games together with 
smoothness preservation theorems showing that each combinator in the lan- 
guage preserves smoothness; 

— the application of the resulting system to distributed routing and load bal- 
ancing. 


By verified, we mean our MW implementation has mechanically checked con- 
vergence bounds and proof of correctness within an interactive theorem prover 
(specifically, Ssreflect [16], an extension of the Coq [5] system). By convergence 
and correctness, we mean that we prove both that MW produces the right answer
(functional correctness with respect to a high-level functional specification) and
that it does so with external regret¹ bounded by a function of the number
of iterations of the protocol (convergence). Convergence of MW in turn implies 
convergence to an approximate CCE. By composing this second convergence 
property with Roughgarden smoothness, we bound the social, or total, cost of 
the resulting system state with respect to the optimal. 

As we’ve mentioned, MW has broad application across a number of subdis- 
ciplines of computer science, including linear programming, optimization, and 
machine learning. Although our focus in this paper is the use of MW to imple- 
ment no-regret dynamics, a general strategy for computing the CCEs of multi- 
agent games, our implementation of MW (Sect. 5.3) could be used to build, e.g., 
a verified LP solver or verified implementation of boosting as well. 


Limitations. The approach we outline above does not apply to all distributed 
systems, nor even to all distributed systems encodable as games. In particular, in 
order to prove POA guarantees in our approach, the game encoding a particular 
distributed system must first be shown Roughgarden smooth, a condition which 
does not always apply (e.g., to network formation games [35, Section 2]). More 
positively, the Smooth Games DSL we present in Sects. 3 and 4 provides one
method by which to explore the combinatorial nature of Roughgarden smooth- 
ness, as we demonstrate with some examples in Sect. 3. 


Relationship to Prior Work. Some of the ideas we present in this paper pre- 
viously appeared in summary form in a 3-page brief announcement at PODC 
2017 [4]. The current paper significantly expands on the architecture of the CAGE 
system, our verified implementation of Multiplicative Weights, the definition of 
the Smooth Games DSL, and the composition theorems of Sect. 6 proving that
the pieces fit together to imply system-wide convergence and quality bounds. 


1 The expected (per-step) cost of the algorithm minus that of the best fixed action. 
1.2 Organization


The following section provides background on games, algorithmic game the- 
ory, and smoothness. Section 3 presents an overview of the main components of 
the CAGE approach, via application to examples. Section 4 provides more detail 
on the combinators of our Smooth Games DSL. Section 5 presents our verified 
implementation of MW. Section 6 describes the composition theorems proving 
that multi-agent MW converges to near-optimal ε-CCEs. Sections 7 and 8 present
related work and conclude. 


2 Background 


2.1 Games 


Von Neumann, Morgenstern, and Nash [28,29] (in the US) and Bachelier, Borel, 
and Zermelo [3,8,43] (in Europe) were the first to study the mathematical theory 
of strategic interaction, modern game theory. Nash’s famous result [27] showed 
that in all finite games, mixed-strategy equilibria (those in which players are 
allowed to randomize) always exist. Since the 1950s, game theory has had huge 
influence in numerous fields, especially economics. 

In our context, a game is a tuple of a finite type A (the strategy space) and
a cost function $C_i$ mapping tuples of strategies of type $A_1 \times A_2 \times \dots \times A_N$ to
values of type R, the cost to player i of state $(a_1, \dots, a_i, \dots, a_N)$. For readers
interested in formalization-related aspects, Listing 1 provides additional details. 


Listing 1: Games in Ssreflect-Coq 


In SSREFLECT-Coq, an extension of the standard Coq system, a finite type 
A : finType pairs the type A with an enumerator enum: list A such that 
for all a: A, count a enum = 1 (every element is included exactly once). To 
define games, we use operational type classes [38], which facilitate parameter 
sharing: 
Class game (A : finType) (N : nat) (R : realFieldType)
`(costClass : CostClass N R A) : Type := {}.


costClass declares the cost function $C_i$, and N is the number of players.


A state $s : A_1 \times A_2 \times \dots \times A_N$ is a Pure Nash Equilibrium (PNE) when no
player $i \in [1, N]$ has incentive to change its strategy: $\forall s'_i.\ C_i(s) \le C_i(s'_i, s_{-i})$.
Here $s'_i$ is an arbitrary strategy. Strategy $s_i$ is player i’s move in state s. By
$(s'_i, s_{-i})$ we denote the state in which player i’s strategy is $s'_i$ and all other players
play s. In other words, no player can decrease its cost by unilateral deviation.
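
To make the PNE condition concrete, the Python sketch below enumerates the states of a small game and keeps those from which no unilateral deviation lowers a player's cost. The game itself (a prisoner's dilemma written as costs) and all names are hypothetical and chosen only for illustration; they do not come from the CAGE development.

```python
from itertools import product

# Hypothetical two-player game, written as costs.
# Strategy 0 = cooperate, 1 = defect.
# COSTS[state] = (cost to player 0, cost to player 1).
COSTS = {
    (0, 0): (1, 1),
    (0, 1): (3, 0),
    (1, 0): (0, 3),
    (1, 1): (2, 2),
}


def cost(i, state):
    """C_i(state): the cost to player i of the given state."""
    return COSTS[state][i]


def is_pure_nash(cost, strategies, state):
    """True if no player can lower its own cost by deviating alone."""
    for i in range(len(state)):
        for alt in strategies:
            deviated = state[:i] + (alt,) + state[i + 1:]
            if cost(i, deviated) < cost(i, state):
                return False
    return True


def pure_nash_equilibria(cost, strategies, n_players):
    """All states of the game that satisfy the PNE condition."""
    return [s for s in product(strategies, repeat=n_players)
            if is_pure_nash(cost, strategies, s)]
```

In this example game the only PNE is the state in which both players defect, whose social cost (4) is twice that of mutual cooperation (2); gaps of this kind are what the Price of Anarchy measures.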

Pure-strategy Nash equilibria do not always exist. Mixed Nash Equilibria 
(MNE), which do exist in all finite games, permit players to randomize over 
the strategy space, by playing a distribution $\sigma_i$ over A. The overall state is the
product distribution over the player distributions. Every PNE is trivially an
MNE, by letting players choose deterministic distributions $\sigma_i$.

Correlated Equilibria (CEs) generalize MNEs to situations in which players
coordinate via a trusted third party. In what follows, we’ll mostly be interested
in a generalization of CEs, called Coarse Correlated Equilibria (CCEs), and
their approximate relaxations. Specifically, a distribution σ over $A^N$ (Listing
2) is a CCE when $\forall i\, \forall s'_i.\ E_{s \sim \sigma}[C_i(s)] \le E_{s \sim \sigma}[C_i(s'_i, s_{-i})]$. $E_{s \sim \sigma}[C_i(s)]$ is the
expected cost to player i in distribution σ. The CCE condition states that there
is no $s'_i$ that could decrease player i’s expected cost. CCEs are essentially a
relaxation of MNEs which do not require σ to be a product distribution (i.e.,
the players’ strategies may be correlated). CEs are a subclass of CCEs in which
$E_{s \sim \sigma}[C_i(s'_i, s_{-i})]$ may be conditioned on $s_i$.

A distribution σ over states may only be approximately a CCE. Define as ε-approximate
those CCEs σ for which $\forall i\, \forall s'_i.\ E_{s \sim \sigma}[C_i(s)] \le E_{s \sim \sigma}[C_i(s'_i, s_{-i})] + \epsilon$.
Moving to $s'_i$ can decrease player i’s expected cost, but only by at most ε.
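
The ε-CCE condition can be checked directly on small examples. The Python sketch below represents a distribution over states as a dictionary from state tuples to probabilities. The two-resource congestion game and the function names are assumptions made for this example (the names merely echo expectedCost and expectedUnilateralCost from Listing 2); this is not the Coq definition.

```python
def cost(i, state):
    # Hypothetical congestion game: each player picks resource 0 or 1 and
    # pays the number of players (itself included) on the resource it picked.
    return sum(1 for choice in state if choice == state[i])


def expected_cost(cost, i, dist):
    """E_{s ~ dist}[C_i(s)], with dist a dict from states to probabilities."""
    return sum(p * cost(i, s) for s, p in dist.items())


def expected_unilateral_cost(cost, i, dist, alt):
    """E_{s ~ dist}[C_i(alt, s_{-i})]: player i always plays alt instead."""
    return sum(p * cost(i, s[:i] + (alt,) + s[i + 1:])
               for s, p in dist.items())


def is_eps_cce(cost, strategies, n_players, dist, eps):
    """Check the eps-CCE inequality for every player and every deviation."""
    return all(
        expected_cost(cost, i, dist)
        <= expected_unilateral_cost(cost, i, dist, alt) + eps
        for i in range(n_players)
        for alt in strategies
    )
```

For two players, the correlated distribution that puts probability 1/2 on each of (0, 1) and (1, 0) is an exact CCE, whereas the point distribution on (0, 0) is only an ε-CCE for ε ≥ 1.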


Listing 2: Discrete Distributions in Ssreflect-Coq 


Since our games A are finite, discrete distributions suffice to formalize 
MNEs, CEs, and CCEs. We model such distributions as finite functions 
(those with finite domain) from the strategy space A to R: 


Record dist (A : finType) : Type :=
mkDist { pmf :> {ffun A → R}; dist_ax : dist_axiom pmf }.


Here {ffun A → R} is Ssreflect syntax for the type of finite functions from A
to R. The second projection of the record, dist_ax, asserts that pmf represents
a valid distribution: pmf is positive and $\sum_a \mathit{pmf}\ a = 1$.

The Coq predicate eCCE: 


Definition eCCE (ε : R) (σ : dist A^N) : Prop :=
∀(i : [0..N − 1]) (s' : A),
expectedCost i σ ≤ (expectedUnilateralCost i σ s') + ε.


states that distribution σ (over N-tuples of strategies A, one per player) is
an ε-approximate CCE, or ε-CCE.


2.2 Algorithmic Game Theory 


Equilibria are only useful if we’re able to quantify, with respect to the game 
being analyzed: 


1. How good equilibrium states are with respect to the optimal configurations 
of a game. By optimal, we usually mean states s* that optimize the social 
cost: $\forall s.\ \sum_i C_i(s^*) \le \sum_i C_i(s)$.

2. How “easy” (read computationally tractable) it is to drive competing players 
of the game toward an equilibrium state. 


Algorithmic game theory and the related fields of mechanism design and dis- 
tributed optimization provide excellent tools here. 
Good Equilibria. The Price of Anarchy, or POA, of game (A, C) quantifies the
cost of equilibrium states of (A, C) with respect to optimal configurations. Precisely,
define POA as the ratio of the social cost of the worst equilibrium s to
the social cost of an optimal state s*. POA near 1 indicates high-quality equi- 
libria: finding an equilibrium in such a game leads to overall social cost close 
to optimal. Prior work in algorithmic game theory has established nontrivial 
POA bounds for a number of game classes: on various classes of congestion and 
routing games [2,6,10], on facility location games [40], and others [11,32]. 

In the system of Sect. 3, we use the related concept of Roughgarden smooth
games [35], or simply smooth games, which define a subclass of games with
canonical POA proofs. To each smooth game are associated two constants, λ
and μ. The precise definition of the smoothness condition is less relevant here
than its consequences: if a cost-minimization game is (λ, μ)-smooth, then it has
POA at most λ/(1 − μ). Not all games are smooth, but for those that are, the POA bound
above extends even to CCEs and their approximations, a particularly large (and 
therefore tractable) class of equilibria [35, Sects. 3 and 4]. 


Tractable Dynamics. Good equilibrium bounds are most useful when we know 
how quickly a particular game converges to equilibrium [7,9,12,13,17]. Certain 
classes of games, e.g. potential games [26], reach equilibria under a simple model 
of dynamics called best response. As we’ve mentioned, we use a different dis- 
tributed learning algorithm in this work, variously called Multiplicative Weights 
(MW) [1] or sometimes Randomized Weighted Majority [25], which drives all 
games to CCEs, a larger class of equilibrium states than those achieved by poten- 
tial games under best response. 


3 Cage by Example 


No-regret algorithms such as MW can be used to drive multi-agent systems
toward the ε-CCEs of arbitrary games. Although the CCEs of general
games may have high social cost, those of smooth games, as identified by
Roughgarden [35], have robust Price of Anarchy (POA) bounds that extend
even to ε-CCEs. Figure 1 depicts how these pieces fit together in the high-level
architecture of our CAGE system, which formalizes the results of Sect. 2
in Coq. Shaded boxes are program-related components while white boxes are 
proof related. 


3.1 Overview 


At the top, we have a domain-specific language in Coq (DSL, box 1) that gener- 
ates games with automatically verified POA bounds. To execute such games, we 
have verified (also in Coq) an implementation of the Multiplicative Weights algo- 
rithm (MW, 2). Correctness of MW implies convergence bounds on the games it 
executes: O((ln |A|)/ε²) iterations suffice to drive the game to an ε-CCE (here,
|A| is the size of the action space, or game type, A). 
Fig. 1. System architecture


We compose N instances of multiplicative weights (4), one per agent, with a 
server (3) that facilitates communication, implemented in OCaml and modeled 
by an operational semantics in Coq. To actually execute games, we use Coq’s 
code extraction mechanism to generate OCaml code that runs clients against 
the server, using an unverified OCaml shim to send and receive messages. We 
prove performance guarantees in Coq from POA bounds on the game and from 
the regret bound on MW. 


3.2 Smooth Games DSL 


The combinators exposed by the Smooth Games DSL operate over game types 
A, cost functions C, and smoothness parameters λ and μ. Basic combinators in
this language include (i) Resource and (ii) Unit games, the first for coordinating 
access to shared resources under congestion and the second with fixed cost 0. 
Combinators that take other games as arguments include: 


— the bias combinator Bias(A, b), which adds the fixed value b to the cost func- 
tion associated with game A; 

— the scalar combinator Scalar(A,m), which multiplies the output of the cost 
function C associated with game A by a fixed value m; 

— the product combinator A x B, corresponding to the parallel composition of 
two games A and B with cost equal to the sum of the costs in the two games; 

— the subtype game {x : A, P(x)}, which constructs a new game over the 
dependent sum type Xx : A.P(x) (values x satisfying the predicate P); 
— the singleton game Singleton(A), which has cost 1 if player i “uses” the
underlying resource ($B_{\mathrm{Resource}}(f\ i)$ = true), and 0 otherwise. The function
$B_{-}(-)$ generalizes the notion of resource usage beyond the primitive Resource
game. For example, $B_{\mathrm{Scalar}(A,m)}(x) = B_A(x)$: usage in a game built from the
scalar combinator reduces to usage in the underlying game.


3.3 Example: Distributed Routing 


We illustrate the Smooth Games DSL with an example: distributed routing over 
networks with affine latency functions (Fig. 2). This game is known to have POA 
5/2 [35]. 

In a simple version of the game, N routing agents each choose a path from 
a global source vertex s to a global sink vertex t. Latency over edge e, modeled 
by an affine cost function $c_e(x) = a_e x + b_e$, scales in the amount of traffic x over
that edge. An optimal solution minimizes the total cost to all agents. 

Fig. 2. Routing game with affine edge costs


We model each link in the network as a Resource game, which in its most basic
form is defined by the following inductive datatype:


Inductive Resource : Type :=
| RYes : Resource
| RNo : Resource.


RYes indicates the agent chose to use the resource (a particular edge) and RNo
otherwise. The cost function for Resource is defined by:


Definition ResourceCostFun (i : [0..N − 1]) (s : [0..N − 1] →fin Resource) : R :=
if s_i is RYes then traffic s else 0.


in which s is a map from agent labels to resource strategies and traffic s is the
total number of agents that chose to use the resource in s. An agent pays traffic s if
it uses the resource, otherwise 0. We implement Resource as a distinct inductive
type, even though it’s isomorphic to bool, to ensure that types in the Smooth
Games DSL have unique game instances. To give each resource the more interesting
cost function $c_e(x) = a_e x + b_e$, we compose Resource with a second combinator,
$\mathrm{Affine}(a_e, b_e, \mathrm{Resource})$, which has cost 0 if an agent does not use the
resource, and cost $a_e \cdot (\mathrm{traffic}\ s) + b_e$ otherwise. This combinator preserves
(λ, μ)-smoothness assuming λ + μ ≥ 1, a side condition which holds for Resource games.
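
The two cost functions just described can be mirrored in a few lines of Python. This is only an executable paraphrase under stated assumptions: a state is modeled as a tuple of booleans (True standing for RYes), and the names traffic, resource_cost, and affine_cost are hypothetical stand-ins for the Coq definitions, not part of CAGE.

```python
def traffic(state):
    """Number of agents whose strategy is to use the resource (True = RYes)."""
    return sum(1 for uses in state if uses)


def resource_cost(i, state):
    """Cost to agent i: the traffic if it uses the resource, otherwise 0."""
    return traffic(state) if state[i] else 0


def affine_cost(a, b, i, state):
    """Cost a * traffic + b for a user of the resource, 0 for a non-user."""
    return (a * traffic(state) + b) if state[i] else 0
```

For example, with three agents of which the first and third use an edge with a = 10 and b = 10, each user pays 10 · 2 + 10 = 30 and the remaining agent pays 0.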

We encode m affine resources by applying Affine to Resource m times, then 
folding under product: 


T = Affine(a_1, b_1, Resource)
  × Affine(a_2, b_2, Resource)
  × ...
  × Affine(a_m, b_m, Resource)


The associated cost function is the sum of the individual resource cost functions. 
Values of type T may assign RYes to a subset of resources that doesn’t correspond
to a valid path in a graph G = (V, E). To prevent this behavior, we apply
to T the subtype combinator Σ, specialized to a predicate isValidPath(G, s, t)
enforcing that strategies $(r_1, r_2, \dots, r_{|E|})$ correspond to valid paths from s to
t: $T' \triangleq \Sigma_{\mathrm{isValidPath}(G,s,t)}(T)$. The game T' is (5/3, 1/3)-smooth, just like the
underlying Resource game, which implies POA of (5/3)/(1 − 1/3) = 5/2.


3.4 Example: Load Balancing 


As a second example, consider the load balanc- 
ing game depicted in Fig. 3, in which a number of 
network flows are distributed over several servers 
with affine cost functions. In general, N load bal- 
ancing agents are responsible for distributing M 
flows over K servers. The cost of allocating a flow 
to a server is modeled by an affine cost function 
which scales in the total load (number of flows) 
on that server. Like routing, the load balancing 
game has POA 5/2. This is no coincidence; both
are special cases of “finite congestion games”, a 
class of games which have POA 5/2 when costs 
are linear [10]. The connection between them can be seen more concretely by 
observing that they are built up from the same primitive Resource game. 

We model the system as an N M-player K-resource game in which each player 
corresponds to a single network flow. Each load balancing agent poses as multiple 
players (MW instances) in the game, one per flow, and composes the actions 
chosen by these players to form its overall strategy. The result of running the 
game is an approximate CCE with respect to the distribution of flows over 
servers. 

Each server is defined as a Resource with an affine cost function, using 
the same data type and cost function as in the routing example. Instead of 
isValidPath, we use a new predicate exactlyOne to ensure that each network flow 
is assigned to exactly one server. 
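
The effect of restricting a product of Resource games by a predicate can be sketched in Python as a filter over tuples. The encoding below (one boolean per server, True meaning the flow uses that server) and the names exactly_one and valid_assignments are assumptions for illustration only; the Coq development uses a dependent sum type rather than a filtered list.

```python
from itertools import product


def exactly_one(assignment):
    """Predicate: the flow uses exactly one of the K servers."""
    return sum(1 for used in assignment if used) == 1


def valid_assignments(k_servers):
    """Strategies of the K-fold product of Resource that satisfy exactly_one."""
    return [a for a in product((False, True), repeat=k_servers)
            if exactly_one(a)]
```

Of the 2^K tuples in the unrestricted product, only K survive the predicate, one per server, which matches the intended strategy space of a single flow.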


Fig. 3. Load balancing game 


4 Smooth Games 


Roughgarden smoothness [35] characterizes a subclass of games with canoni- 
cal Price of Anarchy (POA) proofs. In [35], Roughgarden showed that smooth 
games have canonical POA bounds not only with respect to pure Nash equilibria 
but also with respect to mixed Nash equilibria, correlated equilibria, CCEs, and
their approximate relaxations. In the context of CAGE, we use smoothness to 
bound the social cost of games executed by multiple clients each running MW. 
We show how the technical pieces fit together, in the form of bounds on an 
operational semantics of the entire CAGE system, in Sect. 6. This section introduces
the technical definition of smoothness and the language of combinators,
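Syntax


Scalars m, b; Predicates P
Game types A, B ::= Resource | Unit | Bias(A, b) | Scalar(A, m)
| A × B | {x : A, P(x)} | Singleton(A)


Judgment ⊢_(λ,μ) (A, C), read “Game (A, C) is (λ, μ)-smooth.”


ResourceSmooth:
⊢_(5/3,1/3) (Resource, ResourceCostFun)

[one line of the figure is illegible in the source at this point]

SingletonSmooth:
⊢_(λ,μ) (A, C)
------------------------------------------------------------
⊢_(1,0) (Singleton(A), fun i f. if B_A(f i) then 1 else 0)

SigmaSmooth:
⊢_(λ,μ) (A, C)
------------------------------------------------------------
⊢_(λ,μ) ({x : A, P(x)}, fun i f. C_i (fun j. (f j).1))

BiasSmooth:
⊢_(λ,μ) (A, C)    1 ≤ λ + μ    0 ≤ b
------------------------------------------------------------
⊢_(λ,μ) (Bias(A, b), fun i f. C_i f + b)

ScalarSmooth:
⊢_(λ,μ) (A, C)    0 ≤ m
------------------------------------------------------------
⊢_(λ,μ) (Scalar(A, m), fun i f. m * C_i f)

ProductSmooth:
⊢_(λ_A,μ_A) (A, C^A)    ⊢_(λ_B,μ_B) (B, C^B)
------------------------------------------------------------
⊢_(max(λ_A,λ_B), max(μ_A,μ_B)) (A × B, fun i f. C^A_i f + C^B_i f)


Fig. 4. Smooth games DSL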


or Smooth Games DSL of Sect. 3, that we use to build games that are smooth
by construction. 


Definition 1 (Smoothness). A game (A, C) is (λ, μ)-smooth if for any two
states $s, s^* : A^N$, the following inequality holds:


$$\sum_{i=1}^{N} C_i(s^*_i, s_{-i}) \le \lambda \cdot C(s^*) + \mu \cdot C(s).$$


Here, $C_i(s^*_i, s_{-i})$ denotes the individual cost to player i in the mixed state where
all other players follow their strategies from s, while player i follows the corre- 
sponding strategy from s*. Smooth games bound the individual cost of players’ 
unilateral deviations from state s to s* by the weighted social costs of s and s*. 
In essence, when λ and μ are small, any single player’s deviation
from a given state has minimal effect.
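
For a game with few players, the smoothness inequality can be checked exhaustively. The Python sketch below does so for a single shared resource whose users each pay the traffic, with strategies encoded as 0 (do not use) and 1 (use). The encoding, the function names, and the use of exact fractions are assumptions of this example; the check is a brute-force sanity test over a fixed number of players, not a proof.

```python
from fractions import Fraction
from itertools import product


def resource_cost(i, state):
    """Cost to player i: the number of users if i uses the resource, else 0."""
    return sum(state) if state[i] else 0


def social_cost(state):
    """C(state): the sum of all players' individual costs."""
    return sum(resource_cost(i, state) for i in range(len(state)))


def is_smooth(n_players, lam, mu):
    """Check the smoothness inequality on every pair of states (s, s*)."""
    states = list(product((0, 1), repeat=n_players))
    for s in states:
        for s_star in states:
            # player i deviates to its strategy in s*, everyone else stays in s
            lhs = sum(
                resource_cost(i, s[:i] + (s_star[i],) + s[i + 1:])
                for i in range(n_players)
            )
            if lhs > lam * social_cost(s_star) + mu * social_cost(s):
                return False
    return True
```

Calling is_smooth(4, Fraction(5, 3), Fraction(1, 3)) succeeds, while the smaller parameters (1, 0) already fail with three players: take s = (1, 1, 1) and s* = (1, 0, 0), where the left-hand side is 3 but C(s*) = 1.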

The smoothness inequality leads to natural proofs of POA for a variety of 
equilibrium classes. As an example, consider the following bound on the expected 
cost of ε-CCEs of (λ, μ)-smooth games:
Lemma smooth_eCCE (d : dist (state N T)) (s' : state N T) (ε : R) :
eCCE ε d → optimal s' →
ExpectedCost d ≤ λ*(Cost s') + μ*(ExpectedCost d) + N*ε.


ExpectedCost d is the sum for all players i of the expected cost to player i of 
distribution d. N is the number of players in the game. 

The smooth_eCCE bound implies the following Price of Anarchy bound on
the expected cost, summed across all players, of distribution d: 


Lemma smooth_POA ε (d : dist (state N T)) s' :
eCCE ε d → optimal s' →
ExpectedCost d ≤ λ/(1 − μ)*(Cost s') + (N*ε)/(1 − μ).


If d is an ε-CCE, then its cost is no more than λ/(1 − μ) times the cost of
the optimal state s', plus an additional term that scales in the number of players N. For
example, for concrete values λ = 5/3, μ = 1/3, ε = 0.0375, and N = 5, we get
multiplicative approximation factor λ/(1 − μ) = 5/2 and an additive term of about 0.28. A
value of ε = 0.0375 is reasonable; as Sect. 5 will show, it takes fewer than 20,000
iterations of the Multiplicative Weights algorithm, in a game with strategy space
of size 1000, to produce ε ≤ 0.0375.
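
The step from the first lemma to the second is a rearrangement: subtracting μ · ExpectedCost d from both sides gives (1 − μ) · ExpectedCost d ≤ λ · Cost s' + N · ε, and dividing by 1 − μ (which requires μ < 1) yields the stated bound. The small Python helper below, whose name and signature are hypothetical, evaluates the two factors with exact fractions so that the numbers quoted above can be reproduced.

```python
from fractions import Fraction


def poa_bound(lam, mu, n_players, eps, optimal_cost):
    """Bound on the expected social cost of an eps-CCE of a (lam, mu)-smooth game.

    Returns (multiplicative factor, additive term, resulting cost bound).
    Assumes mu < 1.
    """
    multiplicative = lam / (1 - mu)
    additive = (n_players * eps) / (1 - mu)
    return multiplicative, additive, multiplicative * optimal_cost + additive
```

With λ = 5/3, μ = 1/3, N = 5, and ε = 3/80 = 0.0375, the multiplicative factor is exactly 5/2 and the additive term is exactly 9/32 = 0.28125.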


4.1 Combinators 


Figure 4 lists the syntax and combinators of the Smooth Games DSL we used 
in Sect. 3 to build smooth routing and load balancing games. 

The smoothness proof accompanying the judgment of Resource games is 
the least intuitive, and provides some insight into the behavior of smooth 
games. The structure of our proof borrows from a stronger result given by 
Roughgarden [35]: smoothness for resource games with affine cost functions 
and multiple resources. The key step is the following inequality first noted by 
Christodoulou and Koutsoupias [10]: 


$$y(z + 1) \le \frac{5}{3} y^2 + \frac{1}{3} z^2$$


for non-negative integers y and z. We derive (5/3, 1/3)-smoothness of Resource games
from the following inequalities:


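$$\sum_i C_i(s^*_i, s_{-i}) \le (\mathrm{traffic}\ s^*) \cdot (\mathrm{traffic}\ s + 1) \qquad (1)$$
$$(\mathrm{traffic}\ s^*) \cdot (\mathrm{traffic}\ s + 1) \le \frac{5}{3} (\mathrm{traffic}\ s^*)^2 + \frac{1}{3} (\mathrm{traffic}\ s)^2 \qquad (2)$$
$$(\mathrm{traffic}\ s^*) \cdot (\mathrm{traffic}\ s + 1) \le \frac{5}{3} \cdot C(s^*) + \frac{1}{3} \cdot C(s) \qquad (3)$$
$$\sum_i C_i(s^*_i, s_{-i}) \le \frac{5}{3} \cdot C(s^*) + \frac{1}{3} \cdot C(s) \qquad (4)$$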
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The inequality in step 1 is due to the fact that the cost per player in state s* 
is at most traffic s + 1, and there are exactly traffic s* players incurring such 
cost. I.e., (traffic s*) - (traffic s + 1) is the number of nonzero terms times the 
upper bound on each term. The substitution in step 3 comes from the fact that 
in any state s, C(s) = (traffic s)?; each of the m players using the resource incur 
cost m. 
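As a sanity check on the key inequality (evidence on a finite grid, not a substitute for the mechanized proof), the following Python sketch tests it exactly with rationals. The function names are illustrative only.

```python
from fractions import Fraction

def ck_holds(y, z):
    """Check y*(z+1) <= (5/3)*y^2 + (1/3)*z^2 exactly, using rationals."""
    return y * (z + 1) <= Fraction(5, 3) * y * y + Fraction(1, 3) * z * z

def resource_cost(m):
    """Social cost of one resource used by m players: each pays m, so m*m."""
    return m * m

def step3_holds(traffic_opt, traffic_s):
    """Step 3 of the derivation, with C(s) = (traffic s)^2."""
    lhs = traffic_opt * (traffic_s + 1)
    rhs = (Fraction(5, 3) * resource_cost(traffic_opt)
           + Fraction(1, 3) * resource_cost(traffic_s))
    return lhs <= rhs

# Exhaustive check over a finite grid of non-negative integers.
grid_ok = all(ck_holds(y, z) and step3_holds(y, z)
              for y in range(60) for z in range(60))
print(grid_ok)  # True
```

The inequality is tight at (y, z) = (1, 1) and (1, 2), and integrality matters: for y = 1 and z = 3/2 the left-hand side is 5/2 while the right-hand side is 29/12, so the bound fails for non-integer traffic.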

The proofs of smoothness for other combinators are straightforward. For
example, since Unit games always have cost 0, all values of λ and μ satisfy
the smoothness inequality: 0 ≤ λ·0 + μ·0. We restrict the range of the cost
function in SingletonSmooth games to {0, 1} by applying the function B_A(−),
which generalizes the notion of “using a resource” to all the game types of
Fig. 4. Smoothness of the Singleton game follows by case analysis on the results
of B_A(−) in the states s and s* of the smoothness inequality. The games produced
by the SigmaSmooth combinator have costs equal to those of the underlying
games but restrict the domain to those states satisfying a predicate P. Since
the smoothness bound of the underlying game holds for all states in A, the same
bound holds on the restricted domain of states a ∈ A drawn from P. Smoothness of
product games relies on the fact that smoothness still holds if λ and μ are
replaced with larger values. Thus, each of the argument games to ProductSmooth
is (max(λ_A, λ_B), max(μ_A, μ_B))-smooth. The overall product game, which sums
the costs of its argument games, is (max(λ_A, λ_B), max(μ_A, μ_B))-smooth as well.

It’s possible to derive combinators from those defined in Fig. 4. For example, 
define as Affine(m, b, A) the game with cost function mx + b. We implement this
game as {p : Scalar(m, A) × Scalar(b, Singleton(A)), p.1 = p.2}, or the subset of
product games over the scalar game Scalar(m, A) and the {0,1} scalar game over 
b such that the first and second projections of each strategy p are equal. 


5 Multiplicative Weights (MW) 


At the heart of the CAGE architecture of Sect. 3 lies our verified implementation 
of the Multiplicative Weights algorithm. In this section, we present the details of 
the algorithm and sketch its convergence proof. Section 5.3 presents our verified 
MW implementation and mechanized proof of convergence. 


For all a ∈ A, client initializes w_1(a) = 1.
For time t ∈ [1 … T]:
    Client:       Let Γ_t = Σ_{a∈A} w_t(a).
                  Play strategy p_t(a) = w_t(a)/Γ_t.
    Environment:  Choose cost vector c_t.
    Client:       Update weights w_{t+1}(a) = w_t(a) · (1 − η · c_t(a)).


Fig. 5. Multiplicative Weights (MW)


5.1 The Algorithm


The MW algorithm (Fig. 5) pits a client, or agent, against an adaptive
environment. The agent maintains a weight distribution w over the action space,
initialized to give each action equal weight. At each time step t ∈ [1 … T], the
agent commits to the distribution $w_t / \sum_{a \in A} w_t(a)$, communicating
this mixed strategy to the environment. After receiving a cost vector $c_t$ from
the environment, the agent updates its weights $w_{t+1}$ to penalize high-cost
actions, at a rate determined by a learning constant η ∈ (0, 1/2]. A high η,
close to 1/2, leads to higher penalties, and thus relatively less exploration of
the action space.

The environment is typically adaptive, and may be implemented by a number 
of other agents also running instances of MW. The algorithm proceeds for a fixed 
number of epochs T, or until some bound on expected external regret (expected 
cost minus the cost of the best fixed action) is achieved. In what follows, we 
always assume that costs lie in the range [−1, 1]. Costs in an arbitrary but
bounded range are also possible (with a concomitant relaxation of the algorithm’s 
regret bounds), as are variations of MW to solve payoff maximization instead of 
cost minimization. 
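The loop of Fig. 5 can be sketched in a few lines of Python. This is an informal illustration, not the verified implementation: the names mw_run and per_step_regret are ours, and the environment is simplified to a fixed list of cost vectors, whereas an adaptive environment would choose each vector after seeing the committed distribution.

```python
from fractions import Fraction

def mw_run(actions, cost_vectors, eta):
    """Run MW against a fixed list of cost vectors (dicts action -> cost in [-1, 1]).

    Returns the list of played distributions and the final weights.
    """
    assert 0 < eta <= Fraction(1, 2)
    weights = {a: Fraction(1) for a in actions}          # w_1(a) = 1
    played = []
    for costs in cost_vectors:
        total = sum(weights.values())                     # Gamma_t
        dist = {a: weights[a] / total for a in actions}   # p_t(a) = w_t(a) / Gamma_t
        played.append(dist)
        # w_{t+1}(a) = w_t(a) * (1 - eta * c_t(a))
        weights = {a: weights[a] * (1 - eta * costs[a]) for a in actions}
    return played, weights

def per_step_regret(played, cost_vectors, actions):
    """(cumulative expected cost - OPT) / T, OPT being the best fixed action's cost."""
    expected = sum(sum(p[a] * c[a] for a in actions)
                   for p, c in zip(played, cost_vectors))
    opt = min(sum(c[a] for c in cost_vectors) for a in actions)
    return (expected - opt) / len(cost_vectors)

# Example: action "a" always costs 1, action "b" always costs 0.
demo_actions = ["a", "b"]
demo_costs = [{"a": Fraction(1), "b": Fraction(0)}] * 50
demo_played, demo_weights = mw_run(demo_actions, demo_costs, Fraction(1, 4))
```

In the example the first distribution is uniform, and after one round the probability of the costly action "a" drops to 3/7, since its weight shrinks to 3/4 while that of "b" stays at 1.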


5.2 MW Is No Regret 


The MW algorithm converges reasonably quickly: To achieve expected regret at
most ε, it’s sufficient to run the algorithm O((ln |A|)/ε²) iterations, where |A|
is the size of the action space [36, Chapter 17]. Regret can be driven arbitrarily
small as the number of iterations approaches infinity. Bounded regret suffices to 
prove convergence to an approximate CCE, as [36] also shows. 

In this section, we present a high-level sketch of the proof that MW is no 
regret. We follow [36, Chapter 17], which has additional details. At the level 
of the mathematics, our formal proof makes no significant departures from 
Roughgarden. 


Definition 2 (Per-Step External Regret). Let a* be the best fixed action in
hindsight (i.e., the action with minimum cost given the cost vectors received from
the environment) and let $\mathrm{OPT} = \sum_{t=1}^{T} c_t(a^*)$. The expected
per-step external regret of MW is

$$ \Big( \sum_{t=1}^{T} \zeta_t - \mathrm{OPT} \Big) / T. $$

The summed term defines the cumulative expected cost of the algorithm for time
t ∈ [1 … T], where by $\zeta_t$ we denote the expected cost at time t:

$$ \zeta_t = \sum_{a \in A} p_t(a) \cdot c_t(a) = \sum_{a \in A} \frac{w_t(a)}{\Gamma_t} \cdot c_t(a) $$

To get per-step expected regret, we subtract the cumulative cost of a* and divide
by the number of time steps T.


Theorem 1 (MW Has Bounded Regret). The algorithm of Fig. 5 has
expected per-step external regret at most η + ln |A| / (ηT).


Proof Sketch. The proof of Theorem 1 uses a potential-function argument, with
potential $\Gamma_t$ equal to the sum of the weights
$\Gamma_t = \sum_{a \in A} w_t(a)$ at time t. It proceeds by relating the
cumulative expected cost $\sum_t \zeta_t$ of the algorithm to OPT, the cost of
the best fixed action, through the intermediate quantity $\Gamma_{T+1}$.

The proof additionally relies on the following two facts derived from the
Taylor expansion $\ln(1 - x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \cdots$:

$$ \ln(1 - x) \le -x, \qquad x < 1 $$
$$ -x - x^2 \le \ln(1 - x), \qquad x \le 1/2 $$


By letting η = sqrt (ln |A| / T) (cf. [36, Chapter 17]), it’s possible to restate
the regret bound of Theorem 1 to the following arguably nicer bound:


Corollary 1 (MW Is No Regret)

$$ \Big( \sum_{t=1}^{T} \zeta_t - \mathrm{OPT} \Big) / T \le 2\sqrt{\ln|A| / T} $$


Here, the number of iterations T must be large enough to ensure that η =
sqrt (ln |A| / T) ≤ 1/2, thus ensuring that η ∈ (0, 1/2].
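For completeness, the substitution behind Corollary 1 can be written out as follows. This is a routine calculation from the bound of Theorem 1, not a transcription of the mechanized proof.

```latex
\begin{align*}
\left.\eta + \frac{\ln|A|}{\eta T}\right|_{\eta = \sqrt{\ln|A|/T}}
  &= \sqrt{\frac{\ln|A|}{T}} + \frac{\ln|A|}{T}\cdot\sqrt{\frac{T}{\ln|A|}} \\
  &= \sqrt{\frac{\ln|A|}{T}} + \sqrt{\frac{\ln|A|}{T}}
   = 2\sqrt{\frac{\ln|A|}{T}} .
\end{align*}
```

By the arithmetic-geometric mean inequality, this choice of η minimizes η + ln |A| / (ηT) over η > 0, and the side condition η ≤ 1/2 is equivalent to T ≥ 4 ln |A|.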
5.3 MW Architecture 


Our implementation and proof of MW (Fig. 6) were designed to be extensible. At
a high level, the proof structure follows the program refinement methodology,
in which a high-level mathematical but inefficient specification of MW
(High-Level Functional Specification) is gradually made more efficient by a
series of refinements to various features of the program (for example, by
replacing an inefficient implementation of a key-value map with a more efficient
balanced binary tree).

[Figure: three layers, from top to bottom: High-Level Functional Specification;
Operational Semantics; Executable Interpreter. The lower two layers are defined
over the MW DSL. The top layer is illustrated by the definition

  Definition update_weights (w:weights) (c:costs) : weights :=
    finfun (fun a:A => w a * (1 - eta*(c a))).]

Fig. 6. MW architecture

For each such refinement, we prove that every behavior of the lower-level
program is a possible behavior of the higher-level program it refines. Thus
specifications proved for all behaviors of the high-level program also apply to
each behavior at the low level. By behavior here, we mean the trace of action
distributions output by MW as it interacts with, and receives cost vectors from,
the environment.

We factor the lower implementation layers (Medium and Low) into an inter- 
preter and operational semantics over a domain-specific language specialized to 
MW-style algorithms (MW DSL). The DSL defines commands for maintain- 
ing and updating weights tables as well as commands for interacting with the 
environment. We prove, for any DSL program c, that the interpretation of that 
program refines its behavior with respect to the small-step operational semantics 
(Medium). Our overall proof specializes this general refinement to an implemen- 
tation of MW as a command in the DSL, in order to relate that command’s 
interpreted behavior to the high-level functional specification. 


5.4 MW DSL 


The syntax and semantics of the MW DSL are given in Fig. 7. The small-step
operational semantics (⊢ c, σ ⇒ c′, σ′) is parameterized by an environment
oracle that defines functions for sending action distributions to the environment
(oracle_send) and for receiving the resulting cost vectors (oracle_recv). The oracle
will in general be implemented by other clients also running MW (Sect. 6) but is
left abstract here to facilitate abstraction and reuse. The oracle is stateful (the
type T, of oracle states, may be updated both by oracle_send and oracle_recv).

Most of the operational semantics rules are straightforward. In the
MW-STEP-WEIGHTS rule for updating the state’s weights table, we make use of
an auxiliary expression evaluation function E_−[−] (standard and therefore not
shown in Fig. 7). The only other interesting rules are those for send and recv,
which call oracle_send and oracle_recv respectively. In the relation oracle_recv, the
first two arguments are treated as inputs (the input oracle state of type T and
the channel) while the second two are treated as outputs (the cost vector of type
A → Q and the output oracle state). In the relation oracle_send, the first three
arguments are inputs while only the last (the output oracle state) is an output.


Multiplicative Weights. As an example of an MW DSL program, consider our
implementation (Listing 1.1) of the high-level MW of Fig. 5. To the right of
each program line, we give comments describing the effect of each command.
The program is itself divided into three functions: mult_weights_init, which
initializes the weights table to assign weight 1 to each action a in the action
space A; mult_weights_body, which defines the body of the main loop of MW; and
mult_weights, which simply composes mult_weights_init with mult_weights_body.

Listing 1.1. MW DSL Implementation of Multiplicative Weights


Definition mult_weights_init (A : Type) :=
  update (λa : A ⇒ 1);     (* For all a ∈ A, initialize w_1(a) = 1. *)
  send.                     (* Commit to the uniform distribution over actions. *)

Definition mult_weights_body (A : Type) :=
  recv;                     (* Block until agent receives cost vector c_t from environment. *)
  update (λa : A ⇒ weight a * (1 − η * cost a));  (* Update weights. *)
  send.                     (* Commit to distribution w_t/Γ_t. *)

Definition mult_weights (A : Type) (n : N.t) :=
  mult_weights_init A;      (* Initialize weights and commit to initial mixed strategy. *)
  iter n (mult_weights_body A).  (* Do n iterations of the MW main loop. *)


The MW DSL contains commands and expressions that are specialized to 
MW-style applications. Consider the function mult_weights_body (line 5). It first 
receives a cost vector from the environment using the specialized recv command. 
At the level of the MW DSL, recv is somewhat abstract. The program does not 
specify, e.g., which network socket to use. Implementation details such as these 
are resolved by the MW interpreter, which we discuss below in Sect. 5.5. 

After recv, mult_weights_body implements an update to its weights table as
defined by the command: update (λa : A ⇒ weight a * (1 − η * cost a)). As an
argument to the update, we embed a function from actions a ∈ A to expressions
that defines how the weight of each action a should change at this step (time
t + 1). The expressions weight a and cost a refer to the weight and cost,
respectively, of action a at time t. The anonymous function term is defined in
SSReflect-Coq, the metalanguage in which the MW DSL is defined.


5.5 Interpreter 
To run MW DSL programs, we wrote an executable interpreter in Coq with 
type: 

interp (c : com A) (s : cstate) : option cstate. 


The type cstate defines the state of the interpreter after each step, and in general
maps quite closely to the type of states σ used in the MW DSL operational
semantics. It is given by the record:


Record cstate : Type :=
  { SCosts : M.t Q                (* Current cost vector *)
  ; SPrevCosts : list (M.t Q)     (* Previous cost vectors *)
  ; SWeights : M.t Q              (* Weights table *)
  ; SEta : Q                      (* The η parameter *)
  ; SOutputs : list (A → Q)       (* Committed distributions *)
  ; SChan : oracle_chanty         (* I/O channel *)
  ; SOracleSt : T }.              (* Environment/oracle state *)


Syntax

  Binary operators  ⊕ ::= + | − | *
  Expressions       e ::= d | −e | weight a | cost a | η | e1 ⊕ e2
  Commands          c ::= skip | update (λa : A ⇒ e) | c1; c2 | iter n c | recv | send


Environment Oracle

  oracle_recv : T → oracle_chanty → (A → Q) → T → Prop
  oracle_send : T → dist A → oracle_chanty → T → Prop


States σ ≜
  { SCosts : A → Q; SCostsOk : ∀a. |SCosts a| ≤ 1       Current cost vector
  ; SPrevCosts : seq {c : A → Q | ∀a. |c a| ≤ 1}        Previous cost vectors
  ; SWeights : A → Q                                    Weights table
  ; SWeightsOk : ∀a. 0 < SWeights a
  ; SEta : Q; SEtaOk : 0 < SEta ≤ 1/2                   The η parameter
  ; SOutputs : seq (dist A)                             Committed distributions
  ; SChan : oracle_chanty                               I/O channel
  ; SOracleSt : T }.                                    Environment/oracle state


Operational Semantics

  σ′ = σ{SWeights ≜ λa : A ⇒ E_σ[e[x ← a]]}
  --------------------------------------------  MW-STEP-WEIGHTS
  ⊢ update (λx : A ⇒ e), σ ⇒ skip, σ′

                                   ⊢ c1, σ ⇒ c1′, σ′
  -------------------------        ------------------------------
  ⊢ skip; c2, σ ⇒ c2, σ            ⊢ c1; c2, σ ⇒ c1′; c2, σ′

                                   1 < n
  -------------------------        ---------------------------------------
  ⊢ iter 1 c, σ ⇒ c, σ             ⊢ iter n c, σ ⇒ c; iter (n − 1) c, σ

  oracle_recv (SOracleSt σ) (SChan σ) c t
  -------------------------------------------------------------------------------------
  ⊢ recv, σ ⇒ skip, σ{SCosts ≜ c; SPrevCosts ≜ SCosts σ :: SPrevCosts σ; SOracleSt ≜ t}

  oracle_send (SOracleSt σ) d ch t
  ---------------------------------------------------------------------------
  ⊢ send, σ ⇒ skip, σ{SOutputs ≜ d :: SOutputs σ; SChan ≜ ch; SOracleSt ≜ t}


Fig. 7. MW DSL syntax and operational semantics, parameterized by an environment
oracle defining the type T of environment states and the functions oracle_recv and
oracle_send for interacting with the environment. The type A is that of states in the
underlying game.


At the level of cstates, we use efficient purely functional data structures such as
AVL trees. For example, the type M.t Q denotes an AVL-tree map from actions
A to rational numbers Q. In the small-step semantics state, by contrast, we
model the weights table not as a balanced binary tree but as an SSReflect-Coq
finite function, of type {ffun A → Q}, which directly maps actions of type A to
values of type Q.

To speed up computation on rationals, we use a dyadic representation q = n/2^d,
which facilitates fast multiplication. We do exact arithmetic on dyadic Q instead
of floating point arithmetic to avoid floating-point precision error. Verification of
floating-point error bounds is an interesting but orthogonal problem (cf. [31,34]).
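To illustrate why a dyadic representation makes multiplication cheap, here is a minimal Python sketch. The class Dyadic and its normalization strategy are assumptions made for this example; the representation used in the Coq development may differ in its details.

```python
from fractions import Fraction

class Dyadic:
    """A dyadic rational num / 2**exp with integer num and exp >= 0."""

    def __init__(self, num, exp=0):
        # Normalize by stripping common factors of two.
        while exp > 0 and num % 2 == 0:
            num //= 2
            exp -= 1
        self.num, self.exp = num, exp

    def __mul__(self, other):
        # Multiply numerators and add exponents: no gcd computation is needed.
        return Dyadic(self.num * other.num, self.exp + other.exp)

    def __add__(self, other):
        # Align to the larger exponent with a shift, then add numerators.
        e = max(self.exp, other.exp)
        n = (self.num << (e - self.exp)) + (other.num << (e - other.exp))
        return Dyadic(n, e)

    def __neg__(self):
        return Dyadic(-self.num, self.exp)

    def to_fraction(self):
        return Fraction(self.num, 2 ** self.exp)

# One MW weight update w * (1 - eta * c) with eta = 1/4, c = 1/2, w = 1.
eta, c, w = Dyadic(1, 2), Dyadic(1, 1), Dyadic(1)
new_w = w * (Dyadic(1) + -(eta * c))
print(new_w.to_fraction())  # 7/8
```

Denominators are always powers of two, so products and sums stay within the representation and normalization only requires shifts, in contrast to general rationals, which need a gcd to stay in lowest terms.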

The field SOutputs in the cstate record, a list of functions mapping actions
a ∈ A to their probabilities, stores the history of weights distributions generated
by the interpreter as send commands are executed. To implement commands 
such as send and recv, we parameterize our MW interpreter by an environment 
oracle, just as we did the operational semantics. The operations implemented 
by the interpreter environment oracle are functional versions of the operational 
semantics oracle_send and oracle_recv: 


oracle_send′ : ∀A : Type, T → A → oracle_chanty * T
oracle_recv′ : ∀A : Type, T → oracle_chanty → list (A * Q) * T


The oracle state type T is provided by the implementation of the oracle, as in 
the operational semantics. The command oracle_send’ takes a state of type T 
and a value of type A as arguments and returns a pair of a channel of type 
oracle_chanty (on which to listen for a response from the environment) and a 
new oracle state of type T. The command oracle_recv’ takes as arguments the 
oracle state and channel and returns a list of (a,q) pairs, representing a cost 
vector over actions, along with the new oracle state. 
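The shape of such an interpreter can be sketched in Python as follows. Everything here is illustrative: the tuple encoding of commands, the dictionary-based state, the toy oracle, and the decision to fail (return None) when an update would produce a non-positive weight are assumptions of this sketch, not details taken from the Coq code.

```python
from fractions import Fraction

# Commands are tagged tuples (an encoding chosen for this sketch):
#   ("skip",) ("update", f) ("seq", c1, c2) ("iter", n, c) ("recv",) ("send",)
# where f(a, s) computes the new weight of action a from the current state s.

def initial_state(actions, oracle_state):
    return {"costs": {a: Fraction(0) for a in actions}, "prev_costs": [],
            "weights": {a: Fraction(1) for a in actions},
            "outputs": [], "chan": None, "oracle": oracle_state}

def interp(c, s, oracle_send, oracle_recv):
    """Return the state after running command c in state s, or None on failure."""
    tag = c[0]
    if tag == "skip":
        return s
    if tag == "update":
        new_w = {a: c[1](a, s) for a in s["weights"]}
        if any(w <= 0 for w in new_w.values()):
            return None  # assumption: weights must stay positive
        return {**s, "weights": new_w}
    if tag == "seq":
        s1 = interp(c[1], s, oracle_send, oracle_recv)
        return None if s1 is None else interp(c[2], s1, oracle_send, oracle_recv)
    if tag == "iter":
        for _ in range(c[1]):
            s = interp(c[2], s, oracle_send, oracle_recv)
            if s is None:
                return None
        return s
    if tag == "recv":
        costs, st = oracle_recv(s["oracle"], s["chan"])
        return {**s, "costs": dict(costs),
                "prev_costs": [s["costs"]] + s["prev_costs"], "oracle": st}
    if tag == "send":
        total = sum(s["weights"].values())
        dist = {a: w / total for a, w in s["weights"].items()}
        chan, st = oracle_send(s["oracle"], dist)
        return {**s, "outputs": [dist] + s["outputs"], "chan": chan, "oracle": st}
    return None

def mult_weights(eta, n):
    """The program of Listing 1.1 in the tuple encoding above."""
    init = ("seq", ("update", lambda a, s: Fraction(1)), ("send",))
    update = ("update", lambda a, s: s["weights"][a] * (1 - eta * s["costs"][a]))
    body = ("seq", ("recv",), ("seq", update, ("send",)))
    return ("seq", init, ("iter", n, body))

# Toy functional oracle: the state counts receives; costs are fixed.
def toy_send(st, dist):
    return "chan0", st  # placeholder channel name

def toy_recv(st, chan):
    return [("a", Fraction(1)), ("b", Fraction(0))], st + 1

final = interp(mult_weights(Fraction(1, 4), 2),
               initial_state(["a", "b"], 0), toy_send, toy_recv)
```

As in the record above, the most recent distribution and cost vector sit at the head of the outputs and prev_costs lists, and the oracle state is threaded through every send and recv.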


5.6 Proof 


The top-level theorem proved of our high-level functional specification of MW is: 


Theorem perstep_weights_noregret :
  (expCostsR − OPTR)/T ≤ η + ln size_A / (η * T).


The expression expCostsR is the cumulative expected cost of MW on a sequence 
of cost vectors, or the sum, for each time t, of the expected cost of the MW 
algorithm at time t. OPTR is the cumulative cost over T rounds of the best 
fixed action. The number η (a dyadic rational required to lie in range (0, 1/2])
is the learning parameter provided to MW and ln size_A is the natural log of
the size of the action space A. T is the number of time steps. In contrast to the
interpreter and semantics of Sect. 5.3 (where we do exact arithmetic on dyadics), 
for reasoning and specification at the level of the proof we use Coq’s real number 
library and real-valued functions such as square root and log. 

By choosing η to equal sqrt (ln size_A / T), Corollary 1 showed that it’s
possible to restate the right-hand side of the inequality in
perstep_weights_noregret to 2 * sqrt (ln size_A / T), thus giving an arguably
nicer bound. Since in our implementation of MW we require that η be a dyadic
rational, we cannot implement η = sqrt (ln size_A / T) directly (ln size_A is
irrational). We do, however, prove the following tight approximation for all
values of η approaching sqrt (ln size_A / T):


Lemma perstep_weights_noregret′ :
  ∀r : R. r ≠ −1 → η = (1+r) * (sqrt (ln size_A / T)) →
  (expCostsR − OPTR)/T ≤
    (1+r) * (sqrt (ln size_A / T)) + (sqrt (ln size_A / T))/(1+r).


In the statement of this lemma, the r term quantifies the error (how far
η is from its optimal value sqrt (ln size_A / T)). We require that r ≠ −1 to
ensure that division by 1 + r is well-defined. The resulting bound approaches
2 * sqrt (ln size_A / T) as r approaches 0.
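A small numeric experiment makes the role of r concrete. The Python sketch below rounds the optimal learning rate to a nearby dyadic rational and compares the two bounds; the helper names and the choice of 10 fractional bits are assumptions of the example, and floating point is used only for illustration.

```python
import math
from fractions import Fraction

def nearest_dyadic(x, bits):
    """Closest multiple of 2**-bits to the float x, as an exact Fraction."""
    return Fraction(round(x * 2 ** bits), 2 ** bits)

def regret_bound(eta, size_a, t):
    """Right-hand side of perstep_weights_noregret: eta + ln|A| / (eta * T)."""
    return float(eta) + math.log(size_a) / (float(eta) * t)

def approx_bound(r, size_a, t):
    """Right-hand side of perstep_weights_noregret' for eta = (1+r)*sqrt(ln|A|/T)."""
    s = math.sqrt(math.log(size_a) / t)
    return (1 + r) * s + s / (1 + r)

size_a, t = 1000, 20000
s_opt = math.sqrt(math.log(size_a) / t)   # optimal (irrational) learning rate
eta = nearest_dyadic(s_opt, 10)           # dyadic approximation of it
r = float(eta) / s_opt - 1                # relative error of the approximation
print(eta, r, approx_bound(r, size_a, t), 2 * s_opt)
```

With η = (1 + r)·s the two right-hand sides coincide, and the excess over 2·s is s·r²/(1 + r), which is second order in r. Even a coarse dyadic approximation of the optimal rate therefore loses very little.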


High-Level Functional Specification. Our high-level functional specification of 
MW closely models the mathematical specification of MW given in Fig. 5. For 
example, the following four definitions: 


Definition weights : Type := {ffun A → Q}.

Definition costs : Type := {ffun A → Q}.

Definition init_weights : weights := λ(_ : A) ⇒ 1.

Definition update_weights (w : weights) (c : costs) : weights :=
  λa : A ⇒ w a * (1 − η * c a).


construct the types of weight (weights) and cost vectors (costs), represented
as finite functions from A to Q; define the initial weight vector (init_weights),
which maps all actions to weight 1; and define the MW weight update rule
(update_weights). The recursive function:


Fixpoint weights_of (cs : seq costs) (w : weights) : weights :=
  if cs is c :: cs′ then update_weights (weights_of cs′ w) c else w.


defines the vector that results from using update_weights to repeatedly update 
w with respect to cost vectors cs. 
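For readers less familiar with Coq, the same three functions can be mirrored in Python. This is an informal analogue with dictionaries standing in for finite functions and an illustrative fixed learning rate ETA; it is not extracted from the Coq development.

```python
from fractions import Fraction

ETA = Fraction(1, 4)  # illustrative learning rate in (0, 1/2]

def init_weights(actions):
    """All actions start with weight 1."""
    return {a: Fraction(1) for a in actions}

def update_weights(w, c):
    """One MW step: w'(a) = w(a) * (1 - eta * c(a))."""
    return {a: w[a] * (1 - ETA * c[a]) for a in w}

def weights_of(cs, w):
    """Mirror of the Coq Fixpoint: the head of cs is applied last."""
    if not cs:
        return w
    c, rest = cs[0], cs[1:]
    return update_weights(weights_of(rest, w), c)

w0 = init_weights(["a", "b"])
c1 = {"a": Fraction(1), "b": Fraction(0)}       # first cost vector received
c2 = {"a": Fraction(-1), "b": Fraction(1, 2)}   # second cost vector received
w2 = weights_of([c2, c1], w0)
```

Because the recursion updates with the head of the list last, the list is ordered most-recent-first, which matches the way the recv rule conses each cost vector onto SPrevCosts.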


Adaptive Vs. Oblivious Adversaries. In our high-level specification of MW, we 
parameterize functions like weights_of by a fixed sequence of cost vectors cs 
rather than model interaction with the environment, as is done in Fig. 5. An
execution of our low-level interpreted MW, even against an adaptive adversary, 
is always simulatable by the high-level functional specification by recording in 
the low-level execution the cost vectors produced by the adversary, as is done 
by the SPrevCosts field (Sect. 5.5), and then passing this sequence to weights_of. 
This strategy is quite similar to using backward induction to solve the MW game 
for an oblivious adversary. 


Connecting the Dots. To connect the MW interpreter to the high-level
specification, we prove a series of refinement theorems (technically, backward
simulations). As an example, consider:

Lemma interp_step_plus :
  ∀(a0 : A) (s : state A) (t t′ : cstate) (c : com A),
    interp c t = Some t′ →
    match_states s t →
    ∃c′ s′, final_com c′ ∧
      ((c = CSkip ∧ s = s′) ∨ step_plus a0 c s c′ s′) ∧
      match_states s′ t′.


which relates the behavior of the interpreter (interp c t) when run on an arbitrary 
command c in cstate t to our model of MW DSL commands as specified by the 
operational semantics. 

To prove that the operational semantics correctly refines our high-level func- 
tional specification of MW (and therefore satisfies the regret bounds given at 
the start of Sect. 5.6), we prove a similar series of refinements. Since backward 
simulations compose transitively, we prove regret bounds on our interpreted MW 
just by composing the refinements in series. The bounds we prove in this way 
are parametric in the environment oracle with which MW is instantiated. When 
the oracle state types differ from source to target in a particular simulation, as 
is the case in our proof that the MW DSL interpreter refines the operational 
semantics, we require that the oracles simulate as well. 


6 Coordinated MW 


A system of multiple agents each running MW yields an ε-CCE of the underlying
game. If the game being played is smooth (for example, it was built using the
combinators of the Smooth Games DSL of Sect. 4) then the resulting ε-CCE
has bounded social cost with respect to a globally optimal strategy. In this
section, we put these results together by (1) defining an operational semantics of 
distributed interaction among multiple clients each running MW, and (2) proving 
that distributed executions of this semantics yield near-optimal solutions, as long 
as the underlying game being played is smooth. 


6.1 Machine Semantics 


We model the evolution of the distributed machine by the operational
semantics in Fig. 8. Client states (client_state) bundle commands from the MW DSL
(Sect. 5) with MW states parameterized by the ClientPkg oracle. The client
oracle send and receive functions model single-element (pin) queues, represented as
values of type option (dist A), storing values sent by an MW node, and of type
option (A → Q), storing values received by an MW node.

States of the coordinated machine (type machine_state N A) map client
indices in range [0..N − 1] to client states (type client_state A). Machine states
also record, at each iteration of the distributed MW protocol, the history of
distributions received from the clients in that round (type
seq ([0..N − 1] → dist A)), which will be used to prove Price of Anarchy bounds
in the next section (Sect. 6.2). We say that all_clients_have_sent holds in a
particular machine state m, committing to the set of distributions f, if each
client’s received buffer is empty and its sent buffer contains the distribution
$f_i$, of type dist A.


Client Oracle

  ClientPkg ≜
    { sent : option (dist A);
      received : option (A → Q);
      received_ok : ∀v. received = Some v → ∀a. 0 ≤ v a ≤ 1 }
  client_oracle_recv A (p : ClientPkg) (_ : unit) (v : A → Q) (p′ : ClientPkg) ≜
    p.received = Some v ∧ p′.received = None ∧ p′.sent = p.sent
  client_oracle_send A (p : ClientPkg) (d : dist A) (_ : unit) (p′ : ClientPkg) ≜
    p.sent = None ∧ p′.sent = Some d ∧ p′.received = p.received


Machine States

  client_state A ∋ σ ≜ (com A × state A ClientPkg unit)
  machine_state N A ∋ m ≜
    { clients : [0..N − 1] → client_state A;
      hist : seq ([0..N − 1] → dist A) }
  all_clients_have_sent A (m : machine_state) (f : [0..N − 1] → dist A) ≜
    ∀i : [0..N − 1]. let (_, σ) = m.clients i in
      (SOracleSt σ).received = None ∧ (SOracleSt σ).sent = Some f_i.


Machine Step  ⊢ m ⟹ m′

  cost_vec f i : A → Q ≜
    λa. Σ_{p : [0..N−1] → A | p_i = a} (Π_{j ≠ i} f_j p_j) * C_i p

  m.clients i = (c, σ)    m′.clients i = (c, σ′)    σ ∼_O σ′
  (SOracleSt σ′).sent = None    (SOracleSt σ′).received = Some (cost_vec f i)
  ---------------------------------------------------------------------------
  server_sent_cost_vector i f m m′

  m.clients i = (c, σ)    (SOracleSt σ).sent = None    ⊢ c, σ ⇒ c′, σ′
  ---------------------------------------------------------------------  ClientStep
  ⊢ m ⟹ m{clients ≜ m.clients[i ↦ (c′, σ′)]}

  all_clients_have_sent m f
  (∀i. server_sent_cost_vector i f m m′)    m′.hist = f :: m.hist
  ---------------------------------------------------------------  ServerStep
  ⊢ m ⟹ m′


Fig. 8. Semantics of the distributed machine

The machine step relation models a server-client protocol, distinguishing
server steps (ServerStep) from client steps (ClientStep). Client steps, which run
commands in the language of Fig. 7, may interleave arbitrarily. Server steps are
synchronized by the all_clients_have_sent relation to run only after all clients have
completed the current round. The work done by the server is modeled by the
auxiliary relation server_sent_cost_vector i f m m′, which constructs and sends to
client i the cost vector derived from the set of client distributions f. The relation
σ ∼_O σ′ states that σ and σ′ are equal up to their SOracleSt components.

In the distributed MW setting, the cost to player i of a particular action a : A
is defined as the expected value, over all N-player strategy vectors p in which
player i chose action a ($p_i = a$), of the cost to player i of p, with the
expectation over the (N − 1)-size product distribution induced by the players
j ≠ i.
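The definition can be made concrete with a brute-force Python sketch. The function cost_vec mirrors the formula of Fig. 8 by enumerating pure strategy profiles; the example cost function load_cost (fraction of players sharing player i's resource) is a hypothetical game chosen so that costs lie in [0, 1]. The enumeration is exponential in the number of players and is meant only to clarify the definition.

```python
from fractions import Fraction
from itertools import product

def cost_vec(f, i, actions, cost):
    """Expected cost to player i of each action a, given the others' mixed strategies.

    f:    list of dicts, f[j][a] = probability that player j plays a
    cost: function (i, profile) -> cost to player i of the pure profile
    """
    n = len(f)
    result = {}
    for a in actions:
        total = Fraction(0)
        # Enumerate all pure profiles p with p[i] = a.
        for p in product(actions, repeat=n):
            if p[i] != a:
                continue
            prob = Fraction(1)
            for j in range(n):
                if j != i:
                    prob *= f[j][p[j]]  # product distribution of players j != i
            total += prob * cost(i, p)
        result[a] = total
    return result

def load_cost(i, p):
    """Hypothetical cost: fraction of players using the same resource as player i."""
    return Fraction(sum(1 for q in p if q == p[i]), len(p))

resources = ["r0", "r1"]
dists = [{"r0": Fraction(1, 2), "r1": Fraction(1, 2)},
         {"r0": Fraction(3, 4), "r1": Fraction(1, 4)}]
cv0 = cost_vec(dists, 0, resources, load_cost)
print(cv0)  # r0 costs 7/8 in expectation, r1 costs 5/8
```

Player 0's own distribution does not enter its cost vector: only the strategies of the other players are averaged over, which is why the less crowded resource r1 looks cheaper to player 0.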


6.2 Convergence and Optimality 


Our proof that MW is no regret (Sect. 5) extends to system-wide convergence
and optimality guarantees, with respect to the distributed execution model of
Fig. 8 in which each client runs our MW implementation. The proof has three
major steps:


1. Show that no-regret clients implementing MW are still no regret when
   interleaved in the distributed semantics of Fig. 8.

2. Prove that per-client regret bounds (one for each client running MW) imply
   system-wide convergence to an ε-CCE.

3. Use POA results for smooth games from Sect. 4 to bound the cost, with
   respect to that of an optimal state, of all such ε-CCEs.


Composing 1, 2, and 3 proves that the distributed machine of Fig. 8, when
instantiated to clients running MW, converges to near-optimal solutions to
smooth games. We briefly describe each part in turn.


Part 1: No-regret clients are still no regret when interleaved. That MW no-regret
bounds lift to an MW client running in the context of the distributed operational
semantics of Fig. 8 follows from the oracular structure of our implementation
of MW (Sect. 5): clients interact with other clients and with the server only
through the oracle.

In particular, for any execution ⊢ m ⟹⁺ m′ of the machine of Fig. 8, and
for any client i, there is a corresponding execution of client i with respect to a
small nondeterministic oracle that simply “guesses” which cost vector to supply
every time the MW client executes a recv operation. Because MW is no regret
for all possible sequences of cost vectors, proving a refinement against the
nondeterministic oracle implies a regret bound on client i’s execution from state
$m_i$ to state $m'_i$.

We lift this argument to all the clients running in the Fig.8 semantics by 
proving the following theorem: 


Theorem all_clients_bounded_regret A m m′ T (ε : rat) :
  hist m = nil → 0 < size (hist m′) → final_state m′ →
  ⊢ m ⟹⁺ m′ →
  (∀i, m.clients i = (mult_weights A T, init_state A η tt (init_ClientPkg A))) →
  η + ln size_A/(η * T) ≤ ε →
  machine_regret_eps m′ ε.

The predicate machine_regret_eps holds in state m′, against regret bound ε, if all
clients have expected regret in state m′ at most ε (with respect to the σ_T
distribution we describe below), for any rational ε larger than
η + ln size_A/(η * T) (the regret bound we proved of MW in Sect. 5).

We assume that the history is empty in the initial state (hist m = nil), and
that at least one round was completed (0 < size (hist m′)). By final_state m′, we
mean that all clients have synchronized with the server (by receiving a cost vector
and sending a distribution) and then have terminated in CSkip. All clients in state
m are initialized to execute T steps of MW over game A (mult_weights A T), from
an initial state and initial ClientPkg.


Part 2: System-wide convergence to an ε-CCE. The machine semantics of Fig. 8
converges to an approximate Coarse Correlated Equilibrium (ε-CCE).

More formally, consider an execution ⊢ m ⟹* m′ of the Fig. 8 semantics
that results in a state m′ for which machine_regret_eps m′ ε (all clients have regret
at most ε, as established in Part 1). The distribution σ_T, defined as the
time-averaged history of the product of the distributions output by the MW
clients at each round, is an ε-CCE:

$$ \sigma_T \triangleq \lambda p.\ \frac{\sum_{i=1}^{T} \prod_{j=1}^{N} (\mathrm{hist}\ m')_i^j\; p_j}{T} $$

By $(\mathrm{hist}\ m')_i^j$ we mean the distribution associated to player j at
time i, as recorded in the execution history stored in state m′. The value
$((\mathrm{hist}\ m')_i^j\ p_j)$ is the probability that client j chose action
$p_j$ in round i.

We formalize this property in the following Coq theorem: 


Theorem machine_regret_eCCE m′ ε :
  machine_regret_eps m′ ε →
  eCCE ε σ_T.


which states that σ_T is an ε-CCE, with approximation factor ε, as long as each
client’s expected regret over σ_T is at most ε (machine_regret_eps m′ ε), which is
exactly the property we proved in Part 1 above.


Part 3: System-wide regret bounds. The machine semantics of Fig. 8 converges to
a state with expected cost bounded with respect to the optimal cost.

Consider an execution of the Fig. 8 semantics ⊢ m ⟹* m′ and an ε satisfying
the conditions of all_clients_bounded_regret. If the underlying game is smooth,
the expected cost of the time-averaged distribution of the clients in m′, σ_T, is
bounded with respect to the cost of an optimal strategy profile s′ by the following
Coq theorem:


Theorem systemwide_POA_bound A m m′ T (ε : rat) s′ :
  hist m = nil → ⊢ m ⟹⁺ m′ → 0 < size (hist m′) → final_state m′ →
  (∀i, m.clients i = (mult_weights A T, init_state A η tt (init_ClientPkg A))) →
  η + ln size_A/(η * T) ≤ ε →
  optimal s′ →
  ExpectedCost σ_T ≤ λ/(1 − μ) * Cost s′ + (N * ε/(1 − μ)).

In the above theorem, λ and μ are the smoothness parameters of the game A
while N is the number of players. Cost s′ is the social (total) cost of the optimal
state s′.


7 Related Work 


Reinforcement Learning, Bandits. There is extensive work on reinforcement 
learning [39], multi-agent reinforcement learning (MARL [19]), and multi-armed 
bandits (MAB, [15]), more than can be cited here. We note, however, that Q- 
learning [41], while similar in spirit to MW, addresses the more general scenario 
in which an agent’s action space is modeled by an arbitrary Markov Decision 
Process (in MW, the action space is a single set A). Our verified MW imple- 
mentation is most suitable, therefore, for use in the full-information analog of 
MAB problems, in which actions are associated with “arms” and each agent 
learns the cost of all arms — not just the one it pulled — at each time step. In 
this domain, MW has good convergence bounds, as we prove formally of our 
implementation in this paper. Relaxing our verified MW and formal proofs to 
the partial information Bandit setting is interesting future work. 


Verified Distributed Systems. EventML [33] is a domain-specific language for 
specifying distributed algorithms in the Logic of Events, which can be mechan- 
ically verified within the Nuprl proof assistant. Work has been done to develop 
methods for formally verifying distributed systems in Isabelle [20]. Model check- 
ing has been used extensively (e.g., [21,24]) to test distributed systems for bugs. 

Verdi [42] is a Coq framework for implementing verified distributed sys- 
tems. A Verdi system is implemented as a collection of handler functions which 
exchange messages through the network or communicate with the “outside 
world” via input and output. Application-level safety properties of the system 
can be proved with respect to a simple, idealized network semantics. A verified 
system transformer (VST) can then be used to transform the executable sys- 
tem into one which is robust to network faults such as reordering, duplication, 
and dropping of packets. The safety properties of the system proved under the 
original network semantics are preserved under the new faulty semantics, with 
minimal additional proof effort required of the programmer. 

The goals of Verdi are complementary to our own. We implement a veri- 
fied no-regret MW algorithm, together with a language of Roughgarden smooth 
games, for constructing distributed systems with verified convergence and cor- 
rectness guarantees. Verdi allows safety properties of a distributed system to 
be lifted to analogous systems which tolerate various network faults, and pro- 
vides a robust runtime system for execution in a practical setting. It stands to 
reason, then, that Verdi (as well as follow-on related work such as [37]) may pro- 
vide a natural avenue for building robust executable versions of our distributed 
applications. We leave this for future work. 

Chapar [23] is a Coq framework for verifying causal consistency of distributed 
key-value stores as well as correctness of client programs with respect to causally 
consistent key-value stores. The implementation of a key-value store is proved 
correct with respect to a high-level specification using a program refinement 
method similar to ours. Although Chapar’s goal isn’t to verify robustness to 
network faults, node crashes and message losses are modeled by its abstract 
operational semantics. 

IronFleet [18] is a framework and methodology for building verified dis- 
tributed systems using a mix of TLA-style state machine refinement, Hoare 
logic, and automated theorem proving. An IronFleet system is comprised of 
three layers: a high-level state machine specification of the overall system, a 
more detailed distributed protocol layer which describes the behavior of each 
agent in the system as a state machine, and the implementation layer in which 
each agent is programmed using a variant of the Dafny [22] language extended 
with a trusted set of UDP networking operations. Correctness properties are 
proved with respect to the high-level specifications, and a series of refinements 
is used to prove that every behavior in the implementation layer is a refine- 
ment of some behavior in the high-level specification. IronFleet has been used to 
prove safety and liveness properties of IronRSL, a Paxos-based replicated state 
machine, as well as IronKV, a shared key-value store. 


Alternative Proofs. Variant proofs of Theorem 1, such as the one via KL- 
divergence (cf. [1, Section 2.2]), could be formalized in our framework without 
modifying most parts of the MW implementation. In particular, because we have 
proved once and for all that our interpreted MW refines a high-level specification 
of MW, it would be sufficient to formalize the new proof just with respect to the 
high-level program of Sect. 5.6. 


8 Conclusion 


This paper reports on the first formally verified implementation of Multiplica- 
tive Weights (MW), a simple yet powerful algorithm for approximately solving 
Coarse Correlated Equilibria, among many other applications. We prove our 
MW implementation correct via a series of program refinements with respect 
to a high-level implementation of the algorithm. We present a DSL for building 
smooth games and show how to compose MW with smoothness to build dis- 
tributed systems with verified Price of Anarchy bounds. Our implementation 
and proof are open source and available online. 

Abstract. We present a novel program verification approach based on 
coinduction, which takes as input an operational semantics. No intermediates 
like program logics or verification condition generators are needed. 
diates like program logics or verification condition generators are needed. 
Specifications can be written using any state predicates. We implement 
our approach in Coq, giving a certifying language-independent verifi- 
cation framework. Our proof system is implemented as a single module 
imported unchanged into language-specific proofs. Automation is reached 
by instantiating a generic heuristic with language-specific tactics. Man- 
ual assistance is also smoothly allowed at points the automation can- 
not handle. We demonstrate the power and versatility of our approach 
by verifying algorithms as complicated as Schorr-Waite graph marking 
and instantiating our framework for object languages in several styles 
of semantics. Finally, we show that our coinductive approach subsumes 
reachability logic, a recent language-independent sound and (relatively) 
complete logic for program verification that has been instantiated with 
operational semantics of languages as complex as C, Java and JavaScript. 


1 Introduction 


Formal verification is a powerful technique for ensuring program correctness, but 
it requires a suitable verification framework for the target language. Standard 
approaches such as Hoare logic [1] (or verification condition generators) require 
significant effort to adapt and prove sound and relatively complete for a given 
language, with few or no theorems or tools that can be reused between languages. 
To use a software engineering metaphor, Hoare logic is a design pattern rather 
than a library. This becomes literal when we formalize it in a proof assistant. 

We present instead a single language-independent program verification frame- 
work, to be used with an executable semantics of the target programming lan- 
guage given as input. The core of our approach is a simple theorem which gives 
a coinduction principle for proving partial correctness. 

To trust a non-executable semantics of a desired language, an equivalence 
to an executable semantics is typically proved. Executable semantics of 
programming languages abound in the literature. Recently, executable semantics of 
several real languages have been proposed, e.g. of C [2], Java [3], JavaScript [4,5], 
Python [6], PHP [7], CAML [8], thanks to the development of executable 
semantics engineering frameworks like K [9], PLT-Redex [10], Ott [11], etc., which 
make defining a formal semantics for a programming language almost as easy as 
implementing an interpreter, if not easier. Our coinductive program verification 
approach can be used with any of these executable semantics or frameworks, 
and is correct-by-construction: no additional “axiomatic semantics”, “program 
logic”, or “semantics suitable for verification” with soundness proofs needed. 

As detailed in Sect. 6, we are not the first to propose a language-independent 
verification infrastructure that takes an operational semantics as input, nor the 
first to propose coinduction for proving isolated properties about some pro- 
grams. However, we believe that coinduction can offer a fresh, promising and 
general approach as a language-independent verification infrastructure, with a 
high potential for automation that has not been fully explored yet. In this paper 
we make two steps in this direction, by addressing the following research ques- 
tions: 


RQ1 Is it feasible to have a sound and (relatively) complete verification infras- 
tructure based on coinduction, which is language-independent and versa- 
tile, i.e., takes an arbitrary language as input, given by its operational 
semantics? 

RQ2 Is it possible to match, or even exceed, the capabilities of existing language- 
independent verification approaches based on operational semantics? 


To address RQ1, we make use of a key mathematical result, Theorem 1, which 
has been introduced in more general forms in the literature, e.g., in [12,13] and 
in [14]. We mechanized it in Coq in a way that allows us to instantiate it with 
a transition relation corresponding to any target language semantics, thereby 
producing certifying program verification for that language. Using the resulting 
coinduction principle to show that a program meets a specification produces a 
proof which depends only on the operational semantics. We demonstrate our 
proofs can be effectively automated, on examples including heap data structures 
and recursive functions, and describe the implemented proof strategy and how 
it can be reused across languages defined using a variety of operational styles. 

To address RQ2, we show that our coinductive approach not only subsumes 
reachability logic [15], whose practicality has been demonstrated with languages 
like C, Java, and JavaScript, but also offers several specific advantages. Reacha- 
bility logic consists of a sound and (relatively) complete proof system that takes 
a given language operational semantics as a theory and derives reachability prop- 
erties about programs in that language. A mechanical procedure can translate 
any proof using reachability logic into a proof using our coinductive approach. 

We first introduce our approach with a simple intuitive example, then prove 
its correctness. We then discuss mechanical verification experiments across 
different languages, show how reachability logic proofs can be translated into 
coinductive proofs, and conclude with related and future work. Our entire Coq 
formalization, proofs and experiments are available at [16]. 


2 Overview and Basic Notions 


Section 4 will show the strengths of our approach by means of verifying rather 
complex programs. Here our objective is different, namely to illustrate it by 
verifying a trivial IMP (C-style) program: s=0; while (--n) {s=s+n;}. Let sum 
stand for the program and loop for its while loop. When run with a positive 
initial value $n$ of the program variable n, it sets s to the sum of $1, \ldots, n-1$. 
To illustrate non-termination, we assume unbounded integers, so loop runs forever 
for non-positive $n$. An IMP language syntax sufficient for this example and a 
possible execution trace are given in Fig. 1. The exact step granularity is not 
critical for our approach, as long as diverging executions produce infinite traces. 
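
As a quick sanity check of the behavior described above, the following Python sketch transliterates sum directly. The name run_sum and the max_iterations budget are illustrative assumptions and not part of IMP; the budget only stands in for divergence, which a finite test cannot observe.

```python
def run_sum(n, max_iterations=10_000):
    """Transliteration of `s=0; while (--n) {s=s+n;}` over unbounded integers.

    Returns (n, s) when the loop exits, or None when the iteration budget
    runs out. The budget is an illustration device: the IMP program itself
    has no such bound and simply diverges for non-positive inputs.
    """
    s = 0
    for _ in range(max_iterations):
        n -= 1              # --n: decrement first, then test the new value
        if n == 0:          # a zero condition leaves the loop
            return n, s
        s += n
    return None             # stands in for an infinite execution


# For positive inputs the loop exits with n = 0 and s = 1 + ... + (n0 - 1).
for n0 in range(1, 8):
    assert run_sum(n0) == (0, sum(range(1, n0)))
```

Passing these checks is only evidence for a handful of inputs; the point of the paper is to prove the claim for all of them.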


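Syntax of IMP:

    Pgm  ::= Stmt
    Exp  ::= Id | Int | -- Id | Exp op Exp
    Stmt ::= skip | Id = Exp ; | Stmt Stmt
           | if Exp { Stmt } else { Stmt }
           | while Exp { Stmt }

Sample execution of sum:

$\langle \texttt{s=0; while (--n) \{s=s+n;\}} \mid \texttt{n} \mapsto 4 \rangle$
$\langle \texttt{while (--n) \{s=s+n;\}} \mid \texttt{n} \mapsto 4, \texttt{s} \mapsto 0 \rangle$
$\langle \texttt{if (--n) \{s=s+n; loop\} else \{skip\}} \mid \texttt{n} \mapsto 4, \texttt{s} \mapsto 0 \rangle$
$\langle \texttt{if (3) \{s=s+n; loop\} else \{skip\}} \mid \texttt{n} \mapsto 3, \texttt{s} \mapsto 0 \rangle$
$\langle \texttt{s=s+n; loop} \mid \texttt{n} \mapsto 3, \texttt{s} \mapsto 0 \rangle$
$\langle \texttt{s=0+n; loop} \mid \texttt{n} \mapsto 3, \texttt{s} \mapsto 0 \rangle$
$\langle \texttt{s=0+3; loop} \mid \texttt{n} \mapsto 3, \texttt{s} \mapsto 0 \rangle$
$\langle \texttt{s=3; loop} \mid \texttt{n} \mapsto 3, \texttt{s} \mapsto 0 \rangle$
$\langle \texttt{skip; loop} \mid \texttt{n} \mapsto 3, \texttt{s} \mapsto 3 \rangle$
$\cdots$
$\langle \texttt{while (--n) \{s=s+n;\}} \mid \texttt{n} \mapsto 1, \texttt{s} \mapsto 6 \rangle$
$\langle \texttt{if (--n) \{s=s+n; loop\} else \{skip\}} \mid \texttt{n} \mapsto 1, \texttt{s} \mapsto 6 \rangle$
$\langle \texttt{if (0) \{s=s+n; loop\} else \{skip\}} \mid \texttt{n} \mapsto 0, \texttt{s} \mapsto 6 \rangle$
$\langle \texttt{skip} \mid \texttt{n} \mapsto 0, \texttt{s} \mapsto 6 \rangle$


Fig. 1. Syntax of IMP (top) and sample execution of sum (bottom) 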


While our coinductive program verification approach is self-contained and 
thus can be presented without reliance on other verification approaches, we prefer 
to start by discussing the traditional Hoare logic approach, for two reasons. First, 
it will put our coinductive approach in context, showing also how it avoids some 
of the limitations of Hoare logic. Second, we highlight some of the subtleties of 
Hoare logic when related to operational semantics, which will help understand 
the reasons and motivations underlying our definitions and notations. 


2.1 Intuitive Hoare Logic Proof 


A Hoare logic specification/triple has the form $\{\varphi_{pre}\}\ \texttt{code}\ \{\varphi_{post}\}$. The 
convenience of this notation depends on specializing to a particular target language, 
such as allowing variable names to be used directly in predicates to stand for 
their values, or writing only the current statement. This hides details of the 
environment/state representation, and some framing conventions or 
compositionality assumptions over the unmentioned parts. A Hoare triple specifies a set 
of (partial correctness) reachability claims about a program’s behavior, and it is 
typically an over-approximation (i.e., it specifies more reachability claims than 
desired or feasible). Specifically, assume some formal language semantics of IMP 
defining an execution step relation $R \subseteq C \times C$ on a set $C$ of configurations 
of the form $\langle \texttt{code} \mid \sigma \rangle$, like those in Fig. 1. We write $a \to_R b$ for $(a, b) \in R$. 
Section 2.3 (Fig. 3) discusses several operational semantics approaches we 
experimented with (Sect. 4) that yield such step relations $R$. A (partial correctness) 
reachability claim $(c, P)$, relating an initial state $c \in C$ and a target set of states 
$P \subseteq C$, is valid (or holds) iff the initial state $c$ can either reach a state in $P$ or can 
take an infinite number of steps (with $\to_R$); we write $c \Rightarrow_R P$ to indicate that 
claim $(c, P)$ is valid, and $a \to b$ or $c \Rightarrow P$ instead of $a \to_R b$ or $c \Rightarrow_R P$, resp., 
when $R$ is understood. Then $\{\varphi_{pre}\}\ \texttt{code}\ \{\varphi_{post}\}$ specifies the set of reachability 
claims 


$$\{(\langle \texttt{code} \mid \sigma_{pre} \rangle, \{\langle \texttt{skip} \mid \sigma_{post} \rangle \mid \sigma_{post} \vDash \varphi_{post}\}) \mid \sigma_{pre} \vDash \varphi_{pre}\}$$


and it is valid iff all of its reachability claims are valid. It is necessary for $P$ 
in reachability claims $(c, P)$ specified by Hoare triples to be a set of 
configurations (and thus an over-approximation): it is generally impossible for $\varphi_{post}$ to 
determine exactly the possible final configuration or configurations. 


(IMP statement rules) 

$$\{\varphi[e/x]\}\ \texttt{x = e;}\ \{\varphi\} \qquad \text{(HL-ASGN)}$$

$$\frac{\{\varphi_1\}\ s_1\ \{\varphi_2\} \qquad \{\varphi_2\}\ s_2\ \{\varphi_3\}}{\{\varphi_1\}\ s_1\ s_2\ \{\varphi_3\}} \qquad \text{(HL-SEQ)}$$

$$\frac{\{\varphi_1 \wedge e \neq 0\}\ s_1\ \{\varphi_2\} \qquad \{\varphi_1 \wedge e = 0\}\ s_2\ \{\varphi_2\}}{\{\varphi_1\}\ \texttt{if (}e\texttt{) then \{}s_1\texttt{\} else \{}s_2\texttt{\}}\ \{\varphi_2\}} \qquad \text{(HL-IF)}$$

$$\frac{\{\varphi \wedge e \neq 0\}\ s\ \{\varphi\}}{\{\varphi\}\ \texttt{while (}e\texttt{) \{}s\texttt{\}}\ \{\varphi \wedge e = 0\}} \qquad \text{(HL-WHILE)}$$

(Generic rule) 

$$\frac{\vDash \varphi_1 \to \varphi_1' \qquad \{\varphi_1'\}\ s\ \{\varphi_2'\} \qquad \vDash \varphi_2' \to \varphi_2}{\{\varphi_1\}\ s\ \{\varphi_2\}} \qquad \text{(HL-CONSEQ)}$$


Fig. 2. IMP program logic. 

While one can prove Hoare triples valid directly using the step relation $\to_R$ 
and induction, or coinduction like we propose in this paper, the traditional 
approach is to define a language-specific proof system for deriving Hoare triples from 
other triples, also known as a Hoare logic, or program logic, for the target 
programming language. Figure 2 shows such a program logic for IMP. Hoare logics 
are generally not executable, so testing cannot show whether they match the 
intended semantics of the language. Even for a simple language like IMP, if one 
mistakenly writes $e = 1$ instead of $e \neq 0$ in rule (HL-WHILE), then one gets an 
incorrect program logic. When trusted verification is desired, the program logic 
needs to be proved sound w.r.t. a reference executable semantics of the language, 
i.e., that each derivable Hoare triple is valid. This is a highly non-trivial task for 
complex languages (C, Java, JavaScript), in addition to defining a Hoare logic 
itself. Our coinductive approach completely avoids this difficulty by requiring no 
additional semantics of the programming language for verification purposes. 
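The property to prove is that sum (or more specifically loop) exits only when 
n is 0, with s as the sum $\sum_{i=1}^{n-1} i$ (or $\frac{n(n-1)}{2}$). In more detail, any configuration 
whose statement begins with sum and whose store defines n as $n$ can run 
indefinitely or reach a state where it has just left the loop with $\texttt{n} \mapsto 0$, $\texttt{s} \mapsto \sum_{i=1}^{n-1} i$, 
and the store otherwise unchanged. As a Hoare logic triple, that specification is 


$$\{\texttt{n} = n\}\ \texttt{s=0; while (--n) \{s=s+n;\}}\ \{\texttt{s} = \sum_{i=1}^{n-1} i \wedge \texttt{n} = 0\}$$

As seen, this Hoare triple asserts the validity of the set of reachability claims 

$$S = \{(c_{n,\sigma}, P_{n,\sigma}) \mid \forall n, \forall \sigma \text{ undefined in } \texttt{n}\} \qquad (1)$$

where 

$$c_{n,\sigma} = \langle \texttt{s=0; while(--n)\{s=s+n;\}} \mid \texttt{n} \mapsto n, \sigma \rangle$$
$$P_{n,\sigma} = \{\langle \texttt{skip} \mid \texttt{n} \mapsto 0, \texttt{s} \mapsto \sum_{i=1}^{n-1} i, \sigma' \rangle \mid \forall \sigma' \text{ undefined in } \texttt{n}, \texttt{s}\}$$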

We added the $\sigma$ and $\sigma'$ state frames above for the sake of complete details about 
what Hoare triples actually specify, and to illustrate why $P$ in claims $(c, P)$ 
needs to be a set. Since the addition/removal of $\sigma$ and $\sigma'$ does not change the 
subsequent proofs, for the remainder of this section, for simplicity, we drop them. 

Now let us assume, without proof, that the proof system in Fig. 2 is sound 
(for the executable step relation $\to_R$ of IMP discussed above), and let us use it to 
derive a proof of the sum example. Note that the proof system in Fig. 2 assumes 
that expressions have no side effects and thus can be used unchanged in state 
formulae, which is customary in Hoare logics, so the program first needs to be 
translated into an equivalent one without the problematic --n, in which 
expressions have no side effects. We could have had more Hoare logic rules instead of 
needing to translate the code segment, but this would quickly make our program 
logic significantly more complicated. Either way, with even a simple imperative 
programming language like we have here, it is necessary to either add Hoare 
logic rules to Fig. 2 or to modify our code segment. These inconveniences are 
taken for granted in Hoare logic based verifiers, and they require non-negligible 
additional effort if trusted verification is sought. For comparison, our coinductive 
verification approach proposed in this paper requires no transformation of the 
original program. After modifying the above problematic expression, our code 
segment gets translated to the (hopefully) equivalent code: 


s=0; n=n-1; while (n) {s=s+n; n=n-1;} 


Let loop’ be the new loop and let $\varphi_{inv}$, its invariant, be 


$$\texttt{s} = \frac{((n - 1) - \texttt{n})(n + \texttt{n})}{2}$$

The program variable n stands for its current value, while the mathematical 
variable $n$ stands for the initial (sometimes called “old”) value of n. Next, using 
the assign and sequence Hoare logic rules in Fig. 2, as well as basic arithmetic 
via the (HL-CONSEQ) rule, we derive 


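$$\{\texttt{n} = n\}\ \texttt{s=0; n=n-1;}\ \{\varphi_{inv}\} \qquad (2)$$


Similarly, we can derive $\{\varphi_{inv} \wedge \texttt{n} \neq 0\}$ s=s+n; n=n-1; $\{\varphi_{inv}\}$. Then, applying 
the while rule, we derive $\{\varphi_{inv}\}$ loop’ $\{\varphi_{inv} \wedge \texttt{n} = 0\}$. The rest follows by the 
sequence rule with the above, (2), and basic arithmetic. 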
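
The Hoare-style argument above can be spot-checked numerically. The sketch below runs the translated program and asserts the invariant at the points where the sequence and while rules need it. The helper names phi_inv and check_translated_sum are hypothetical, and the invariant is written without division to stay in integer arithmetic. Passing for a range of inputs is evidence only, not a proof.

```python
def phi_inv(n0, n, s):
    """Loop invariant s = ((n0 - 1) - n)(n0 + n) / 2, with n0 the initial value of n."""
    return 2 * s == ((n0 - 1) - n) * (n0 + n)


def check_translated_sum(n0):
    """Run `s=0; n=n-1; while (n) {s=s+n; n=n-1;}` and check phi_inv along the way.

    Only call this with n0 >= 1; for other inputs the loop does not terminate.
    """
    s = 0
    n = n0 - 1
    assert phi_inv(n0, n, s)        # postcondition of triple (2)
    while n != 0:
        s = s + n
        n = n - 1
        assert phi_inv(n0, n, s)    # the loop body preserves the invariant
    return n, s                     # here phi_inv holds and n == 0


for n0 in range(1, 50):
    n, s = check_translated_sum(n0)
    assert n == 0 and s == sum(range(1, n0))
```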

This example is not complicated; in fact, it is very intuitive. However, it 
abstracts out a lot of details in order to make it easy for a human to understand. 
It is easy to see the potential difficulties that can arise in larger examples from 
needing to factor out the side effect, and from mixing both program variables 
and mathematical variables in Hoare logic specifications and proofs. With our 
coinductive verification framework, all of these issues are mitigated. 


2.2 Intuitive Coinduction Proof 


Since our coinductive approach is language-independent, we do not commit to 
any particular, language-specific formalism for specifying reachability claims, 
such as Hoare triples. Consequently, we will work directly with raw reachability 
claims/specifications $S \subseteq C \times \mathcal{P}(C)$ consisting of sets of pairs $(c, P)$ with $c \in C$ 
and $P \subseteq C$ as seen above. We show how to coinductively prove the claim for 
the example sum program in the form given in (1), relying on nothing but a 
general language-independent coinductive machinery and the trusted execution 
step relation $\to_R$ of IMP. Recall that we drop the state frames ($\sigma$) in (1). 

Intuitively, our approach consists of symbolic execution with the language 
step relation, plus coinductive reasoning for circular behaviors. Specifically, 
suppose that $S_{circ} \subseteq C \times \mathcal{P}(C)$ is a specification corresponding to some code with 
circular behavior, say some loop. Pairs $(c, P) \in S_{circ}$ with $c \in P$ are already 
valid, that is, $c \Rightarrow_R P$ for those. “Execute” the other pairs $(c, P) \in S_{circ}$ with 
the step relation $\to_R$, obtaining a new specification $S'$ containing pairs of the 
form $(d, P)$, where $c \to_R d$; since we usually have a mathematical description of 
the pairs in $S_{circ}$ and $S'$, this step has the feel of symbolic execution. Note that 
$S_{circ}$ is valid if $S'$ is valid. Do the same for $S'$ obtaining a new specification $S''$, 
and so on and so forth. If at any moment during this (symbolic) execution 
process we reach a specification $S$ that is included in our original $S_{circ}$, then simply 
assume that $S$ is valid. While this kind of cyclic reasoning may not seem sound, 
it is in fact valid, and justified by coinduction, which captures the essence of 
partial correctness, language-independently. Reaching something from the original 
specification shows we have reached some fixpoint, and coinduction is directly 
related to greatest fixpoints. This is explained in detail in Sect. 3. 

In many examples it is useful to chain together individual proofs, similar to 
(HL-SEQ). Thus, we introduce the following sequential composition construct: 

Definition 1. For $S_1, S_2 \subseteq C \times \mathcal{P}(C)$, let $S_1 \fatsemi S_2 = \{(c, P) \mid \exists Q.\ (c, Q) \in 
S_1 \wedge \forall d \in Q,\ (d, P) \in S_2\}$. Also, we define $trans(S)$ as $S \fatsemi S$ (trans can be 
thought of as a transitivity proof rule). 


If $S_1$ and $S_2$ are valid then $S_1 \fatsemi S_2$ is also valid (Lemma 2). 
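
To make Definition 1 concrete, here is a small Python sketch of sequential composition over finite claim sets. The function name seq and the string-valued toy configurations are assumptions made for illustration. The sketch only enumerates target sets that occur in the second argument, which is exact whenever every intermediate set Q is non-empty.

```python
def seq(s1, s2):
    """Sequential composition of two finite claim sets.

    A claim is a pair (c, P) where P is a frozenset of configurations.
    (c, P) is in the result when some (c, Q) is in s1 and every d in Q
    has the claim (d, P) in s2.
    """
    targets = {p for (_, p) in s2}
    return {
        (c, p)
        for (c, q) in s1
        for p in targets
        if all((d, p) in s2 for d in q)
    }


# Toy configurations are plain strings.
Q = frozenset({"loop0", "loop1"})
P = frozenset({"done"})
S1 = {("start", Q)}
S2 = {("loop0", P), ("loop1", P)}

assert seq(S1, S2) == {("start", P)}
# Dropping one of the claims about Q breaks the chain.
assert seq(S1, {("loop0", P)}) == set()
```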
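Given $n$, let $Q_n$ and $T_n$ be the following sets of configurations, where $Q_n$ and 
$T_n$ represent the invariant set and terminal set, respectively: 


$$Q_n = \{\langle \texttt{loop} \mid \texttt{n} \mapsto n', \texttt{s} \mapsto \sum_{i=n'}^{n-1} i \rangle \mid \forall n'\}$$
$$T_n = \{\langle \texttt{skip} \mid \texttt{n} \mapsto 0, \texttt{s} \mapsto \sum_{i=1}^{n-1} i \rangle\}$$


and let us define the following specifications: 


$$S_1 = \{(\langle \texttt{s=0; loop} \mid \texttt{n} \mapsto n \rangle, Q_n) \mid \forall n\}$$
$$S_2 = \{(\langle \texttt{loop} \mid \texttt{n} \mapsto n', \texttt{s} \mapsto \sum_{i=n'}^{n-1} i \rangle, T_n) \mid \forall n, n'\}$$


Our target $S$ in (1) is included in $S_1 \fatsemi S_2$, so it suffices to show that $S_1$ and $S_2$ are 
valid. $S_1$ clearly is: $\langle \texttt{s=0; loop} \mid \texttt{n} \mapsto n \rangle \to_R^{+} \langle \texttt{loop} \mid \texttt{n} \mapsto n, \texttt{s} \mapsto 0 \rangle$ represents the 
(symbolic) execution step or steps taken to assign program variable s, and the 
set of specifications $\{(\langle \texttt{loop} \mid \texttt{n} \mapsto n, \texttt{s} \mapsto 0 \rangle, Q_n) \mid \forall n\}$ is vacuously valid (note 
$\sum_{i=n}^{n-1} i = 0$). For the validity of $S_2$, we partition it in two subsets, one where 
$n' = 1$ and another with $n' \neq 1$ (case analysis). The former holds same as $S_1$, 
noting that 


$$\langle \texttt{loop} \mid \texttt{n} \mapsto 1, \texttt{s} \mapsto \sum_{i=1}^{n-1} i \rangle \to_R^{+} \langle \texttt{skip} \mid \texttt{n} \mapsto 0, \texttt{s} \mapsto \sum_{i=1}^{n-1} i \rangle$$

The latter holds by coinduction (for $S_2$), because first 

$$\langle \texttt{loop} \mid \texttt{n} \mapsto n', \texttt{s} \mapsto \sum_{i=n'}^{n-1} i \rangle \to_R^{+} \langle \texttt{loop} \mid \texttt{n} \mapsto n' - 1, \texttt{s} \mapsto \sum_{i=n'-1}^{n-1} i \rangle$$

and second the following inclusion holds: 

$$\{(\langle \texttt{loop} \mid \texttt{n} \mapsto n' - 1, \texttt{s} \mapsto \sum_{i=n'-1}^{n-1} i \rangle, T_n) \mid \forall n, n'\} \subseteq S_2$$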

The key part of the proof above was to show that the reachability claim 
about the loop (S2) was stable under the language semantics. Everything else was 
symbolic execution using the (trusted) operational semantics of the language. By 
allowing desirable program properties to be uniformly specified as reachability 
claims about the (executable) language semantics itself, our approach requires 
no auxiliary formalization of the language for verification purposes, and thus no 
soundness or equivalence proofs and no transformations of the original program 
to make it fit the restrictions of the auxiliary semantics. Unlike for the Hoare 
logic proof, the main “proof rules” used were just performing execution steps 
using the operational semantics rules, as well as the generic coinductive principle. 
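Section 3 provides all the technical details. 


STRUCTURAL OPERATIONAL SEMANTICS 

$$\langle x \mid \sigma \rangle \to \langle \sigma(x) \mid \sigma \rangle$$
$$\langle \texttt{--}x \mid \sigma \rangle \to \langle i \mid \sigma[i/x] \rangle \quad \text{if } i = \sigma(x) -_{Int} 1$$
$$\frac{\langle e_1 \mid \sigma \rangle \to \langle e_1' \mid \sigma' \rangle}{\langle e_1\ op\ e_2 \mid \sigma \rangle \to \langle e_1'\ op\ e_2 \mid \sigma' \rangle}$$
$$\frac{\langle e_2 \mid \sigma \rangle \to \langle e_2' \mid \sigma' \rangle}{\langle i_1\ op\ e_2 \mid \sigma \rangle \to \langle i_1\ op\ e_2' \mid \sigma' \rangle}$$
$$\langle i_1\ op\ i_2 \mid \sigma \rangle \to \langle i_1\ op_{Int}\ i_2 \mid \sigma \rangle$$
$$\frac{\langle s_1 \mid \sigma \rangle \to \langle s_1' \mid \sigma' \rangle}{\langle s_1\ s_2 \mid \sigma \rangle \to \langle s_1'\ s_2 \mid \sigma' \rangle}$$
$$\langle \texttt{skip}\ s \mid \sigma \rangle \to \langle s \mid \sigma \rangle$$
$$\langle x := i \mid \sigma \rangle \to \langle \texttt{skip} \mid \sigma[i/x] \rangle$$
$$\frac{\langle e \mid \sigma \rangle \to \langle e' \mid \sigma' \rangle}{\langle \texttt{if } e \texttt{ then \{} s_1 \texttt{\} else \{} s_2 \texttt{\}} \mid \sigma \rangle \to \langle \texttt{if } e' \texttt{ then \{} s_1 \texttt{\} else \{} s_2 \texttt{\}} \mid \sigma' \rangle}$$
$$\langle \texttt{if } i \texttt{ then \{} s_1 \texttt{\} else \{} s_2 \texttt{\}} \mid \sigma \rangle \to \langle s_1 \mid \sigma \rangle \quad \text{if } i \neq 0$$
$$\langle \texttt{if } 0 \texttt{ then \{} s_1 \texttt{\} else \{} s_2 \texttt{\}} \mid \sigma \rangle \to \langle s_2 \mid \sigma \rangle$$
$$\langle \texttt{while } e \texttt{ \{} s \texttt{\}} \mid \sigma \rangle \to \langle \texttt{if } e \texttt{ then \{} s \texttt{ while } e \texttt{ \{} s \texttt{\}\} else \{skip\}} \mid \sigma \rangle$$


REDUCTION SEMANTICS 
(evaluation contexts syntax omitted, see [17]) 

$$\frac{r \to r'}{E[r] \to E[r']}$$
$$\langle E \mid \sigma \rangle[x] \to \langle E \mid \sigma \rangle[\sigma(x)]$$
$$\langle E \mid \sigma \rangle[\texttt{--}x] \to \langle E \mid \sigma[i/x] \rangle[i] \quad \text{if } i = \sigma(x) -_{Int} 1$$
$$\langle E \mid \sigma \rangle[x := i] \to \langle E \mid \sigma[i/x] \rangle[\texttt{skip}]$$
$$i_1\ op\ i_2 \to i_1\ op_{Int}\ i_2$$
$$\texttt{skip}\ s \to s$$
$$\texttt{if } i \texttt{ then \{} s_1 \texttt{\} else \{} s_2 \texttt{\}} \to s_1 \quad \text{if } i \neq 0$$
$$\texttt{if } 0 \texttt{ then \{} s_1 \texttt{\} else \{} s_2 \texttt{\}} \to s_2$$
$$\texttt{while } e \texttt{ \{} s \texttt{\}} \to \texttt{if } e \texttt{ then \{} s \texttt{ while } e \texttt{ \{} s \texttt{\}\} else \{skip\}}$$


K SEMANTICS 
(configuration and strictness omitted, see [9]) 

$$\langle x \Rightarrow i\ \cdots \rangle_{k}\ \langle \cdots\ x \mapsto i\ \cdots \rangle_{state}$$
$$\langle \texttt{--}x \Rightarrow i -_{Int} 1\ \cdots \rangle_{k}\ \langle \cdots\ x \mapsto (i \Rightarrow i -_{Int} 1)\ \cdots \rangle_{state}$$
$$\langle x := i \Rightarrow \texttt{skip}\ \cdots \rangle_{k}\ \langle \cdots\ x \mapsto (\_ \Rightarrow i)\ \cdots \rangle_{state}$$

(plus the last five simple rules under reduction semantics) 


Fig. 3. Three different operational semantics of IMP, generating the same execution 
step relation $R$ (or $\to_R$). 


2.3 Defining Execution Step Relations 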


Since our coinductive verification framework is parametric in a step relation, 
which also becomes the only trust base when certified verification is sought, it is 
imperative for its practicality to support a variety of approaches to define step 
relations. Ideally, it should not be confined to any particular semantic style that 
ultimately defines a step relation, and it should simply take existing semantics 
“off-the-shelf” and turn them into sound and relatively complete program veri- 
fiers for the defined languages. We briefly recall three of the semantic approaches 
that we experimented with in our Coq formalization [16]. 

Small-step structural operational semantics [18] (Fig. 3 top) is one of the most 
popular semantic approaches. It defines the transition relation inductively. This 
semantic style is easy to use, though some features, such as abrupt changes of 
control and true concurrency, are often inconvenient to define. Additionally, finding 
the next successor of a configuration may take longer than in other approaches. 
Reduction semantics with evaluation contexts [17], depicted in the middle of 
Fig. 3, is another popular approach. It allows us to elegantly and compactly define 
complex evaluation strategies and semantics of control intensive constructs (e.g., 
call/cc), and it avoids a recursive definition of the transition relation. On the 
other hand, it requires an auxiliary definition of contexts along with splitting 
and plugging functions. 

As discussed in Sect. 1, several large languages have been given formal 
semantics using K [9] (Fig. 3 bottom). K is more involved and less conventional than 
the other approaches, so it is a good opportunity to evaluate our hypothesis that 
we can just “plug-and-play” operational semantics in our coinductive framework. 
A K-style semantics extends the code in the configuration to a list of terms, and 
evaluates within subterms by having a transition that extracts the term to the 
front of the list, where it can be examined directly. This allows a non-recursive 
definition of transition, whose cases can be applied by unification. 

In practice, in our automation, we only need to modify how a successor for 
a configuration is found. Besides that, the proofs remain exactly the same. 
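
To illustrate what "finding a successor" means, the following Python sketch implements a successor function in the structural operational style of Fig. 3. It is a minimal sketch under stated assumptions: the tuple encoding of IMP terms, the names step, run, LOOP and SUM, and the fuel bound are ours and do not reflect the Coq formalization in [16]. The congruence case for an assignment whose right-hand side is not yet a value is also an assumption, added so that s=s+n can be executed.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
SKIP = ("skip",)


def step(t, st):
    """One small step on (term, store); returns None when no rule applies.

    Expressions: int | ("var", x) | ("dec", x) | ("op", o, e1, e2)
    Statements:  SKIP | ("assign", x, e) | ("seq", s1, s2)
                 | ("if", e, s1, s2) | ("while", e, s)
    Assumes well-formed terms and variables that are defined before use.
    """
    if isinstance(t, int):
        return None
    tag = t[0]
    if tag == "var":
        return st[t[1]], st
    if tag == "dec":                       # --x: decrement, yield new value
        i = st[t[1]] - 1
        return i, {**st, t[1]: i}
    if tag == "op":
        _, o, e1, e2 = t
        if not isinstance(e1, int):
            e1n, st2 = step(e1, st)
            return ("op", o, e1n, e2), st2
        if not isinstance(e2, int):
            e2n, st2 = step(e2, st)
            return ("op", o, e1, e2n), st2
        return OPS[o](e1, e2), st
    if tag == "assign":
        _, x, e = t
        if isinstance(e, int):
            return SKIP, {**st, x: e}
        en, st2 = step(e, st)              # assumed congruence rule
        return ("assign", x, en), st2
    if tag == "seq":
        _, s1, s2 = t
        if s1 == SKIP:
            return s2, st
        s1n, st2 = step(s1, st)
        return ("seq", s1n, s2), st2
    if tag == "if":
        _, e, s1, s2 = t
        if isinstance(e, int):
            return (s1 if e != 0 else s2), st
        en, st2 = step(e, st)
        return ("if", en, s1, s2), st2
    if tag == "while":                     # unfold the loop into an if
        _, e, s = t
        return ("if", e, ("seq", s, t), SKIP), st
    return None                            # SKIP is final


def run(t, st, fuel=10_000):
    """Iterate step; None means the fuel ran out (stand-in for divergence)."""
    for _ in range(fuel):
        nxt = step(t, st)
        if nxt is None:
            return t, st
        t, st = nxt
    return None


LOOP = ("while", ("dec", "n"), ("assign", "s", ("op", "+", ("var", "s"), ("var", "n"))))
SUM = ("seq", ("assign", "s", 0), LOOP)

assert run(SUM, {"n": 4}) == (SKIP, {"n": 0, "s": 6})
```

Swapping in a reduction-semantics or K-style successor function would only change step; the driver run stays the same, which mirrors the observation above that only successor finding is style-specific.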


3 Coinduction as Partial Correctness 


The intuitive coinductive proof of the correctness of sum in Sect. 2.2 likely raised 
a lot of questions. We give formal details of that proof in this section, as well 
as go through some definitions and results of the underlying theory. All proofs, 
including our Coq formalization, are in [16]. 


3.1 Definitions and Main Theorem 


First, we introduce a definition that we used intuitively in the previous section: 


Definition 2. If $R \subseteq C \times C$, let $valid_R \subseteq C \times \mathcal{P}(C)$ be defined as $valid_R = 
\{(c, P) \mid c \Rightarrow_R P \text{ holds}\}$. 

Recall from Sect. 2.1 that $c \Rightarrow_R P$ holds iff the initial state $c$ can either reach 
a state in $P$ or can take an infinite number of steps (with $\to_R$). Pairs $(c, P) \in 
C \times \mathcal{P}(C)$ are called claims or specifications, and our objective is to prove they 
hold, i.e., $c \Rightarrow_R P$. Sets of claims $S \subseteq C \times \mathcal{P}(C)$ are valid if $S \subseteq valid_R$. To 
show such inclusions by coinduction, we notice that $valid_R$ is a greatest fixpoint, 
specifically of the following operator: 


Definition 3. Given $R \subseteq C \times C$, let $step_R : \mathcal{P}(C \times \mathcal{P}(C)) \to \mathcal{P}(C \times \mathcal{P}(C))$ be 
$$step_R(S) = \{(c, P) \mid c \in P \vee \exists d.\ c \to_R d \wedge (d, P) \in S\}$$


Therefore, to prove $(c, P) \in step_R(S)$, one must show either that $c \in P$ or 
that $(succ(c), P) \in S$, where $succ(c)$ is a resulting configuration after taking a 
step from $c$ by the operational semantics. 
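
On a finite transition system, the greatest fixpoint of this operator can be computed by iterating downward from the set of all claims. The Python sketch below does this for a three-configuration toy system; the configuration names and the helpers powerset, step_R and greatest_fixpoint are illustrative assumptions. It is meant to make the definitions tangible, not to suggest how verification is done for real languages, where the configuration space is infinite.

```python
from itertools import chain, combinations


def powerset(xs):
    xs = list(xs)
    subsets = chain.from_iterable(combinations(xs, k) for k in range(len(xs) + 1))
    return [frozenset(s) for s in subsets]


def step_R(C, R, S):
    """{(c, P) | c in P, or some R-successor d of c has (d, P) in S}."""
    return {
        (c, P)
        for c in C
        for P in powerset(C)
        if c in P or any((d, P) in S for (a, d) in R if a == c)
    }


def greatest_fixpoint(F, top):
    """Iterate a monotone F downward from `top` until it stabilizes."""
    X = set(top)
    while True:
        Y = F(X)
        if Y == X:
            return X
        X = Y


# "spin" loops forever, "go" steps to "stuck", and "stuck" has no successor.
C = {"spin", "go", "stuck"}
R = {("spin", "spin"), ("go", "stuck")}
top = {(c, P) for c in C for P in powerset(C)}
valid = greatest_fixpoint(lambda S: step_R(C, R, S), top)

assert ("spin", frozenset()) in valid            # diverges, so any target is fine
assert ("go", frozenset({"stuck"})) in valid     # reaches the target
assert ("go", frozenset()) not in valid          # gets stuck outside the target
```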


Definition 4. Given a monotone function $F : \mathcal{P}(D) \to \mathcal{P}(D)$, let its F-closure 
$F^* : \mathcal{P}(D) \to \mathcal{P}(D)$ be defined as $F^*(X) = \mu Y.\ F(Y) \cup X$, where $\mu$ is the least 
fixpoint operator. This is well-defined as $Y \mapsto F(Y) \cup X$ is monotone for any $X$. 


The following lemma suffices for reachability verification: 


Lemma 1. For any $R \subseteq C \times C$ and $S \subseteq C \times \mathcal{P}(C)$, we have $S \subseteq step_R(step_R^*(S))$ 
implies $S \subseteq valid_R$. 


The intuition behind this lemma is captured in Sect. 2.2: we continue taking 
steps and once we reach a set of states already seen, we know our claim is valid. 
This would not be valid if $step_R(step_R^*(S))$ was replaced simply with $step_R^*(S)$, 
as $X \subseteq F^*(X)$ holds trivially for any $F$ and $X$. Lemma 1 (along with elementary 
set properties) replaces the entire program logic shown in Fig. 2. The only formal 
definition specific to the target language is the operational semantics. Lemma 1 
does not need to be modified or re-proven to use it with other languages or 
semantics. It generalizes into a more powerful result, that can be used to derive 
a variety of coinductive proof principles: 


Theorem 1. If $F, G : \mathcal{P}(D) \to \mathcal{P}(D)$ are monotone and $G(F(A)) \subseteq F(G^*(A))$ 
for any $A \subseteq D$, then $X \subseteq F(G^*(X))$ implies $X \subseteq \nu F$ for any $X \subseteq D$, where 
$\nu F$ is the greatest fixpoint of $F$. 


Proofs, including a verified proof in our Coq formalization, are in [16]. The 
proof can also be derived from [12-14], though techniques from these papers 
had previously not been applied to program verification. Lemma 1 is an easy 
corollary, with both $F$ and $G$ instantiated as $step_R$, along with a proof that 
$\nu\, step_R = valid_R$ (see [16]). However, instantiating $F$ and $G$ to be the same 
function is not always best. An interesting and useful $G$ is the transitivity 
function trans in Definition 1, which satisfies the hypothesis in Theorem 1 when $F$ is 
$step_R$. Other sound instantiations of $G$ are shown in [16]. 
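
The premise of Lemma 1, and the reason the outer application of the step operator matters, can be observed on a small finite system. The Python sketch below is illustrative only: the toy configurations, the helper names make_step, closure and gfp, and the restriction to a single target set are assumptions. It checks the premise for a circular set of loop claims, and shows that an invalid claim passes the trivial inclusion into the closure but fails the actual premise.

```python
def make_step(C, R, targets):
    """Build step_R restricted to claims whose target set is in `targets`."""
    succ = {c: [d for (a, d) in R if a == c] for c in C}

    def step(S):
        return {(c, P) for c in C for P in targets
                if c in P or any((d, P) in S for d in succ[c])}
    return step


def closure(F, X):
    """F*(X): least fixpoint of Y -> F(Y) | X, iterating up from the empty set."""
    Y = set()
    while True:
        Z = F(Y) | set(X)
        if Z == Y:
            return Y
        Y = Z


def gfp(F, top):
    X = set(top)
    while True:
        Y = F(X)
        if Y == X:
            return X
        X = Y


DONE = frozenset({"done"})
C = [("loop", k) for k in range(4)] + ["done", "stuck"]
R = {(("loop", 3), ("loop", 2)), (("loop", 2), ("loop", 1)),
     (("loop", 1), "done"), (("loop", 0), ("loop", 0))}   # loop 0 diverges
step = make_step(C, R, [DONE])

S = {(("loop", k), DONE) for k in range(4)}
bad = {("stuck", DONE)}

assert S <= step(closure(step, S))          # premise of Lemma 1 holds
assert bad <= closure(step, bad)            # holds trivially, proves nothing
assert not bad <= step(closure(step, bad))  # the real premise rejects it

valid = gfp(step, {(c, DONE) for c in C})
assert S <= valid and not bad <= valid
```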

We can also use Theorem 1 with other definitions of validity expressible as 
a greatest fixpoint, e.g., all-path validity. For nondeterministic languages we 
might prefer to say $c \Rightarrow_R^{\forall} P$ holds if no path from $c$ reaches a stuck configuration 
without passing through $P$. This is the greatest fixpoint of 


$$step_R^{\forall}(S) = \{(c, P) \mid c \in P \vee \exists d.\ c \to_R d \wedge \forall d.\ (c \to_R d \text{ implies } (d, P) \in S)\}$$

The universe of validity notions that can be expressed coinductively, and thus 
the universe of instances of Theorem 1, is virtually limitless. Below is another 
notion of validity that we experimented with in our Coq formalization [16]. 
When proving global program invariants or safety properties of non-deterministic 
programs, we want to state not only reachability claims $c \Rightarrow P$, but also that all 
the transitions from $c$ to configurations in $P$ respect some additional property, 
say $T$. For example, a global state invariant $I$ can be captured by a $T$ such that 
$(a, b) \in T$ iff $I(a)$ and $I(b)$, while an arbitrary safety property can be captured by 
a $T$ that encodes a monitor for it. This notion of validity, which we call (all-path) 
“until” validity, is the greatest fixpoint of: 


$$until_R(S) = \{(c, T, P) \mid c \in P \vee \exists d.\ c \to_R d \wedge \forall d.\ (c \to_R d \text{ implies } (c, d) \in T \wedge (d, T, P) \in S)\}$$


This allows verification of properties that are not expressible using Hoare logic. 
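
Both all-path validity and "until" validity can be explored on a small nondeterministic system. The following Python sketch is a hedged illustration: the five toy configurations, the predicate named invariant, and the helpers make_step_all and make_until are assumptions, and the claim universe is restricted to a few fixed target sets to keep the fixpoint computation finite.

```python
def gfp(F, top):
    """Greatest fixpoint of a monotone F, iterating downward from `top`."""
    X = set(top)
    while True:
        Y = F(X)
        if Y == X:
            return X
        X = Y


def successors(C, R):
    return {c: [d for (a, d) in R if a == c] for c in C}


def make_step_all(C, R, targets):
    succ = successors(C, R)

    def step_all(S):
        return {(c, P) for c in C for P in targets
                if c in P or (bool(succ[c]) and all((d, P) in S for d in succ[c]))}
    return step_all


def make_until(C, R, T, P):
    """until_R restricted to claims (c, T, P) for one fixed T and P."""
    succ = successors(C, R)

    def until(S):
        return {(c, T, P) for c in C
                if c in P or (bool(succ[c]) and
                              all((c, d) in T and (d, T, P) in S for d in succ[c]))}
    return until


# "start" may go left (towards "goal") or right (towards "dead").
C = ["start", "left", "right", "goal", "dead"]
R = {("start", "left"), ("start", "right"), ("left", "goal"), ("right", "dead")}
GOAL = frozenset({"goal"})
EITHER = frozenset({"goal", "dead"})

step_all = make_step_all(C, R, [GOAL, EITHER])
all_path = gfp(step_all, {(c, P) for c in C for P in [GOAL, EITHER]})
assert ("start", GOAL) not in all_path      # the right branch gets stuck outside GOAL
assert ("start", EITHER) in all_path


def invariant(c):
    return c != "right"                     # toy global invariant


T = frozenset((a, b) for a in C for b in C if invariant(a) and invariant(b))
until = make_until(C, R, T, EITHER)
until_valid = gfp(until, {(c, T, EITHER) for c in C})
assert ("left", T, EITHER) in until_valid
assert ("start", T, EITHER) not in until_valid   # start -> right violates T
```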


3.2 Example Proof: Sum 


Now we demonstrate the results above by providing all the details that were 
skipped in our informal proof in Sect. 2.2. The property that we want to prove, 
expressed as a set of claims $(c, P)$, is 

$$S = \{(\langle \texttt{s=0; while(--n)\{s=s+n;\}}\ T \mid \texttt{n} \mapsto n, \sigma[\bot/\texttt{s}] \rangle, 
\{\langle T \mid \texttt{n} \mapsto 0, \texttt{s} \mapsto \sum_{i=1}^{n-1} i, \sigma \rangle\}) \mid \forall n, T, \sigma\}$$


We have to prove $S \subseteq valid_R$. Note that this specification is more general than 
the specifications in Sect. 2.2. Here, $T$ represents the remainder of the code to 
be executed, while $\sigma$ represents the remainder of the store, with $\sigma[\bot/\texttt{s}]$ as $\sigma$ 
restricted to $Dom(\sigma) \setminus \{\texttt{s}\}$. Thus, we write out the entire configuration here, 
which gives us freedom in expressing more complex specifications if needed. 
Instead of proving this directly, we will prove two subclaims valid and connect 
them via sequential composition (Definition 1). First, we need the following: 

Lemma 2. 5; 355 C validr if Sı C validg and S> C validpg. 
As before, let 
Qn = {(loop; T|ntren’, se Din Li, a) | Yn’} 
T,={(T|n40,s4 7, a} 
and define 
Sı = {((s=0; loop; T|nten,o[L/s]),Qn) | Yn, T, o} 
S2 ={((loop; T|nten’,sr yu er o), Tn) | Vn, n’, T, o} 
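Since $S \subseteq S_1 \fatsemi S_2$ (by $Q_n$), it suffices to show $S_1 \cup S_2 \subseteq \mathit{valid}_R$. To prove 
$S_1 \subseteq \mathit{valid}_R$, by Lemma 1 we show $S_1 \subseteq \mathit{step}_R(\mathit{step}_R^*(S_1))$. Regardless of the 
employed executable semantics, this should hold: 


\[ \forall n, T, \sigma.\ (\texttt{s=0; loop};\ T \mid \texttt{n} \mapsto n,\ \sigma[\bot/\texttt{s}]) \to_R (\texttt{loop};\ T \mid \texttt{n} \mapsto n,\ \texttt{s} \mapsto 0,\ \sigma) \]


Choosing the second case of the disjunction in $\mathit{step}_R$ with d matching this step, 
it suffices to show 


\[ \{((\texttt{loop};\ T \mid \texttt{n} \mapsto n,\ \texttt{s} \mapsto 0,\ \sigma),\ Q_n) \mid \forall n, T, \sigma\} \subseteq \mathit{step}_R^*(S_1) \]

Note that we can unfold any fixpoint $F^*(S)$ to get the following two equations: 

\[ F(F^*(S)) \subseteq F(F^*(S)) \cup S = F^*(S) \qquad\qquad S \subseteq F(F^*(S)) \cup S = F^*(S) \tag{3} \]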


We use the first equation to expose an application of $\mathit{step}_R$ on the right hand 
side, so it suffices to show the above is a subset of $\mathit{step}_R(\mathit{step}_R^*(S_1))$. We then use 
the first case of the disjunction (showing $c \in P$) in $\mathit{step}_R$, and instantiating $n'$ 
to $n$ proves this goal, since $\sum_{i=n}^{n-1} i = 0$. Thus $S_1 \subseteq \mathit{valid}_R$. 

Now we prove $S_2 \subseteq \mathit{valid}_R$ by showing $S_2 \subseteq \mathit{step}_R(\mathit{step}_R^*(S_2))$. First, note the 
operational semantics of IMP rewrites while loops to if statements. Then, by the 
definition of $\mathit{step}_R$, it suffices to show that 


\[ \{((\texttt{if (--n) \{s=s+n; loop\}};\ T \mid \texttt{n} \mapsto n',\ \texttt{s} \mapsto \sum_{i=n'}^{n-1} i,\ \sigma),\ T_n) \mid \forall n, n', T, \sigma\} \subseteq \mathit{step}_R^*(S_2) \]


Using the first unfolding from (3), it suffices to show the above is a subset of 
$\mathit{step}_R(\mathit{step}_R^*(S_2))$, i.e. we expose an application of $\mathit{step}_R$ on the right hand side. 
The definition of $\mathit{step}_R$ thus allows the left hand side to continue taking execution 
steps, as long as we keep unfolding the fixpoint. Continuing this, the if condition 
becomes a single, but symbolic, boolean value. Specifically, it suffices to show: 


\[ \{((\texttt{if (}n'-1 \neq 0\texttt{) \{s=s+n; loop\}};\ T \mid \texttt{n} \mapsto n'-1,\ \texttt{s} \mapsto \sum_{i=n'}^{n-1} i,\ \sigma),\ T_n) \mid \forall n, n', T, \sigma\} \subseteq \mathit{step}_R^*(S_2) \]


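Further progress requires making a case distinction on whether $n' - 1 = 0$. A 
case distinction corresponds to observing that $A \cup B \subseteq X$ if both $A \subseteq X$ and 
$B \subseteq X$. Here we split the current set of claims into those with $n' - 1 = 0$ and 
$n' - 1 \neq 0$, and separately establish the following inclusions: 


\[ \{((\texttt{if (false) \{s=s+n; loop\}};\ T \mid \texttt{n} \mapsto 0,\ \texttt{s} \mapsto \sum_{i=1}^{n-1} i,\ \sigma),\ T_n) \mid \forall n, T, \sigma\} \subseteq \mathit{step}_R^*(S_2) \]
\[ \{((\texttt{if (true) \{s=s+n; loop\}};\ T \mid \texttt{n} \mapsto n'-1,\ \texttt{s} \mapsto \sum_{i=n'}^{n-1} i,\ \sigma),\ T_n) \mid \forall n, n' \neq 1, T, \sigma\} \subseteq \mathit{step}_R^*(S_2) \]


Continuing symbolic execution and using $\sum_{i=n'}^{n-1} i + (n'-1) = \sum_{i=n'-1}^{n-1} i$, we get 


\[ \{((T \mid \texttt{n} \mapsto 0,\ \texttt{s} \mapsto \sum_{i=1}^{n-1} i,\ \sigma),\ T_n) \mid \forall n, T, \sigma\} \subseteq \mathit{step}_R^*(S_2) \]
\[ \{((\texttt{loop};\ T \mid \texttt{n} \mapsto n'-1,\ \texttt{s} \mapsto \sum_{i=n'-1}^{n-1} i,\ \sigma),\ T_n) \mid \forall n, T, \sigma,\ n'-1 \neq 0\} \subseteq \mathit{step}_R^*(S_2) \]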


In the $n' - 1 = 0$ case, the current configuration is already in the corresponding 
target set. To conclude, we expose another application of $\mathit{step}_R$ as before, but use 
the clause $c \in P$ of the disjunction in $\mathit{step}_R$ to leave the trivial goal 
$\forall n, T, \sigma.\ (T \mid \texttt{n} \mapsto 0,\ \texttt{s} \mapsto \sum_{i=1}^{n-1} i,\ \sigma) \in \{(T \mid \texttt{n} \mapsto 0,\ \texttt{s} \mapsto \sum_{i=1}^{n-1} i,\ \sigma)\}$. For the $n' - 1 \neq 0$ case, 
we have a set of claims that are contained in the initial specification $S_2$. We 
conclude by showing $S_2 \subseteq \mathit{step}_R^*(S_2)$ from the second equation in (3) by noting 
that $S \subseteq F^*(S)$ for any F. So this set of claims is contained in $S_2$ by instantiating 
the universally quantified variable $n'$ in the definition of $S_2$ with $n' - 1$. Thus it 
is contained in $\mathit{step}_R^*(S_2)$ and thus it is a subset of $\mathit{valid}_R$. 
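
The claims in this proof can be sanity-checked by plain execution. The following is an illustrative sketch only: it runs the sum program for concrete inputs and asserts the loop-head claim from $Q_n$ (at a loop head with n bound to n', s equals the sum of i from n' to n-1). The function name `run_sum` and the restriction to n >= 1 are assumptions; for n <= 0 the program diverges, so the partial-correctness claim holds vacuously.

```python
def run_sum(n):
    """Execute  s=0; while(--n){s=s+n;}  and check Q_n at every loop head.

    Returns the final values of (n, s). Only defined here for n >= 1,
    because the loop does not terminate for n <= 0.
    """
    if n < 1:
        raise ValueError("the loop diverges for n < 1")
    n0 = n            # initial value of the program variable n
    cur = n           # current value of the program variable n
    s = 0
    while True:
        # Loop head: the invariant from Q_n with n' = cur.
        assert s == sum(range(cur, n0)), (cur, s)
        cur -= 1      # the guard --n decrements before testing
        if cur == 0:
            break
        s += cur
    return cur, s


final_n, final_s = run_sum(5)
```

On exit the store matches the target set $T_n$: n is 0 and s is the sum of i from 1 to n-1.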


3.3 Example Proof: Reverse 


Consider now the following program to reverse a linked list, written in the HIMP 
language (Fig. 5a). We will discuss HIMP in more detail in Sect. 4. 


decl p; decl y; p := 0; 
while (x<>0) { y := *(x+1); *(x+1) := p; p := x; x := y; } 


Call the above code rev and the loop rev-loop. We prove this program is 
correct following intuitions from separation logic [19,20] but using the exact 
same coinductive technical machinery as before. Assuming we have a predicate 
that matches a heap containing only a linked list starting at address x and 
representing the list l (which we will see in Sect. 4.2), our specification becomes: 


\[ S = \{((\texttt{rev};\ T \mid list(l, x)),\ \{(T \mid \lambda r.\ list(rev(l), r))\}) \mid \forall l, x, T\} \]


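where rev is the mathematical list reverse. We proceed as in the previous 
example, first using the sequential-composition lemma (Lemma 2) and then stepping 
with the semantics, but with $Q_n$ as 


\[ \{(\texttt{rev-loop};\ T \mid list(A, x) * list(B, p) * \texttt{x} \mapsto x * \texttt{p} \mapsto p * \texttt{y} \mapsto y * \lambda r.\ list(B \mathbin{++} A, r)) \mid \forall A, B, p, y\} \]


where ++ is list append. We continue as before to prove our original specification. 
$S_1$ and $S_2$ follow from our choice for $Q_n$, our “loop invariant.” Specifically, 


\[ S_1 = \{((\texttt{rev};\ T \mid list(l, x)),\ \{(\texttt{rev-loop};\ T \mid list(A, x) * list(B, p) * \texttt{x} \mapsto x * \texttt{p} \mapsto p * \texttt{y} \mapsto y * \lambda r.\ list(B \mathbin{++} A, r)) \mid \forall A, B, p, y\}) \mid \forall l, x, T\} \]

\[ S_2 = \{((\texttt{rev-loop};\ T \mid list(A, x) * list(B, p) * \texttt{x} \mapsto x * \texttt{p} \mapsto p * \texttt{y} \mapsto y * \lambda r.\ list(B \mathbin{++} A, r)),\ \{(T \mid \lambda r.\ list(rev(l), r))\}) \mid \forall A, B, p, y, l, x, T\} \]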


Then, the individual proofs for these specifications closely follow the same 
flavor as in the previous example: use $\mathit{step}_R$ to execute the program via the 
operational semantics, use unions to case split as needed, and finish when we 
reach something in the target set or that was previously in our specification. The 
inherent similarity between these two examples hints that automation should not 
be too difficult. We go into detail regarding such automation in Sect. 4. 
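
To make the reverse example concrete, here is a small executable sketch under stated assumptions: the heap is modelled as a Python dictionary from integer addresses to values, a node at address a stores its value at a and its next pointer at a+1 (matching the `*(x+1)` accesses in the program), and 0 is the null pointer. The helper names `build_list` and `read_list` are hypothetical. The assertion inside the loop is the conventional form of the loop invariant (the reverse of what remains at x, followed by what has been moved to p, equals the reverse of the input); it is an illustration and not a transcription of the pattern used in the paper.

```python
def build_list(values, heap, base=10):
    """Lay out values as a linked list in heap and return the head address."""
    head = 0
    addr = base + 2 * len(values)
    for v in reversed(values):
        addr -= 2
        heap[addr] = v          # value field
        heap[addr + 1] = head   # next pointer
        head = addr
    return head


def read_list(p, heap):
    """Read back the abstract list stored at address p."""
    out = []
    while p != 0:
        out.append(heap[p])
        p = heap[p + 1]
    return out


def rev(x, heap):
    """In-place reversal, mirroring the HIMP loop statement by statement."""
    original = read_list(x, heap)
    p = 0
    while x != 0:
        # Loop invariant in conventional form.
        assert read_list(x, heap)[::-1] + read_list(p, heap) == original[::-1]
        y = heap[x + 1]
        heap[x + 1] = p
        p = x
        x = y
    return p


heap = {}
x = build_list([1, 2, 3], heap)
cells_before = set(heap)
r = rev(x, heap)
reversed_values = read_list(r, heap)
```

The set of allocated addresses is unchanged by the reversal; only the next-pointer fields are rewritten.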

Reasoning with fixpoints and functions like $\mathit{step}_R$ can be thought of as 
reasoning with proof rules, but ones which interact with the target programming 
language only through its operational semantics. The $\mathit{step}_R$ operation 
corresponds, conceptually, to two such proof rules: taking an execution step and 
showing that the current configuration is in the target set. Sequential 
composition and the trans rule correspond to a transitivity rule used to chain together 
separate proofs. Unions correspond to case analysis. The fixpoint in the closure 
definition corresponds to iterative uses of these proof rules or to referring back 
to claims in the original specification. 


[Figure 4: code listings in three panels, HIMP, Lambda, and Stack. Legible 
fragments: the HIMP panel defines append(x, y) with "decl p;", "*(p+1) := y;" and 
"return x;"; the Stack panel reads "append over if over begin 1+ dup @ dup while 
nip repeat drop ! else nip then ;".] 

Fig. 4. Destructive list append in three languages. 
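
The correspondence above can be exercised on a finite transition system. The sketch below is an illustration under assumptions, not the Coq development: claims are pairs (c, P) with P a frozenset, `in_step` tests membership in $\mathit{step}_R(X)$, `closure` computes the least fixpoint of Y -> step_R(Y) ∪ S restricted to the target sets that occur in S, and `coinductive_proof` checks the inclusion of S in step_R applied to that closure. All function names and the example system are hypothetical.

```python
def in_step(claim, succ, X):
    """Membership in step_R(X): c is already in P, or some successor d of c
    has its claim (d, P) in X."""
    c, P = claim
    return c in P or any((d, P) in X for d in succ.get(c, ()))


def closure(S, succ, states):
    """Least fixpoint of Y -> step_R(Y) | S, restricted to the finitely many
    claims (c, P) whose target set P occurs in S."""
    targets = {P for _, P in S}
    Y = set(S)
    changed = True
    while changed:
        changed = False
        for P in targets:
            for c in states:
                if (c, P) not in Y and in_step((c, P), succ, Y):
                    Y.add((c, P))
                    changed = True
    return Y


def coinductive_proof(S, succ, states):
    """Check the inclusion of S in step_R(step_R^*(S))."""
    Y = closure(S, succ, states)
    return all(in_step(claim, succ, Y) for claim in S)


# Hypothetical system: 3 -> 2 -> 1 -> 0 (stuck), and 7 <-> 8 forever.
succ = {3: {2}, 2: {1}, 1: {0}, 0: set(), 7: {8}, 8: {7}}
states = set(succ)
done = frozenset({0})
never = frozenset({99})

ok_terminating = coinductive_proof({(3, done)}, succ, states)
ok_diverging = coinductive_proof({(7, never)}, succ, states)
bad_stuck = coinductive_proof({(0, never)}, succ, states)
bad_reach = coinductive_proof({(3, never)}, succ, states)
```

The claim about configuration 7 succeeds even though the target is unreachable, because the proof may refer back to a claim in the specification after taking a step; this is the partial-correctness reading. The two rejected claims concern configurations that get stuck outside their target set.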


4 Experiments 


Now that we have proved the correctness of our coinductive verification approach 
and have seen some simple examples, we must consider the following pragmatic 
question: “Can this simple approach really work?”. We have implemented it in 
Coq, and specified and verified programs in a variety of languages, each language 
being defined as an operational semantics [16]. We show not only that coinductive 
program verification is feasible and versatile, but also that it is amenable to 
highly effective proof automation. The simplifications in the manual proof, such 
as taking many execution steps at once, translate easily into proof tactics. 

We first discuss the example languages and programs, and the reusable ele- 
ments in specifications, especially an effective style of representation predicates 
for heap-allocated data structures. Then we show how we wrote specifications 
for example programs. Next we describe our proof automation, which was based 
on an overall heuristic applied unchanged for each language, though parameter- 
ized over subroutines which required somewhat more customization. Finally, we 
conclude with discussion of our verification of the Schorr-Waite graph-marking 
example and a discussion of our support for verification of divergent programs. 


4.1 Languages 


We discuss three languages following different paradigms, each defined opera- 
tionally. Many language semantics are available with the distributions of K [9], 
PLT-Redex [10], and Ott [11], e.g., but we believe these three languages are suf- 
ficient to illustrate the language-independence of our approach. Figure 4 shows 
a destructive linked list append function in each of the three languages. 
HIMP (IMP with Heap) is an imperative language with (recursive) functions 
and a heap. The heap addresses are integers, to demonstrate reasoning about low- 
level representations, and memory allocation/deallocation are primitives. The 
configuration is a 5-tuple of current code, local variable environment mapping 
identifiers to values, call stack with frames as pairs of code and environment, 
heap, and a collection of functions as a map from function name to definition. 
Stack is a Forth-like stack-based language, though, unlike in Forth, we do 
make control structures part of the grammar. A shared data stack is used both 
for local state and to communicate between function invocations, eliminating the 
store, formal parameters on function declarations, and the environment of stack 
frames. Stack’s configuration is also a 5-tuple, but instead of a current environ- 
ment there is a stack of values, and stack frames do not store an environment. 
Lambda is a call-by-value lambda calculus, extended with primitive integers, 
pair and nil values, and primitive operations for heap access. Fixpoint combina- 
tors enable recursive definitions without relying on primitive support for named 
functions. We use De Bruijn indices instead of named variables. The semantics 
is based on a CEK/CESK machine [21,22], extended with a heap. Lambda’s 
configuration is a 4-tuple: current expression, environment, heap, continuation. 


[Figure 5: grammar listings in three panels: (a) HIMP syntax, extending the IMP 
syntax; (b) Stack syntax; (c) Lambda syntax.] 

Fig. 5. Syntax of HIMP, Stack, and Lambda 
4.2 Specifying Data Structures 


Our coinductive verification approach is agnostic to how claims in $C \times \mathcal{P}(C)$ are 
specified. In Coq, we can specify sets using any definable predicates. Within this 
design space, we chose matching logic [23] for our experiments, which introduces 
patterns that concisely generalize the formulae of first order logic (FOL) and 
separation logic, as well as term unification. Symbols apply on patterns to build 
other patterns, just like terms, and patterns can be combined using FOL 
connectives, just like formulae. E.g., pattern $P \land Q$ matches a value if P and Q both 
match it, $[t]$ matches only the value t, $\exists x. P$ matches if there is any assignment of 
x under which P matches, and $[\varphi]$ where $\varphi$ is a FOL formula matches any value 
if $\varphi$ holds, and no values otherwise (in [23] neither $[t]$ nor $[\varphi]$ require a visible 
marker, but in Coq patterns are a distinct type, requiring explicit injections). 

To specify programs manipulating heap data structures we use patterns 
matching subheaps that contain a data structure representing an abstract value. 
Following [24], we define representation predicates for data structures as 
functions from abstract values to more primitive patterns. The basic ingredients are 
primitive map patterns: pattern emp for the empty map, $k \mapsto v$ for the singleton 
map binding key k to value v, and $P * Q$ for maps which are a disjoint union 
of submaps matching P and, resp., Q. We use abbreviation $\langle \varphi \rangle \equiv [\varphi] \land emp$ 
to facilitate inline assertions, and $p \mapsto \{v_0, \ldots, v_i\} \equiv p \mapsto v_0 * \ldots * (p+i) \mapsto v_i$ to 
describe values at contiguous addresses. A heap pattern for a linked list starting 
at address p and holding list l is defined recursively by 


\[ list(nil, p) = \langle p = 0 \rangle \]
\[ list(x : l, p) = \langle p \neq 0 \rangle * \exists p_l.\ p \mapsto \{x, p_l\} * list(l, p_l) \]


We also define list_seg(l, e, p) for list segments, useful in algorithms using pointers 
to the middle of a list, by generalizing the constant 0 (the pointer to the end of 
the list) to the trailing pointer parameter e. Also, simple binary trees: 


\[ tree(leaf, p) = \langle p = 0 \rangle \]
\[ tree(node(x, l, r), p) = \langle p \neq 0 \rangle * \exists p_l, p_r.\ p \mapsto \{x, p_l, p_r\} * tree(l, p_l) * tree(r, p_r) \]


Given such patterns, specifications and proofs can be done in terms of the 
abstract values represented in memory. Moreover, such primitive patterns are 
widely reusable across different languages, and so is our proof automation that 
deals with primitive patterns. Specifically, our proof scripting specific to such 
pattern definitions is concerned exclusively with unfolding the definition when 
allowed, deciding what abstract value, if any, is represented at a given address 
in a partially unfolded heap. This is further used to decide how another claim 
applies to the current state when attempting a transitivity step. 
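
The recursive definition of the list pattern can be read as an executable matcher. The following sketch is an assumption-laden illustration: the heap is a Python dictionary, `match_list` is a hypothetical name, the existential over the next pointer is resolved by reading the cell at p+1, and the separating conjunction is modelled by removing the two cells of the current node before recursing. Requiring the remaining heap to be empty at the end reflects the emp conjunct inside the inline assertion.

```python
def match_list(l, p, heap):
    """True iff heap is exactly a linked list at address p holding list l.

    list(nil, p)   = <p = 0>             (p is null and the heap is empty)
    list(x : l, p) = <p != 0> * exists p_l. p |-> {x, p_l} * list(l, p_l)
    """
    if not l:
        return p == 0 and not heap
    if p == 0 or p not in heap or (p + 1) not in heap:
        return False
    if heap[p] != l[0]:
        return False
    # Separating conjunction: the rest of the list lives in the disjoint
    # remainder of the heap.
    rest = {a: v for a, v in heap.items() if a not in (p, p + 1)}
    return match_list(l[1:], heap[p + 1], rest)


heap = {10: 1, 11: 12, 12: 2, 13: 0}
exact = match_list([1, 2], 10, heap)
wrong_value = match_list([1], 10, heap)
extra_cell = match_list([1, 2], 10, {**heap, 20: 5})
cyclic = match_list([1, 1], 10, {10: 1, 11: 10})
```

The last two calls fail for the reasons separation logic would give: an extra cell is not covered by the pattern, and a cyclic structure cannot be split into disjoint nodes.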


4.3 Specifying Reachability Claims 


Table 1. Example list specifications 

$call(Head, [x], [H] \land list(v : l, x), \lambda r.\ \langle r = v \rangle * [H])$ 
$call(Tail, [x], [H] \land list(v : l, x), \lambda r.\ [H] \land \_ * list(l, r))$ 
$call(Add, [y, x], list(l, x), \lambda r.\ list(y : l, r))$ 
$call(Add', [y, x], [H] \land list(l, x), \lambda r.\ list\_seg([y], x, r) * [H])$ 
$call(Swap, [x], list(a : b : l, x), \lambda r.\ list(b : a : l, x))$ 
$call(Dealloc, [x], list(l, x), \lambda r.\ emp)$ 
$call(Length, [x], [H] \land list(l, x), \lambda r.\ \langle r = len(l) \rangle * [H])$ 
$call(Sum, [x], [H] \land list(l, x), \lambda r.\ \langle r = sum(l) \rangle * [H])$ 
$call(Reverse, [x], list(l, x), \lambda r.\ list(rev(l), r))$ 
$call(Append, [x, y], list(a, x) * list(b, y), \lambda r.\ list(a \mathbin{++} b, r))$ 
$call(Copy, [x], [H] \land list(l, x), \lambda r.\ list(l, r) * [H])$ 
$call(Delete, [v, x], list(l, x), \lambda r.\ list(delete(v, l), r))$ 


As mentioned, claims in $C \times \mathcal{P}(C)$ can be specified using any logical formalism, 
here the full power of Coq. An explicit specification can be verbose and low-level, 
especially when many semantic components in the configuration stay unchanged. 
However, any reasonable logic allows making definitions to reduce verbosity and 
redundancy. Our use of matching logic particularly facilitates framing conditions, 
allowing us to regain the compactness and elegance of Hoare logic or separation 
logic specifications with definable syntactic sugar. For example, defining 


\[ call(f(formals)\{body\}, args, P_{in}, P_{out}) \equiv \]
\[ \{((f(args) \curvearrowright rest, env, stk, heap, funs),\ \{(r \curvearrowright rest, env, stk, heap', funs) \mid \forall r, heap'.\ heap' \models P_{out}(r) * [H_f]\}) \]
\[ \quad \mid \forall rest, env, stk, heap, H_f, funs.\ heap \models P_{in} * [H_f] \land f \mapsto f(formals)\{body\} \in funs\} \]


gives the equivalent of the usual Hoare pre-/post-condition on function calls, 
including heap framing (in separation logic style). The notation $x \curvearrowright y$ represents 
the order of evaluation: evaluate x first followed by y. This is often used when y 
can depend on the value x takes after evaluation. 

The first parameter is the function definition. The second is the arguments. 
The heap effect is described as a pattern $P_{in}$ for the allowable initial states of 
the heap and function $P_{out}$ from returned values to corresponding heap 
patterns. For example, we specify the definition D of append in Fig. 4 by writing 
$call(D, [x,y], (list(a, x) * list(b, y)), (\lambda r.\ list(a \mathbin{++} b, r)))$, which is as compact and 
elegant as it can be. More specifications are given in Table 1. A number of 
specifications assert that part of the heap is left entirely unchanged by 
writing $[H] \land \ldots$ in the precondition to bind a variable H to a specific heap, and 
using the variable in the postcondition (just repeating a representation 
predicate might permit a function to reallocate internal nodes in a data structure to 
different addresses). The specifications Add and Add' show that it can be a bit 
more complicated to assert that an input list is used undisturbed as a suffix of 
a result list. Specifications such as Length, Append, and Delete are written in 
terms of corresponding mathematical functions on the lists represented in the 
heap, separating those functional descriptions from details of memory layout. 

When a function contains loops, proving that it meets a specification often 
requires making some additional claims about configurations which are just 
about to enter loops, as we saw in Sect. 2.2. We support this with another pat- 
tern that takes the current code at an intermediate point in the execution of a 
function, and a description of the environment: 


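\[ stmt(code, env, P_{in}, P_{out}) \equiv \]
\[ \{((code, (env, e_f), stk, heap, funs),\ \{(\texttt{return}\ r \curvearrowright rest, env', stk, heap', funs) \mid \forall r, rest, env', heap'.\ heap' \models P_{out}(r) * [H_f]\}) \]
\[ \quad \mid \forall e_f, stk, heap, H_f, funs.\ heap \models P_{in} * [H_f]\} \]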


Verifying that the definition of append in Fig. 4 meets the call specification above 
requires an auxiliary claim about the loop, which can be written using stmt as 


\[ stmt(\texttt{while (*(p+1)<>0)} \ldots,\ (\texttt{x} \mapsto x, \texttt{y} \mapsto y, \texttt{p} \mapsto p), \]
\[ \quad (list\_seg(l_x, p, x) * list(l_p, p) * list(l_y, y)),\ (\lambda r.\ list(l_x \mathbin{++} l_p \mathbin{++} l_y, r))) \]


The patterns above were described using HIMP’s configurations; we defined 
similar ones for Stack and Lambda also. 
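
The framing built into the call pattern can be illustrated by a runtime check. This is only a sketch under assumptions: the heap is a Python dictionary, the `append` function below is an assumed reading of the destructive append routine (the loop guard `*(p+1)<>0` is taken from the auxiliary claim above, the rest is a guess at its shape), and `check_call` and `read_list` are hypothetical helpers. The check splits the heap into list(a, x), list(b, y), and a frame, runs the function, and confirms that the result represents a ++ b while the frame is untouched.

```python
def read_list(p, heap):
    """Return (values, cell addresses) of the linked list starting at p."""
    values, cells = [], set()
    while p != 0:
        values.append(heap[p])
        cells |= {p, p + 1}
        p = heap[p + 1]
    return values, cells


def append(x, y, heap):
    """Destructive append; an assumed shape of the HIMP routine."""
    if x == 0:
        return y
    p = x
    while heap[p + 1] != 0:
        p = heap[p + 1]
    heap[p + 1] = y
    return x


def check_call(heap, x, y, frame):
    """Check the pre/post pair of the Append specification with framing."""
    a, cells_a = read_list(x, heap)
    b, cells_b = read_list(y, heap)
    # Precondition: heap = list(a, x) * list(b, y) * frame (disjoint parts).
    assert cells_a.isdisjoint(cells_b)
    assert (cells_a | cells_b).isdisjoint(frame)
    assert cells_a | cells_b | set(frame) == set(heap)
    r = append(x, y, heap)
    result, cells_r = read_list(r, heap)
    return (
        result == a + b                                   # list(a ++ b, r)
        and cells_r | set(frame) == set(heap)             # nothing else left
        and all(heap[k] == v for k, v in frame.items())   # frame unchanged
    )


heap = {10: 1, 11: 12, 12: 2, 13: 0, 20: 3, 21: 0, 100: 42}
ok = check_call(heap, 10, 20, frame={100: 42})
```

A single test run is of course far weaker than the universally quantified claim, but it shows which parts of the configuration the specification constrains.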


4.4 Proofs and Automation 


The basic heuristic in our proofs, which is also the basis of our proof automation, 
is to attack a goal by preferring to prove that the current configuration is in the 
target set if possible, then trying to use claims in the specification by transitivity, 
and only as a last resort taking execution steps according to the operational 
semantics or making case distinctions. Each of these operations begins, as in 
the example proofs, with certain manipulations of the definitions and fixpoints 
in the language-independent core. Our heuristic is reusable, as a proof tactic 
parameterized over sub-tactics for the more specific operations. A prelude to the 
main loop begins by applying the main theorem to move from claiming validity 
to showing a coinduction-style inclusion, and breaking down a specification with 
several classes of claims into a separate proof goal for each family of claims. 

Additionally, our automation leverages support offered by the proof assis- 
tant, such as handling conjuncts by trying to prove each case, existentials by 
introducing a unification variable, equalities by unification, and so on. More- 
over, we added tactics for map equalities and numerical formulae, which are 
shared among all languages involving maps and integers. The current proof goal 
after each step is always a reachability claim. So even in proofs which are not 
completely automatic, the proof automation can give up by leaving subgoals for 
the user, who can reinvoke the proof automation after making some proof steps 
of their own as long as they leave a proof goal in the same form. 
Proving the properties in Table 1 sometimes required making additional 
claims about while loops or auxiliary recursive functions. All but the last four 
were proved automatically by invoking (an instance of) our heuristic proof tactic: 


Proof. list_solver. Qed. 


Append and copy needed to make use of associativity of list append. Reverse 
used a loop reversing the input list element by element onto an output list, which 
required relating the tail recursive rev_app(x : l, y) = rev_app(l, x : y) with the 
Coq standard library definition rev(x : l) = rev(l)++[x]. Manually applying these 
lemmas merely modified the proof scripts to 


list_solver. rewrite app_ass in * |- . list_run. 
list_solver. rewrite <- rev_alt in * |- . list_run. 


These proofs were used verbatim in each of our example languages. The only 
exceptions were append and copy for Lambda, for which the app_ass lemma was 
not necessary. For Delete, simple reasoning about delete(v, l) when v is and is 
not at the head of the list is required, though the actual reasoning in Coq varies 
between our example languages. No additional lemmas or tactics equivalent to 
Hoare rules are needed in any of these proofs. 
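
The two list facts used in these scripts can be checked on examples. The sketch below is illustrative only: `rev` follows the recursive equation rev(x : l) = rev(l) ++ [x], `rev_app` follows the tail-recursive equation rev_app(x : l, y) = rev_app(l, x : y), and the property tested is that rev_app(l, acc) equals rev(l) ++ acc, together with associativity of append. Python lists stand in for Coq lists.

```python
def rev(l):
    """rev(nil) = nil;  rev(x : l) = rev(l) ++ [x]."""
    return rev(l[1:]) + [l[0]] if l else []


def rev_app(l, acc):
    """rev_app(nil, y) = y;  rev_app(x : l, y) = rev_app(l, x : y)."""
    return rev_app(l[1:], [l[0]] + acc) if l else acc


samples = [[], [1], [1, 2], [1, 2, 3, 4], list(range(10))]
agree = all(rev_app(l, []) == rev(l) for l in samples)
general = all(rev_app(l, acc) == rev(l) + acc
              for l in samples for acc in samples)
assoc = all((a + b) + c == a + (b + c)
            for a in samples for b in samples for c in samples)
```

The generalized statement with an arbitrary accumulator is the form that goes through by induction on the list, which is why relating the loop to rev needs a lemma rather than unfolding alone.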


4.5 Other Data Structures 


Matching logic allows us to concisely define many other important data struc- 
tures. Besides lists, we also have proofs in Coq with trees, graphs, and stacks [16]. 
These data structures are all used for proving properties about the Schorr-Waite 
algorithm. In the next section we go into more detail about these data structures 
and how they are used in proving the Schorr-Waite algorithm. 


4.6 Schorr-Waite 


Our experiments so far demonstrate that our coinductive verification approach 
applies across languages in different paradigms, and can handle usual heap pro- 
grams with a high degree of automation. Here we show that we can also handle 
the famous Schorr-Waite graph marking algorithm [25], which is a well-known 
verification challenge: “The Schorr-Waite algorithm is the first mountain that 
any formalism for pointer aliasing should climb” [26]. To give the reader a feel 
for what it takes to mechanically verify such an algorithm, previous proofs in [27] 
and [28] required manually produced proof scripts of about 470 and, respectively, 
over 1400 lines and they both used conventional Hoare logic. In comparison our 
proof is 514 lines. Line counts are a crude measure, but we can at least conclude 
that the language independence and generality of our approach did not impose 
any great cost compared to using language-specific program logics. 

The version of Schorr-Waite that we verified is based on [29]. First, however, 
we verify a simpler property of the algorithm, showing that the given code cor- 
rectly marks a tree, in the absence of sharing or cycles. Then we prove the same 
code works on general graphs by considering the tree resulting from a depth-first 
traversal. We define graphs by extending the definition of trees to allow a child 
of a node in an abstract tree to be a reference back to some existing node, in 
addition to an explicit subtree or a null pointer for a leaf. To specify that graph 
nodes are at their original addresses after marking, we include an address along 
with the mark flag in the abstract data structure in the pattern 


\[ grph(leaf, m, p') = \langle p' = 0 \rangle \]
\[ grph(backref(p), m, p') = \langle p' = p \rangle \]
\[ grph(node(p, l, r), m, p') = \langle p' = p \rangle * \exists p_l, p_r.\ p \mapsto \{m, p_l, p_r\} * grph(l, m, p_l) * grph(r, m, p_r) \]


The overall specification is $call(Mark, [p], grph(G, 0, p), \lambda r.\ grph(G, 3, p))$. 

To describe the intermediate states in the algorithm, including the clever 
pointer-reversal trick used to encode a stack, we define another data structure for 
the context, in zipper style. A position into a tree is described by its immediate 
context, which is either the topmost context, or the point immediately left or 
right of a sibling tree, in a parent context. These are represented by nodes 
with intermediate values of the mark field, with one field pointing to the sibling 
subtree and the other pointing to the representation of the rest of the context. 


\[ stack(Top, p) = \langle p = 0 \rangle \]
\[ stack(LeftOf(r, k), p) = \exists p_r, p_k.\ p \mapsto \{1, p_r, p_k\} * grph(r, 0, p_r) * stack(k, p_k) \]
\[ stack(RightOf(l, k), p) = \exists p_l, p_k.\ p \mapsto \{2, p_k, p_l\} * stack(k, p_k) * grph(l, 3, p_l) \]


This is the second data structure needed to specify the main loop. When it is 
entered, there are only two live local variables, one pointing to the next address 
to visit and the other keeping context. The next node can either be the root of 
an unmarked subtree, with the context as stack, or the first node in the implicit 
stack when ascending after marking a tree, with the context pointing to the node 
that was just finished. For simplicity, we write a separate claim for each case. 


\[ stmt(Loop, (\texttt{p} \mapsto p, \texttt{q} \mapsto q), (grph(G, 0, p) * stack(S, q)), \lambda r.\ grph(plug(G, S), 3)) \]
\[ stmt(Loop, (\texttt{p} \mapsto p, \texttt{q} \mapsto q), (stack(S, p) * grph(G, 3, q)), \lambda r.\ grph(plug(G, S), 3)) \]


The application of all the semantic steps was handled entirely automatically, 
the manual proof effort being entirely concerned with reasoning about the pred- 
icates above, for which no proof automation was developed. 
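
The zipper-style context and the plug operation used in the two loop claims can be sketched as follows. This is an assumption, not the definition from the Coq development: trees are simplified to unlabelled binary trees (no addresses or marks), contexts are tagged tuples, and `plug` rebuilds the whole tree from a focused subtree and its context in the standard zipper way.

```python
def plug(tree, ctx):
    """Rebuild the full tree from the subtree in focus and its context.

    Trees:    "leaf" or ("node", left, right)
    Contexts: ("Top",), ("LeftOf", right_sibling, parent_ctx),
              ("RightOf", left_sibling, parent_ctx)
    """
    while ctx[0] != "Top":
        if ctx[0] == "LeftOf":
            _, right, ctx = ctx
            tree = ("node", tree, right)
        else:  # "RightOf"
            _, left, ctx = ctx
            tree = ("node", left, tree)
    return tree


whole = ("node", ("node", "leaf", "leaf"), "leaf")

# Descend into the left child: the right sibling is remembered in the context.
focus1 = whole[1]
ctx1 = ("LeftOf", whole[2], ("Top",))

# From there descend into the right child of the focus.
focus2 = focus1[2]
ctx2 = ("RightOf", focus1[1], ctx1)

rebuilt1 = plug(focus1, ctx1)
rebuilt2 = plug(focus2, ctx2)
```

Both claims about the loop have the same postcondition in terms of plug(G, S) because plugging is invariant under moving the focus: descending or ascending changes G and S but not the tree they denote together.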


4.7 Divergence 


Our coinductive framework can also be used to verify that a program is divergent. 
Such verification is often a topic that is given its own treatment, as in [30,31], 
though in our framework, no additional care is needed. To prove a program is 
divergent on all inputs, one verifies a set of claims of the form $(c, \emptyset)$, so that no 
configuration can be determined valid by membership in the final set of states. 
We have verified the divergence of a simple program under each style of IMP 
semantics in Fig. 3, as well as programs in each language from Sect. 4.1. These 
programs include the omega combinator and the sum program from Sect. 3.2 with 
true replacing the loop guard. 
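
On a finite transition system, the set of configurations c for which the claim (c, ∅) holds can be computed as a greatest fixpoint: since membership in the empty target set is impossible, a configuration qualifies exactly when it has a successor that also qualifies. The sketch below is illustrative; the function name `divergent` and the example system are assumptions.

```python
def divergent(succ):
    """Greatest set X such that every c in X has a successor in X.

    succ maps each configuration to its set of successors. The result is the
    set of configurations from which some execution never terminates.
    """
    X = set(succ)
    changed = True
    while changed:
        changed = False
        for c in list(X):
            if not any(d in X for d in succ[c]):
                X.discard(c)
                changed = True
    return X


# Hypothetical system: a <-> b loop, b may also fall to the stuck state c.
succ = {"a": {"b"}, "b": {"a", "c"}, "c": set(), "d": {"c"}}
diverging = divergent(succ)
```

Configuration b is included although it can also reach the stuck configuration c, because the step function used throughout asks only for some successor; d is excluded because its only execution terminates.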


4.8 Summary of Experiments 


Statistics are shown in Table 2. For each example, size shows the amount of 
code to be verified, the size of the specification, and the size of the proof script. 
If verifying an example required auxiliary definitions or lemmas specific to that 
example, the size of those definitions was counted with the specification or proof. 
Many examples were verified by a single invocation of our automatic proof tactic, 
giving 1-line proofs. Other small proofs required human assistance only in the 
form of applying lemmas about the domain. Proofs are generally smaller than 
the specifications, which are usually about as large as the code. This is similar 
to the results for Bedrock [32], and good for a foundational verification system. 


Table 2. Proof statistics 

                   Size (lines)         Time (s) 
Example           Code  Spec  Proof    Prove  Check 

Simple 
  undefined          2     3      1      2.1    1.1 
  averaged           2     5      1      2.3    0.8 
  min                3     4      2      2.1    0.7 
  max                3     4      2      2.1    0.7 
  multiply           9     6      1      7.2    1.4 
  sum(rec)           6     7      6      4.2    1.0 
  sum(iter)          6    11      8      6.0    1.0 
Trees 
  height             8     3      3     20.5    4.1 
  size               5     3      1      8.0    2.2 
  find               6     9      1     15.5    3.1 
  mirror             7     6      1     19.0    4.9 
  dealloc           15     7      1     19.6    4.1 
  flatten(rec)      12    10      1     30.9    6.8 
  flatten(iter)     24    17      4    150.3   22.8 
Lists 
  head               2     4      1      2.1    0.8 
  tail               2     4      1      2.2    0.9 
  add                4     4      1      4.8    1.2 
  swap               6     4      1     19.6    3.6 
  dealloc            6     4      1      6.3    1.3 
  length(rec)        4     4      1      4.8    1.4 
  length(iter)       4     8      1      7.2    1.5 
  sum(rec)           4     4      1      8.2    2.0 
  sum(iter)          4     8      1      9.1    [illegible] 
  reverse            8     5      3     15.0    2.2 
  append             7     9      3     19.4    3.6 
  copy              14    11      3     55.0    9.3 
  delete            16    18      9     44.6    6.0 
Schorr-Waite 
  tree              14    91    116     60.1    7.6 
  graph             14    91    203    133.6   18.2 


The reported “Prove” time is the time for Coq to process the proof script, 
which includes running proof tactics and proof searches to construct a 
complete proof. If this run succeeds, it produces a proof certificate file which can 
be rechecked without that overhead. For an initial comparison with Bedrock 
we timed their SinglyLinkedList.v example, which verifies length, reverse, 
and append functions that closely resemble our example code. The total time 
to run the Bedrock proof script was 93 s, and 31 s to recheck the proof 
certificate, distinctly slower than our times in Table 2. To more precisely match 
the Bedrock examples we modified our programs to represent list nodes with 
fields at successive addresses rather than using HIMP’s records, but this only 
improved performance, down to 20 s to run the proof scripts, and 4 s to check 
the certificates. 


5 Subsuming Reachability Logic 


Reachability logic [33] is a closely related approach to program verification using 
operational semantics. In fact, our coinductive approach came about when trying 
to distill reachability logic into its mathematical essence. The practicality of 
reachability logic has recently been demonstrated, as the reachability logic proof 
system has been shown to work with several independently developed semantics 
of real-world languages, such as C, Java, and JavaScript [15]. 


5.1 Advantages of Coinduction 


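A mechanical proof of our soundness theorem gives a more usable verification 
framework, since reachability logic requires operational semantics to be given as 
a set of rewrite rules, while our approach does not. Further, reachability logic 
fixes a set of syntactic proof rules, while in our approach the mathematical 
fixpoints and functions act as proof rules without explicitly requiring any. In 
fact, the generality of our approach allows introductions of other derived rules 
that do not compromise the soundness result. Similarly, the generality allows 
higher-order verification, which reachability logic cannot handle. 

Further, we saw in Sect. 3 that the general proof of our theorem is entirely 
mathematical. We instantiate it with the $\mathit{step}_R$ function to get a program 
verification framework. However, if we instantiate it with other functions, we 
could get frameworks for proving different properties, such as all-path validity 
or the “until” notion of validity previously mentioned. Reachability logic does 
not support any other notion of validity without changes to its proof system, 
which then require new proofs of soundness and relative completeness. For our 
framework, the proof of the main theorem does not need to be modified at all, 
and one only needs to prove that all-path validity is a greatest fixpoint (see 
Sect. 3). The same is true for any property. In this sense, this coinduction 
framework is much more general than the reachability logic proof system 
presented in [34]. 


Axiom: 
\[ \frac{\varphi \Rightarrow \varphi' \in \mathcal{A}}{\mathcal{A} \vdash_{\mathcal{C}} \varphi \Rightarrow \varphi'} \]
Reflexivity: 
\[ \frac{}{\mathcal{A} \vdash \varphi \Rightarrow \varphi} \]
Transitivity: 
\[ \frac{\mathcal{A} \vdash_{\mathcal{C}} \varphi_1 \Rightarrow^{+} \varphi_2 \qquad \mathcal{A} \cup \mathcal{C} \vdash \varphi_2 \Rightarrow \varphi_3}{\mathcal{A} \vdash_{\mathcal{C}} \varphi_1 \Rightarrow \varphi_3} \]
Logic Framing: 
\[ \frac{\mathcal{A} \vdash_{\mathcal{C}} \varphi \Rightarrow \varphi' \qquad \psi \text{ is a FOL formula}}{\mathcal{A} \vdash_{\mathcal{C}} \varphi \land \psi \Rightarrow \varphi' \land \psi} \]
Consequence: 
\[ \frac{\models \varphi_1 \to \varphi_1' \qquad \mathcal{A} \vdash_{\mathcal{C}} \varphi_1' \Rightarrow \varphi_2' \qquad \models \varphi_2' \to \varphi_2}{\mathcal{A} \vdash_{\mathcal{C}} \varphi_1 \Rightarrow \varphi_2} \]
Case Analysis: 
\[ \frac{\mathcal{A} \vdash_{\mathcal{C}} \varphi_1 \Rightarrow \varphi \qquad \mathcal{A} \vdash_{\mathcal{C}} \varphi_2 \Rightarrow \varphi}{\mathcal{A} \vdash_{\mathcal{C}} \varphi_1 \lor \varphi_2 \Rightarrow \varphi} \]
Abstraction: 
\[ \frac{\mathcal{A} \vdash_{\mathcal{C}} \varphi \Rightarrow \varphi' \qquad X \cap FreeVars(\varphi') = \emptyset}{\mathcal{A} \vdash_{\mathcal{C}} \exists X\, \varphi \Rightarrow \varphi'} \]
Circularity: 
\[ \frac{\mathcal{A} \vdash_{\mathcal{C} \cup \{\varphi \Rightarrow \varphi'\}} \varphi \Rightarrow \varphi'}{\mathcal{A} \vdash_{\mathcal{C}} \varphi \Rightarrow \varphi'} \]

Fig. 6. Reachability Logic proof system. Sequent $\mathcal{A} \vdash \varphi \Rightarrow \varphi'$ is a 
shorthand for $\mathcal{A} \vdash_{\emptyset} \varphi \Rightarrow \varphi'$. 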


5.2 Reachability Logic Proof System 


The key construct in reachability logic is the notion of circularity. Circularities, 
represented as C in Fig. 6, intuitively represent claims that are conjectured to 
be true but have not yet been proved true. These claims are proved using the 
Circularity rule, which is analogous in our coinductive framework to referring back 
to claims previously seen. Most of the other rules in Fig. 6 are not as interesting. 
Transitivity requires progress before the circularities are flushed as axioms. This 
corresponds to the outer $\mathit{step}_R$ in our coinductive framework. 

There are clear parallels between the Reachability Logic proof 
system and our coinductive framework. We have formalized and mechanically 
verified a detailed proof that reachability logic is an instance of our coinductive 
verification framework. One can refer to [16] for full details, but we briefly discuss 
the nature of the proof below. 


5.3 Reachability Logic is Coinduction 


To formalize what it means for reachability logic to be an instance of coinduction, 
we first need some definitions. First, we need a translation from a reachability 
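rule to a set of coinductive claims. In a reachability rule $\varphi \Rightarrow \varphi'$, both $\varphi$ and 
$\varphi'$ are patterns which respectively describe (symbolically) the starting and the 
reached configurations. Both $\varphi$ and $\varphi'$ can have free variables. Let Var be the 
set of variables. Then, we define the set of claims 


\[ S_{\varphi \Rightarrow \varphi'} = \{(c, \bar{\rho}(\varphi')) \mid c \in \bar{\rho}(\varphi),\ \forall \rho : Var \to Cfg\} \]


where Cfg is the model of configurations and $\bar{\rho}(\cdot)$ is the extension of the valuation 
$\rho$ to patterns [15]. Also, let the claims derived from a set of reachability rules 
$X = \{\varphi_1 \Rightarrow \varphi_1', \ldots, \varphi_n \Rightarrow \varphi_n'\}$ be: 

\[ \overline{X} = \bigcup_i S_{\varphi_i \Rightarrow \varphi_i'} \]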


In reachability logic, programming language semantics are defined as 
theories, that is, as sets of (one-step) reachability rules $\mathcal{A}$ with patterns over a given 
signature of symbols. Each theory $\mathcal{A}$ defines a transition relation over the 
configurations in Cfg, say $R_{\mathcal{A}}$, which is then used to define the semantic validity 
in reachability logic, $\mathcal{A} \models \varphi \Rightarrow \varphi'$. It is possible and easier to prove our main 
theorem more generally, for any transition relation R that satisfies $R \models^{+} \mathcal{A}$: 


\[ R \models^{+} \mathcal{A} \ \text{ if } \ R \models^{+} \varphi \Rightarrow \varphi' \ \text{ for each } \ \varphi \Rightarrow \varphi' \in \mathcal{A} \]

where $R \models^{+} \varphi \Rightarrow \varphi'$ if for each $\rho : Var \to Cfg$ and $\gamma : Cfg$ such that $(\rho, \gamma) \models \varphi$ 
[33], there is a $\gamma'$ such that $\gamma \to_R \gamma'$ and $(\gamma', \bar{\rho}(\varphi'))$ is a valid reachability claim. 
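
The translation from a reachability rule to a set of claims can be made concrete for finite models. In the sketch below, which is an illustration under assumptions, a pattern is represented by a Python function from a valuation (a dictionary from variable names to configurations) to the set of configurations it matches, playing the role of the extended valuation applied to the pattern. The name `claims_of_rule` and the example rule over a four-element model are hypothetical.

```python
from itertools import product


def claims_of_rule(phi, phi_prime, variables, cfg):
    """Build the claims (c, rho(phi')) for every valuation rho and every
    configuration c matched by phi under rho."""
    claims = set()
    for values in product(cfg, repeat=len(variables)):
        rho = dict(zip(variables, values))
        target = frozenset(phi_prime(rho))
        for c in phi(rho):
            claims.add((c, target))
    return claims


def phi(rho):
    # Pattern "x and x > 0": matches the configuration x when it is positive.
    return {rho["x"]} if rho["x"] > 0 else set()


def phi_prime(rho):
    # Pattern "x - 1": matches exactly the predecessor of x.
    return {rho["x"] - 1}


claims = claims_of_rule(phi, phi_prime, ["x"], range(4))
```

Each valuation of the free variables contributes its own claims, which is how a single symbolic rule turns into a family of concrete pairs of a configuration and a target set.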


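Lemma 3. $R_{\mathcal{A}} \models^{+} \mathcal{A}$ and if $S_{\varphi \Rightarrow \varphi'} \subseteq \mathit{valid}_{R_{\mathcal{A}}}$, then $\mathcal{A} \models \varphi \Rightarrow \varphi'$. 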


This lemma suggests what to do: take any reachability logic proof of 
$\mathcal{A} \vdash \varphi \Rightarrow \varphi'$ and any transition relation $R$ such that $R \models^{+} \mathcal{A}$, and produce a 
coinductive proof of $S_{\varphi \Rightarrow \varphi'} \subseteq \mathsf{valid}_R$. This gives us not only a procedure to associate 
coinductive proofs to reachability logic proofs, but also an alternative method 
to prove the soundness of reachability logic. This is what we do below: 


Theorem 2. If there is a reachability logic proof derivation for $\mathcal{A} \vdash \varphi \Rightarrow \varphi'$ and 
a transition relation $R$ such that $R \models^{+} \mathcal{A}$, then $S_{\varphi \Rightarrow \varphi'} \subseteq \mathsf{valid}_R$, and in particular 
this holds by applying Theorem 1 to an inclusion $S_C \subseteq \mathsf{step}_R(\mathsf{derived}_R^{*}(S_C))$. Here, 
$\mathsf{derived}_R$ is a particular function satisfying the conditions for G in Theorem 1 
(see [16] for more details), and C is a set of reachability rules consisting of $\varphi \Rightarrow \varphi'$ 
along with those reachability rules which appear as conclusions of instances of 
the Circularity proof rule in the proof tree of $\mathcal{A} \vdash \varphi \Rightarrow \varphi'$. 


To prove Theorem 2, we apply the Set Circularity theorem of reachability 
logic [35], which states that any reachability logic claim $\mathcal{A} \vdash \varphi \Rightarrow \varphi'$ is provable 
iff there is some set of claims C such that $\varphi \Rightarrow \varphi' \in C$ and for each $\varphi_i \Rightarrow \varphi_i' \in C$ 
there is a proof of $\mathcal{A} \vdash_C \varphi_i \Rightarrow \varphi_i'$ which does not use the Circularity proof rule. In 
the forward direction, we can take C as defined in the statement of Theorem 2. 
The main idea is to convert proof trees into inclusions of sets of claims: 


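Lemma 4. Given a proof derivation of $\mathcal{A} \vdash_C \varphi_a \Rightarrow \varphi_b$ which does not use the 
Circularity proof rule (last rule in Fig. 6), if $R \models^{+} \mathcal{A}$ and C is nonempty then 


$$S_{\varphi_a \Rightarrow \varphi_b} \subseteq \mathsf{step}_R(\mathsf{derived}_R^{*}(S_C)).$$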


This lemma is proven by strengthening the inclusion into one that can be proven 
by structural induction over the Reachability Logic proof rules besides Circularity. 

Combining this lemma with Set Circularity shows that $S_C = \bigcup_i S_{\varphi_i \Rightarrow \varphi_i'} \subseteq \mathsf{valid}_R$, 
which implies that $S_{\varphi \Rightarrow \varphi'} \subseteq \mathsf{valid}_R$ exactly as desired. We have 
mechanized the proofs of Lemmas 3 and 4 in Coq [16]. This is a major result, 
constituting an independent soundness proof for Reachability Logic, and helps 
demonstrate the strength of our coinductive framework, despite its simplicity. 
Moreover, this allows proofs done using reachability logic as in [15] to be translated 
to mechanically verified proofs in Coq, immediately allowing foundational 
verification of programs written in any language. 


6 Other Related Work 


Here we discuss work other than reachability logic that is related to our coinduc- 
tive verification system. We discuss commonly used program verifiers, including 
approaches based on operational semantics and Iris [36], an approach with some 
language independence. We also discuss related coinduction schemata. 


6.1 Current Verification Tools 


A number of prominent tools such as Why [37], Boogie [38,39], and Bedrock 
[24,32] provide program verification for a fixed language, and support other 
languages by translation if at all. For example, Frama-C and Krakatoa, respec- 
tively, attempt to verify C and Java by translation through Why. Also, Spec# 
and Havoc, respectively, verify C# and C by translation through Boogie. We are 
not aware of soundness proofs for these translations. Such proofs would be highly 
non-trivial, requiring formal semantics of both source and target languages. 

All of these systems are based on a verification condition (VC) generator for 
their programming language. Bedrock is closest in architecture and guarantees 
to our system, as it is implemented in Coq and verification results in a Coq 
proof certificate that the specification is sound with respect to a semantics of 
the object language. Bedrock supports dynamically created code, and modular 
verification of higher-order functions, for which our framework has preliminary 
support. Bedrock also makes more aggressive attempts at complete automation, 
which costs increased runtime. Most fundamentally, Bedrock is built around a 
VC generator for a fixed target language. 

In sharp contrast to the above approaches, we demonstrated that a small- 
step operational semantics suffices for program verification, without a need to 
define any other semantics, or verification condition generators, for the same 
language. A language-independent, sound and (relatively) complete coinductive 
proof method then allows us to verify properties of programs using directly 
the operational semantics. As seen in Sect. 4.8 this language independence does 
not compromise other desirable properties. The required human effort and the 
performance of the verification task compare well with foundational program 
verifiers such as Bedrock, and we provide the same high confidence in correctness: 
the trust base consists of the operational semantics only. 


6.2 Operational Semantics Based Approaches 


Verifiable C [40] is a program verification tool for the C programming language 
based on an operational semantics for C defined in Coq. Hoare triples are then 
proved as lemmas about the operational semantics. However, in this approach 
and other similar approaches, it is necessary to prove such lemmas. Without 
them, verification of any nontrivial C program would be nearly impossible. In 
our approach, while we can also define and prove Hoare triples as lemmas, doing 
so is not needed to make program verification feasible, as demonstrated in the 
previous sections. We only need some additional domain reasoning in Coq, which 
logics like Verifiable C require in addition to Hoare logic reasoning. Thus, our 
approach automatically yields a program verification tool for any language with 
minimal additional reasoning, while approaches such as Verifiable C need over 
40,000 lines of Coq to define the program logic. We believe this is completely 
unnecessary, and hope our coinductive framework will be the first step in 
eliminating such superfluous logics. 

The work by the FLINT group [41-43] is another approach to program 
verification based on operational semantics. Languages developed use shallowly 
embedded state predicates in Coq, and inference rules are derived directly from 
the operational semantics. However, their work is not generic over operational 
semantics. For example, [43] is developed in the context of a particular machine 
model, with a fixed memory representation and register file. Even simple changes 
such as adding registers require updating soundness proofs. Our approach has a 
single soundness theorem that can be instantiated for any language. 

Iris [36] is a concurrent separation logic that has language independence, 
with operational semantics formalized in Coq. Iris adds monoids and invariants 
to the program logic in order to facilitate verification. It also derives some Hoare- 
style rules for verification from the semantics of a language. However, there are 
still structural Hoare rules that depend on the language that must be added 
manually. Additionally, once proof rules are generated, they are specialized to 
that particular language. Further, the verification in the paper relies on Hoare 
style reasoning, while in our approach, we do not assume any such verification 
style, as we work directly with the mathematical specifications. Finally, the 
monoids used are not generated and are specific to the program language used. 


6.3 Other Coinduction Schemata 


A categorical generalization of our key theorem was presented as a recursion 
scheme in [12,13]. The titular result of the former is the dual of the λ-coiteration 
scheme of the latter, which specializes to preorder categories to give our 
Theorem 1. A more recent and more general result is [14], which also generalized 
other recent work on coinductive proofs such as [44]. Unlike these approaches, 
which were presented for showing bisimilarity, the novelty of our approach lies 
in the use of these techniques directly to show Hoare-style functional 
correctness claims, and in the development of the associated machinery and 
automation that makes it work with a variety of languages, and not in advancing the 
already solid mathematical foundations of coinduction. Various weaker 
coinduction schemes are folklore, such as the lemma coinduct3 in Isabelle/HOL’s 
standard library: 

$$\mathsf{mono}(f) \wedge A \subseteq f(\mu x.\, f(x) \cup A \cup \nu f) \implies A \subseteq \nu f$$
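
To make the coinduct3 scheme concrete, the sketch below checks it by brute force on a finite powerset lattice, where least and greatest fixed points of a monotone function can be computed by iteration. The transition system `SUCC`, the function `f`, and all helper names are assumptions chosen for illustration. They are unrelated to the Coq or Isabelle developments.

```python
from itertools import combinations


def lfp(f):
    """Least fixed point of a monotone f on finite sets, iterating up from the empty set."""
    x = frozenset()
    while True:
        nxt = frozenset(f(x))
        if nxt == x:
            return x
        x = nxt


def gfp(f, universe):
    """Greatest fixed point of a monotone f, iterating down from the whole universe."""
    x = frozenset(universe)
    while True:
        nxt = frozenset(f(x))
        if nxt == x:
            return x
        x = nxt


# Toy transition system (hypothetical): state -> list of successor states.
SUCC = {0: [1], 1: [0], 2: [3], 3: [], 4: [4]}
UNIVERSE = frozenset(SUCC)


def f(xs):
    """States with at least one successor in xs; monotone in xs."""
    return frozenset(s for s, nxt in SUCC.items() if any(t in xs for t in nxt))


NU_F = gfp(f, UNIVERSE)  # states that admit an infinite run


def coinduct3_premise(a):
    """Premise of coinduct3: a is included in f(lfp(x -> f(x) | a | gfp f))."""
    closure = lfp(lambda x: f(x) | a | NU_F)
    return a <= f(closure)


def all_subsets(universe):
    """Every subset of a finite universe, as frozensets."""
    items = sorted(universe)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


A = frozenset({0})
```

In this toy, the set `A` is not a post-fixed point of `f` on its own, so the plain coinduction principle does not apply to it directly, while the coinduct3 premise does hold.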


7 Conclusion and Future Work 


We presented a language-independent program verification framework. Proofs 
can be as simple as with a custom Hoare logic, but only an operational semantics 
of the target language is required. We have mechanized a proof of the correctness 
of our approach in Coq. Combining this with a coinductive proof thus produces a 
Coq proof certificate concluding that the program meets the specification accord- 
ing to the provided semantics. Our approach is amenable to proof automation. 
Further automation may improve convenience and cannot compromise 
soundness of the proof system. A language designer need only give an authoritative 
semantics to enable program verification for a new language, rather than needing 
to have the experience and invest the effort to design and prove the soundness 
of a custom program logic. 

One opportunity for future work is using our approach to provide proof cer- 
tificates for reachability logic program verifiers such as K [9]. The K prover was 
used to verify programs in several real programming languages [15]. While the 
proof system is sound, trusting the results of these tools requires trusting the 
implementation of the K system. Our translation in Sect. 5 will allow us to 
produce proof objects in Coq for proofs done in K’s backend, which will make it 
sufficient to trust only Coq’s proof checker to rely on the results from K’s prover. 

Another area for future work is verifying programs with higher-order specifi- 
cations, where a specification can make reachability claims about values quanti- 
fied over in the specification. This allows higher-order functions to have specifica- 
tions that require functional arguments to themselves satisfy some specification. 
We have begun preliminary work on proving validity of such specifications using 
the notions of compatibility up-to presented in [14]. Combining this with more 
general forms of claims may allow modular verification of concurrent programs, 
as in RGsep [45]. See [16] for initial work in these areas. 

Other areas for future work are evaluating the reusability of proof automa- 
tion between languages, and using the ability to easily verify programs under a 
modified semantics, e.g. adding time costs to allow proving real-time properties. 


Abstract. Our increasing dependence on complex and critical information 
infrastructures and the emerging threat of sophisticated attacks 
call for extended efforts to ensure the correctness and security of these 
systems. Byzantine fault-tolerant state-machine replication (BFT-SMR) 
provides a way to harden such systems. It ensures that they maintain 
correctness and availability in an application-agnostic way, provided that 
the replication protocol is correct and at least n − f out of n replicas 
survive arbitrary faults. This paper presents Velisarios, a logic-of-events 
based framework implemented in Coq, which we developed to implement 
and reason about BFT-SMR protocols. As a case study, we present the 
first machine-checked proof of a crucial safety property of an implemen- 
tation of the area’s reference protocol: PBFT. 


Keywords: Byzantine faults · State machine replication · Formal verification · Coq 


1 Introduction 


Critical information infrastructures such as the power grid or water supply sys- 
tems assume an unprecedented role in our society. On one hand, our lives depend 
on the correctness of these systems. On the other hand, their complexity has 
grown beyond manageability. One state of the art technique to harden such crit- 
ical systems is Byzantine fault-tolerant state-machine replication (BFT-SMR). 
It is a generic technique that is used to turn any service into one that can toler- 
ate arbitrary faults, by extensively replicating the service to mask the behavior 
of a minority of possibly faulty replicas behind a majority of healthy replicas, 
operating in consensus.¹ The total number of replicas n is a function of the 
maximum number of faulty replicas f, which the system is configured to tolerate 


This work is partially supported by the Fonds National de la Recherche Luxembourg 
(FNR) through PEARL grant FNR/P14/8149128. 

1 For such techniques to be useful and in order to avoid persistent and shared vul- 
nerabilities, replicas need to be rejuvenated periodically [17,76], they need to be 
diverse enough [43], and ideally they need to be physically far apart. Diversity and 
rejuvenation are not covered here. 

at any point in time. Typically, n = 3f + 1 for classical protocols such as in [16], 
and n = 2f + 1 for protocols that rely on tamper-proof components such as 
in [82]. Because such protocols tolerate arbitrary faults, a faulty replica is one 
that does not behave according to its specification. For example it can be one 
that is controlled by an attacker, or simply one that contains a bug. 

Ideally, we should guarantee the correctness and security of such replicated 
and distributed, hardened systems to the highest standards known to mankind 
today. That is, the proof of their correctness should be checked by a machine and 
their model refined down to machine code. Unfortunately, as pointed out in [29], 
most distributed algorithms, including BFT protocols, are published in pseudo- 
code or, in the best case, a formal but not executable specification, leaving their 
safety and liveness questionable. Moreover, Lamport, Shostak, and Pease wrote 
about such programs: “We know of no area in computer science or mathematics 
in which informal reasoning is more likely to lead to errors than in the study of 
this type of algorithm.” [54]. Therefore, we focus here on developing a generic 
and extensible formal verification framework for systematically supporting the 
mechanical verification of BFT protocols and their implementations.² 

Our framework provides, among other things, a model that captures the 
idea of arbitrary/Byzantine faults; a collection of standard assumptions to rea- 
son about systems with faulty components; proof tactics that capture common 
reasoning patterns; as well as a general library of distributed knowledge. All 
these parts can be reused to reason about any BFT protocol. For example, most 
BFT protocols share the same high-level structure (they essentially disseminate 
knowledge and vote on the knowledge they gathered), which we capture in our 
knowledge theory. We have successfully used this framework to prove a crucial 
safety property of an implementation of a complex BFT-SMR protocol called 
PBFT [14-16]. We handle all the functionalities of the base protocol, including 
garbage collection and view change, which are essential in practical protocols. 
Garbage collection is used to bound message logs and buffers. The view change 
procedure enables BFT protocols to make progress in case the primary—a dis- 
tinguished replica used in some fault-tolerant protocols to coordinate votes— 
becomes faulty. 


Contributions. Our contributions are as follows: (1) Section 3 presents Velisarios, 
our continuing effort towards a generic and extensible logic-of-events based 
framework for verifying implementations of BFT-SMR protocols using Coq [25]. 
(2) As discussed in Sect. 4, our framework relies on a library to reason about 
distributed epistemic knowledge. (3) We implemented Castro’s landmark PBFT 
protocol, and proved its agreement safety property (see Sect. 5). (4) We 
implemented a runtime environment to run the OCaml code we extract from Coq (see 
Sect. 6). (5) We released Velisarios and our PBFT safety proof under an open 
source licence.³ 


2 Ideally, both (1) the replication mechanism and (2) the instances of the replicated 
service should be verified. However, we focus here on (1), which has to be done only 
once, while (2) needs to be done for every service and for every replica instance. 

3 Available at: https://github.com/vrahli/Velisarios. 


Why PBFT? We have chosen PBFT because several BFT-SMR protocols 
designed since then either use (part of) PBFT as one of their main building 
blocks, or are inspired by it, such as [6,8,26,45,46,82], to cite only a few. There- 
fore, a bug in PBFT could imply bugs in those protocols too. Castro provided 
a thorough study of PBFT: he described the protocol in [16], studied how to 
proactively rejuvenate replicas in [14], and provided a pen-and-paper proof of 
PBFT’s safety in [15,17]. Even though we use a different model—Castro used I/O 
automata (see Sect. 7.1), while we use a logic-of-events model (see Sect. 3)—our 
mechanical proof builds on top of his pen-and-paper proof. One major difference 
is that here we verify actual running code, which we obtain thanks to Coq’s 
extraction mechanism. 


2 PBFT Recap 


This section provides a rundown of PBFT [14-16], which we use as a running 
example to illustrate our model of BFT-SMR protocols presented in Sect. 3. 


2.1 Overview of the Protocol 


We describe here the public-key based version of PBFT, for which Castro pro- 
vides a formal pen-and-paper proof of its safety. PBFT is considered the first 
practical BFT-SMR protocol. Compared to its predecessors, it is more efficient 
and it does not rely on unrealistic assumptions. It works with asynchronous, 
unreliable networks (i.e., messages can be dropped, altered, delayed, duplicated, 
or delivered out of order), and it tolerates independent network failures. To 
achieve this, PBFT assumes strong cryptography in the form of collision-resistant 
digests, and an existentially unforgeable signature scheme. It supports any deter- 
ministic state machine. Each state machine replica maintains the service state 
and implements the service operations. Clients send requests to all replicas and 
await f +1 matching replies from different replicas. PBFT ensures that healthy 
replicas execute the same operations in the same order. 

To tolerate up to f faults, PBFT requires |R| = 3f + 1 replicas. Replicas move 
through a succession of configurations called views. In each view v, one replica 
(p = v mod |R|) assumes the role of primary and the others become backups. 
The primary coordinates the votes, i.e., it picks the order in which client requests 
are executed. When a backup suspects the primary to be faulty, it requests a 
view-change to select another replica as new primary. 
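
The replica count and the rotation of the primary role described above can be sketched in a few lines of Python. The function names are hypothetical, and replicas are assumed to be numbered from 0 to |R| − 1.

```python
def replica_count(f: int) -> int:
    """Number of replicas needed to tolerate up to f faults: |R| = 3f + 1."""
    return 3 * f + 1


def primary_of(view: int, num_replicas: int) -> int:
    """Identifier of the primary in a view: p = v mod |R|."""
    return view % num_replicas


def roles(view: int, num_replicas: int):
    """Return the primary and the list of backups for a view."""
    p = primary_of(view, num_replicas)
    backups = [i for i in range(num_replicas) if i != p]
    return p, backups
```

Because the primary is a pure function of the view number, every replica that agrees on the current view also agrees on who the primary is, without exchanging extra messages.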


Normal-Case. During normal-case operation, i.e., when the primary is not sus- 
pected to be faulty by a majority of replicas, clients send requests to be executed, 
which trigger agreement among the replicas. Various kinds of messages have to be 
sent among clients and replicas before a client knows its request has been exe- 
cuted. Figure 1 shows the resulting message patterns for PBFT’s normal-case 
operation and view-change protocol. Let us discuss here normal-case operation: 


Fig. 1. PBFT normal-case (left) and view-change (right) operations 


1. Request: To initiate agreement, a client c sends a request of the form 
$\langle \text{REQUEST}, o, t, c \rangle_{\sigma_c}$ to the primary, but is also prepared to broadcast it to 
all replicas if replies are late or primaries change. $\langle \text{REQUEST}, o, t, c \rangle_{\sigma_c}$ specifies 
the operation to execute o and a timestamp t that orders requests of the same 
client. Replicas will not re-execute requests with a lower timestamp than the 
last one processed for this client, but are prepared to resend recent replies. 

2. Pre-prepare: The primary of view v puts the pending requests in a total order 
and initiates agreement by sending $\langle \text{PRE-PREPARE}, v, n, m \rangle_{\sigma_p}$ to all the 
backups, where m should be the n-th executed request. The strictly monotonically 
increasing and contiguous sequence number n ensures preservation of this 
order despite message reordering. 

3. Prepare: Backup i acknowledges the receipt of a pre-prepare message by 
sending the digest d of the client’s request in $\langle \text{PREPARE}, v, n, d, i \rangle_{\sigma_i}$ to all replicas. 

4. Commit: Replica i acknowledges the reception of 2f prepares matching a 
valid pre-prepare by broadcasting $\langle \text{COMMIT}, v, n, d, i \rangle_{\sigma_i}$. In this case, we say 
that the message is prepared at i. 

5. Execution & Reply: Replicas execute client operations after receiving 2f + 1 
matching commits, and follow the order of sequence numbers for this 
execution. Once replica i has executed the operation o requested by client c, it 
sends $\langle \text{REPLY}, v, t, c, i, r \rangle_{\sigma_i}$ to c, where r is the result of applying o to the 
service state. Client c accepts r if it receives f + 1 matching replies from different 
replicas. 
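
The thresholds used in the five steps above can be summarized as a small Python sketch. All function names are hypothetical, and senders are identified by plain integers. The last function records the standard counting argument behind the thresholds, which is a well-known fact about quorums rather than something stated in this section: two sets of 2f + 1 replicas out of 3f + 1 share at least f + 1 replicas, hence at least one non-faulty one.

```python
def prepared(f, has_valid_pre_prepare, prepare_senders):
    """Step 4: a valid pre-prepare plus 2f matching prepares from distinct replicas."""
    return has_valid_pre_prepare and len(set(prepare_senders)) >= 2 * f


def committed(f, commit_senders):
    """Step 5: 2f + 1 matching commits from distinct replicas allow execution."""
    return len(set(commit_senders)) >= 2 * f + 1


def client_accepts(f, replies):
    """Return a result backed by f + 1 distinct replicas, or None if there is none.

    replies is an iterable of (replica_id, result) pairs.
    """
    by_result = {}
    for replica, result in replies:
        by_result.setdefault(result, set()).add(replica)
    for result, replicas in by_result.items():
        if len(replicas) >= f + 1:
            return result
    return None


def min_quorum_overlap(f):
    """Smallest possible intersection of two quorums of size 2f + 1 among 3f + 1 replicas."""
    n, q = 3 * f + 1, 2 * f + 1
    return 2 * q - n  # equals f + 1
```

Counting distinct senders matters: duplicated messages from the same replica must not be allowed to inflate a quorum.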


Client and replica authenticity, and message integrity are ensured through 
signatures of the form $\langle m \rangle_{\sigma_i}$. A replica accepts a message m only if: (1) m’s 
signature is correct, (2) m’s view number matches the current view, and (3) the 
sequence number of m is in the water mark interval (see below). 
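
The three acceptance conditions can be written as a single predicate, as in the Python sketch below. The `Msg` record, the `verify` callback, and the choice of a half-open interval (low mark excluded, high mark included) are assumptions made for illustration. The text above only states that the sequence number must lie in the water mark interval.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Msg:
    view: int
    seq: int
    payload: bytes
    signature: bytes


def accepts(msg: Msg, current_view: int, low_mark: int, high_mark: int,
            verify: Callable[[bytes, bytes], bool]) -> bool:
    """Accept msg only if all three conditions hold."""
    return (
        verify(msg.payload, msg.signature)      # (1) the signature is correct
        and msg.view == current_view            # (2) the view matches the current view
        and low_mark < msg.seq <= high_mark     # (3) the sequence number is within the water marks
    )


# Toy signature check standing in for real cryptography (hypothetical).
def toy_verify(payload: bytes, signature: bytes) -> bool:
    return signature == b"sig:" + payload
```

Passing the verification function as a parameter keeps the predicate independent of the concrete authentication scheme.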

PBFT buffers pending client requests, processing them later in batches. 
Moreover, it makes use of checkpoints and water marks (which delimit sequence 
number intervals) to limit the size of all message logs and to prevent replicas 
from exhausting the sequence number space. 


Garbage Collection. Replicas store all correct messages that were created or 
received in a log. Checkpoints are used to limit the number of logged messages 
by removing the ones that the protocol no longer needs. A replica starts check- 
pointing after executing a request with a sequence number divisible by some 
predefined constant, by multicasting the message $\langle \text{CHECKPOINT}, v, n, d, i \rangle_{\sigma_i}$ to all 
other replicas. Here n is the sequence number of the last executed request and 
d is the digest of the state. Once a replica has received f + 1 different checkpoint 
messages⁴ (possibly including its own) for the same n and d, it holds a proof of 
correctness of the log corresponding to d, which includes messages up to sequence 
number n. The checkpoint is then called stable and all messages with sequence 
numbers lower than n (except view-change messages) are pruned from the log. 
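
A minimal Python sketch of checkpoint stabilization and log pruning follows. The class `CheckpointTracker`, the function `prune`, and the representation of log entries as `(seq, kind, body)` triples are hypothetical. The sketch follows the rule as stated above (f + 1 matching checkpoint messages, pruning of entries strictly below n) and ignores the extra requirement of Sect. 2.3 that a replica waits for its own checkpoint.

```python
from collections import defaultdict


class CheckpointTracker:
    """Collects checkpoint messages and reports when a checkpoint becomes stable."""

    def __init__(self, f: int):
        self.f = f
        self.votes = defaultdict(set)  # (seq, digest) -> ids of replicas that sent it
        self.stable_seq = 0

    def on_checkpoint(self, seq: int, digest: str, replica: int) -> bool:
        """Record one checkpoint message; return True if it stabilizes a newer checkpoint."""
        self.votes[(seq, digest)].add(replica)
        if seq > self.stable_seq and len(self.votes[(seq, digest)]) >= self.f + 1:
            self.stable_seq = seq
            return True
        return False


def prune(log, stable_seq: int):
    """Drop entries below the stable checkpoint, keeping view-change messages."""
    return [entry for entry in log
            if entry[0] >= stable_seq or entry[1] == "VIEW-CHANGE"]
```

Keying the votes by both sequence number and digest ensures that checkpoint messages disagreeing on the state never count toward the same proof.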


View Change. The view change procedure ensures progress by allowing replicas 
to change the leader so as to not wait indefinitely for a faulty primary. Each 
backup starts a timer when it receives a request and stops it after the request has 
been executed. Expired timers cause the backup to suspect the leader and request 
a view change. It then stops receiving normal-case messages, and multicasts 
$\langle \text{VIEW-CHANGE}, v + 1, n, s, C, P, i \rangle_{\sigma_i}$, reporting the sequence number n of the last 
stable checkpoint s, its proof of correctness C, and the set of messages P with 
sequence numbers greater than n that backup i prepared since then. When the 
new primary p receives 2f + 1 view-change messages, it multicasts 
$\langle \text{NEW-VIEW}, v + 1, V, O, N \rangle_{\sigma_p}$, where V is the set of 2f + 1 valid view-change messages that p 
received; O is the set of messages prepared since the latest checkpoint reported 
in V; and N contains only the special null request for which the execution is a 
no-op. N is added to the O set to ensure that there are no gaps between the 
sequence numbers of prepared messages sent by the new primary. Upon receiving 
this new-view message, replicas enter view v + 1 and re-execute the normal-case 
protocol for all messages in $O \cup N$. 
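
The construction of the sets O and N by the new primary can be sketched in Python as follows. The data layout is an assumption: each view-change message is reduced to a pair of its stable checkpoint number and a dictionary mapping sequence numbers to `(view, digest)` pairs. Preferring the entry prepared in the highest view when several are reported for one sequence number follows the usual description of PBFT and is not spelled out in this section.

```python
NULL_REQUEST = "null"  # stands for the special no-op request


def new_view_sets(view_changes):
    """Compute O and N from the view-change messages collected in V.

    view_changes: list of (stable_seq, prepared), where prepared maps a
    sequence number to the (view, digest) pair it was prepared with.
    Returns two dicts mapping sequence numbers to digests.
    """
    low = max(stable_seq for stable_seq, _ in view_changes)  # latest checkpoint in V
    best = {}
    for _, prepared in view_changes:
        for seq, (view, digest) in prepared.items():
            if seq > low and (seq not in best or view > best[seq][0]):
                best[seq] = (view, digest)
    if not best:
        return {}, {}
    high = max(best)
    o_set = {seq: digest for seq, (_, digest) in best.items()}
    # Fill every gap between the checkpoint and the highest prepared number.
    n_set = {seq: NULL_REQUEST for seq in range(low + 1, high + 1) if seq not in best}
    return o_set, n_set
```

The union of the two results covers every sequence number between the latest checkpoint and the highest prepared one, so replicas can process the new view without gaps.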

We have proved a critical safety property of PBFT, including its garbage 
collection and view change procedures, which are essential in practical protocols. 
However, we have not yet developed generic abstractions to specifically reason 
about garbage collection and view changes, that can be reused in other protocols, 
which we leave as future work. 


2.2 Properties 


PBFT with |R| = 3f+1 replicas is safe and live. Its safety boils down to lineariz- 
ability [42], i.e., the replicated service behaves like a centralized implementation 
that executes operations atomically one at a time. Castro used a modified ver- 
sion of linearizability in [14] to deal with faulty clients. As presented in Sect. 5, 
we proved the crux of this property, namely the agreement property (we leave 
linearizability for future work). 

As informally explained by Castro [14], assuming weak synchrony (which 
constrains message transmission delays), PBFT is live, i.e., clients will eventually 
receive replies to their requests. In the future, we plan to extend Velisarios to 
support liveness and mechanize PBFT’s liveness proof. 


4 Castro first required 2f + 1 checkpoint messages [16] but relaxed this requirement 
in [14]. 


2.3 Differences with Castro’s Implementation 


As mentioned above, besides the normal-case operation, our Coq implementa- 
tion of PBFT handles garbage collection, view changes and request batching. 
However, we slightly deviated from Castro’s implementation [14], primarily in 
the way checkpoints are handled: we never send messages whose sequence numbers 
are not between the water marks, and a replica always requires its own 
checkpoint before clearing its log. Assuming the reader is familiar with PBFT, we now 
detail these deviations and refer the reader to [14] for comparison. 


(1) To the best of our knowledge, to ensure liveness, Castro’s implementation 
requires replicas to resend prepare messages below the low water mark when 
adopting a new-view message and processing the pre-prepares in $O \cup N$. In 
contrast, our implementation never sends messages with sequence numbers 
lower than the low water mark. This liveness issue can be resolved by bring- 
ing late replicas up to date through a state transfer. 

(2) We require a new leader to send its own view-change message updated with 
its latest checkpoint as part of its new-view message. If not, it may happen 
that a checkpoint stabilizes after the view-change message is sent and before 
the new-view message is prepared. This might result in a new leader sending 
messages in $O \cup N$ with a sequence number below its low water mark, which 
it avoids by updating its own view-change message to contain its latest 
checkpoint. 

(3) We require replicas to wait for their own checkpoint message before sta- 
bilizing a checkpoint and garbage collecting logs. This avoids stabilizing a 
checkpoint that has not been computed locally. Otherwise, a replica could 
lose track of the last executed request if its sequence number is superseded 
by the one in the checkpoint. Once proven, a state transfer of the latest 
checkpoint state and an update of the last executed request would also 
resolve this point. 


We slightly deviated from Castro’s protocol to make our proofs go through. 
We leave it for future work to formally study whether we could do without these 
changes, or whether they are due to shortcomings of the original specification. 


3 Velisarios Model 


Using PBFT as a running example, we now present our Coq model for Byzan- 
tine fault-tolerant distributed systems, which relies on a logic of events—Fig. 2 
outlines our formalization. 


3.1 The Logic of Events 


Fig. 2. Outline of formalization 


We adapt the Logic of Events (LoE) we used in EventML [9,11,71] to not 
only deal with crash faults, but arbitrary faults in general (including malicious 
faults). LoE, related to Lamport’s notion of causal order [53] and to event 
structures [60,65], was developed to reason about events occurring in the execution 
of a distributed system. LoE has recently been used to verify consensus pro- 
tocols [71,73] and cyber-physical systems [3]. Another standard model of dis- 
tributed computing is Chandy and Lamport’s global state semantics [19], where 
a distributed system is modeled as a single state machine: a state is the collection 
of all processes at a given time, and a transition takes a message in flight and 
delivers it to its recipient (a process in the collection). Each of these two models 
has advantages and disadvantages over the other. We chose LoE because in our 
experience it corresponds more closely to the way distributed system researchers 
and developers reason about protocols. As such, it provides a convenient com- 
munication medium between distributed systems and verification experts. 

In LoE, an event is an abstract entity that corresponds either (1) to the 
handling of a received message, or (2) to some arbitrary activity about which 
no information is provided (see the discussion about trigger in Sect. 3.4). We use 
those arbitrary events to model arbitrary/Byzantine faults. An event happens at 
a specific point in space/time: the space coordinate of an event is called its loca- 
tion, and the time coordinate is given by a well-founded ordering on events that 
totally orders all events at the same location. Processes react to the messages that 
triggered the events happening at their locations one at a time, by transitioning 
through their states and creating messages to send out, which in turn might trig- 
ger other events. In order to reason about distributed systems, we use the notion 
of event orderings (see Sect.3.4), which essentially are collections of ordered 
events and represent runs of a system. They are abstract entities that are never 
instantiated. Rather, when proving a property about a distributed system, one 
has to prove that the property holds for all event orderings corresponding to all 
possible runs of the system (see Sects. 3.5 and 5 for examples). Some runs/event 
orderings are not possible and therefore excluded through assumptions, such as 
the ones described in Sect.3.6. For example, exists_at_most_f_faulty excludes 
event orderings where more than f out of n nodes could be faulty. 
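
To give some intuition for events, locations, and fault assumptions, here is a loose Python analogue. It is only a sketch: in LoE event orderings are abstract and never instantiated, whereas this toy builds one concrete run. The `Event` record, the integer `time` field, and the function `at_most_f_faulty` are hypothetical simplifications and are not the Coq definition of exists_at_most_f_faulty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    loc: str                # space coordinate: where the event happens
    time: int               # position in a well-founded global order
    trigger: Optional[str]  # message being handled, or None for arbitrary activity


def local_history(events, loc):
    """Events at one location, totally ordered by time."""
    return sorted((e for e in events if e.loc == loc), key=lambda e: e.time)


def at_most_f_faulty(events, f):
    """Keep only runs in which at most f locations show arbitrary (Byzantine) events."""
    faulty = {e.loc for e in events if e.trigger is None}
    return len(faulty) <= f


run = [
    Event("r0", 0, "REQUEST"),
    Event("r1", 1, "PRE-PREPARE"),
    Event("r3", 2, None),          # arbitrary activity at r3
    Event("r1", 3, "PREPARE"),
]
```

A property is then stated for every run that passes such a filter, which is how assumptions rule out impossible runs without constructing the remaining ones explicitly.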

In the next few sections, we explain the different components (messages, 
authentication, event orderings, state machines, and correct traces) of Velisarios, 
and their use in our PBFT case study. Those components are parameterized 
by abstract types (parameters include the type of messages and the kind of 
authentication schemes), which we later have to instantiate in order to reason 
about a given protocol, e.g. PBFT, and to obtain running code. The choices we 
made when designing Velisarios were driven by our goal to generate running code. 
For example, we model cryptographic primitives to reason about authentication. 


3.2 Messages 


Model. Some events are caused by messages of type msg, which is a parameter 
of our model. Processes react to messages to produce message/destinations pairs 
(of type DirectedMsg), called directed messages. A directed message is typically 
handled by a message outbox, which sends the message to the listed 
destinations.⁵ A destination is the name (of type name, which is a parameter of our 
model) of a node participating in the protocol. 


PBFT. In our PBFT implementation, we instantiate the msg type using the 
following datatype (we only show some of the normal-case operation messages, 
leaving out for example the more involved pre-prepare messages—see Sect. 2.1): 


Inductive PBFTmsg := 
| REQUEST (r : Request) 
| PREPARE (p : Prepare) 
| REPLY (r : Reply) ... 

Inductive Bare_Prepare := 
| bare_prepare (v : View) (n : SeqNum) (d : digest) (i : Rep). 

Inductive Prepare := 
| prepare (b : Bare_Prepare) (a : list Token). 


As for prepares, all messages are defined as follows: we first define bare messages 
that do not contain authentication tokens (see Sect. 3.3), and then authenticated 
messages as pairs of a bare message and an authentication token. Views and 
sequence numbers are nats, while digests are parameters of the specification. 
PBFT involves two types of nodes: replicas of the form PBFTreplica(r), where r
is of type Rep; and clients of the form PBFTclient(c), where c is of type Client. 
Both Rep and Client are parameters of our formalization, such that Rep is of 
arity 3f+1, where f is a parameter that stands for the number of tolerated faults. 


3.3 Authentication 


Model. Our model relies on an abstract concept of keys, which we use to imple- 
ment and reason about authenticated communication. Capturing authenticity at 
the level of keys allows us to talk about impersonation through key leakage. Keys 
are divided into sending keys (of type sending_key) to authenticate a message 
for a target node, and receiving keys (of type receiving_key) to check the valid- 
ity of a received message. Both sending_key and receiving_key are parameters 
of our model.⁶ Each node maintains local keys (of type local_keys), which consist
of two lists of directed keys: one for sending keys and one for receiving keys.
Directed keys are pairs of a key and a list of node names identifying the processes 
that the holder of the key can communicate with. 


5 Message inboxes/outboxes are part of the runtime environment but not part of the 
model. 

6 Sending and receiving keys must be different when using asymmetric cryptography, 
and can be the same when using symmetric cryptography. 

Sending keys are used to create authentication tokens of type Token, which we
use to authenticate messages. Tokens are parameters of our model and abstract
away from concrete concepts such as digital signatures or MACs. Typically,
a message consists of some data plus some tokens that authenticate the data.
Therefore, we introduce the following parameters: (1) the type data, for the kind 
of data that can be authenticated; (2) a create function to authenticate some 
data by generating authentication tokens using the sending keys; and (3) a verify 
function to verify the authenticity of some data by checking that it corresponds 
to some token using the receiving keys. 

Once some data has been authenticated, it is typically sent over the net- 
work to other nodes, which in turn need to check the authenticity of the data. 
Typically, when a process sends an authenticated message to another process it 
includes its identity somewhere in the message. This identity is used to select the 
corresponding receiving key to check the authenticity of the data using verify. To 
extract this claimed identity we require users to provide a data_sender function. 

It often happens in practice that a message contains more than one 
piece of authenticated data (e.g., in PBFT, pre-prepare messages contain 
authenticated client requests). Therefore, we require users to provide a 
get_contained_auth_data function that extracts all authenticated pieces of data 
contained in a message. Because we sometimes want to use different tokens to 
authenticate some data (e.g., when using MACs), an authenticated piece of data 
of type auth_data is defined as a pair of: (1) a piece of data, and (2) a list of 
tokens. 
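
As an informal illustration of how the create, verify, and data_sender parameters fit together, the following Python sketch instantiates them with HMAC-SHA256 from the standard library. It is not part of Velisarios: the key table KEYS, the dictionary encoding of bare messages, and the helper encode are assumptions made for this example only, and sending and receiving keys coincide here because MACs are symmetric.

```python
import hashlib
import hmac

# Hypothetical symmetric keys shared by pairs of nodes (sending key = receiving key).
KEYS = {("r1", "r2"): b"key-r1-r2", ("r2", "r1"): b"key-r1-r2"}

def encode(data: dict) -> bytes:
    """Deterministic byte encoding of a bare message (an assumption of this sketch)."""
    return repr(sorted(data.items())).encode()

def create(data: dict, sender: str, receivers: list) -> list:
    """Authenticate `data`: one MAC token per receiver, as in a MAC vector."""
    return [hmac.new(KEYS[(sender, r)], encode(data), hashlib.sha256).digest()
            for r in receivers]

def data_sender(data: dict):
    """Extract the claimed sender, or None when the data names no sender."""
    return data.get("sender")

def verify(data: dict, tokens: list, me: str) -> bool:
    """Check that some token matches the key shared with the claimed sender."""
    key = KEYS.get((me, data_sender(data)))
    if key is None:
        return False
    expected = hmac.new(key, encode(data), hashlib.sha256).digest()
    return any(hmac.compare_digest(expected, t) for t in tokens)

bare = {"kind": "prepare", "view": 0, "seq": 1, "digest": "d", "sender": "r1"}
auth = (bare, create(bare, "r1", ["r2"]))   # auth_data = (data, list of tokens)
forged = (dict(bare, seq=2), auth[1])       # tampered data with stale tokens
```

In this sketch, verify accepts auth at r2 and rejects forged, and it rejects everything at a node that holds no key for the claimed sender.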


PBFT. Our PBFT implementation leaves keys and authentication tokens 
abstract because our safety proof is agnostic to the kinds of these elements. 
However, we turn them into actual asymmetric keys when extracting OCaml 
code (see Sect. 6 for more details). The create and verify functions are also left 
abstract until we extract the code to OCaml. Finally, we instantiate the data 
(the objects that can be authenticated, i.e., bare messages here), data_sender, 
and get_contained_auth_data parameters using: 


Inductive PBFTdata :=
| PBFTdata_request (r : Bare_Request)
| PBFTdata_prepare (p : Bare_Prepare)
| PBFTdata_reply (r : Bare_Reply) ...

Definition PBFTdata_sender (m : data) : option name := match m with
| PBFTdata_request (bare_request o t c) => Some (PBFTclient c)
| PBFTdata_prepare (bare_prepare v n d i) => Some (PBFTreplica i)
| PBFTdata_reply (bare_reply v t c i r) => Some (PBFTreplica i) ...

Definition PBFTget_contained_auth_data (m : msg) : list auth_data := match m with
| REQUEST (request b a) => [(PBFTdata_request b,a)]
| PREPARE (prepare b a) => [(PBFTdata_prepare b,a)]
| REPLY (reply b a) => [(PBFTdata_reply b,a)] ...


3.4 Event Orderings


A typical way to reason about a distributed system is to reason about its pos- 
sible runs, which are sometimes modeled as execution traces [72], and which 
are captured in LoE using event orderings. An event ordering is an abstract 
representation of a run of a distributed system; it provides a formal definition 
of a message sequence diagram as used by system designers (see for example 
Fig. 1). As opposed to [72], a trace here is not just one sequence of events but 
instead can be seen as a collection of local traces (one local trace per sequen- 
tial process), where a local trace is a collection of events all happening at the 
same location and ordered in time, and such that some events of different local 
traces are causally ordered. Event orderings are never instantiated. Instead, we 
express system properties as predicates on event orderings. A system satisfies 
such a property if every possible execution of the system satisfies the predicate. 
We first formally define the components of an event ordering, and then present 
the axioms that these components have to satisfy. 


Components. An event ordering is formally defined as the tuple:⁷


Class EventOrdering :=
{ Event : Type;
  happenedBefore : Event → Event → Prop;
  loc : Event → name;
  direct_pred : Event → option Event;
  trigger : Event → option msg;
  keys : Event → local_keys; }


where (1) Event is an abstract type of events; (2) happenedBefore is an order- 
ing relation on events; (3) loc returns the location at which events happen; 
(4) direct_pred returns the direct local predecessor of an event when one exists, 
i.e., for all events except initial events; (5) given an event e, trigger either returns 
the message that triggered e, or it returns None to indicate that no information 
is available regarding the action that triggered the event (see below); (6) keys 
returns the keys a node can use at a given event to communicate with other 
nodes. The event orderings presented here are similar to the ones used in [3,71], 
which we adapted to handle Byzantine faults by modifying the type of trigger 
so that events can be triggered by arbitrary actions and not necessarily by the 
receipt of a message, and by adding support for authentication through keys. 

The trigger function returns None to capture the fact that nodes can some- 
times behave arbitrarily. This includes processes behaving correctly, i.e., accord- 
ing to their specifications; as well as (possibly malicious) processes deviating from 
their specifications. Note that this does not prevent us from capturing the behavior
of correct processes because for every event ordering where trigger returns None
for an event where the node behaved correctly, there is a similar event ordering
where trigger returns the triggering message at that event. To model that at most
f nodes out of n can be faulty we use the exists_at_most_f_faulty assumption,
which enforces that trigger returns None at at most f nodes.

Moreover, even though syntactically invalid messages do not trigger events
because they are discarded by message boxes, a triggering message could be
syntactically valid, but have an invalid signature. Therefore, it is up to the programmer
to ensure that processes only react to messages with valid signatures
using the verify function. Our authenticated_messages_were_sent_non_byz and
exists_at_most_f_faulty assumptions presented in Sect. 3.6 are there to constrain
trigger to ensure that at most f nodes out of n can diverge from their specifications,
for example, by producing valid signatures even though they are not the
nodes they claim to be (using leaked keys of other nodes).


7 A Coq type class is essentially a dependent record.


Axioms. The following axioms characterize the behavior of these components: 


1. Equality between events is decidable. Events are abstract entities that corre- 
spond to points in space/time that can be seen as pairs of numbers (one for 
the space coordinate and one for the time coordinate), for which equality is 
decidable. 

2. The happened before relation is transitive and well-founded. This allows us 
to prove properties by induction on causal time. We assume here that it is 
not possible to infinitely go back in time, i.e., that there is a beginning of 
(causal) time, typically corresponding to the time a system started. 

3. The direct predecessor e2 of e1 happens at the same location and before e1.
This makes local orderings sub-orderings of the happenedBefore ordering. 

4. If an event e does not have a direct predecessor (i.e., e is an initial event) 
then there is no event happening locally before e. 

5. The direct predecessor function is injective, i.e., two different events cannot 
have the same direct predecessor. 

6. If an event e1 happens locally before e2 and e is the direct predecessor of e2,
then either e = e1 or e1 happens before e. From this, it follows that the direct
predecessor function can give us the complete local history of an event. 
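
To make the last remark concrete, here is a small Python sketch (an illustration only, not code from Velisarios) that recovers the local history of an event by following direct_pred. The Event dataclass and the function local_history are names introduced for this example; termination of the loop corresponds to the well-foundedness required by axiom 2.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Event:
    """A toy event: an identifier, a location, and an optional local predecessor."""
    ident: int
    loc: str
    direct_pred: Optional["Event"] = None

def local_history(e: Event) -> list:
    """Events at loc(e) up to and including e, oldest first."""
    history = []
    cur = e
    while cur is not None:      # stops at the initial event, which has no predecessor
        history.append(cur)
        cur = cur.direct_pred
    history.reverse()
    return history

e0 = Event(0, "r1")             # initial event at r1
e1 = Event(1, "r1", e0)
e2 = Event(2, "r1", e1)
```

Calling local_history(e2) walks back through e1 to the initial event e0, and every event it returns is located at r1, as axiom 3 requires.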


Notation. We use a ≺ b to stand for (happenedBefore a b); a ≼ b to stand
for (a ≺ b or a=b); and a ⊑ b to stand for (a ≼ b and loc a=loc b). We also
sometimes write EO instead of EventOrdering. 

Some functions take an event ordering as a parameter. For readability, we 
sometimes omit those when they can be inferred from the context. Similarly, we 
will often omit type declarations of the form (T : Type). 


Correct Behavior. To prove properties about distributed systems, one only 
reasons about processes that have a correct behavior. To do so we only reason 
about events in event orderings that are correct in the sense that they were 
triggered by some message: 


Definition isCorrect (e : Event) := match trigger e with Some m => True | None => False end.
Definition arbitrary (e : Event) := ~ isCorrect e. 


Next, we characterize correct replica histories as follows: (1) First we say that 
an event e has a correct trace if all local events prior to e are correct. (2) Then, 
we say that a node i has a correct trace before some event e, not necessarily
happening at i, if all events happening before e at i have a correct trace: 


Definition has_correct_bounded_trace (e : Event) := forall e', e' ⊑ e → isCorrect e'.
Definition has_correct_trace_before (e : Event) (i : name) :=
  forall e', e' ≼ e → loc e' = i → has_correct_bounded_trace e'.


3.5 Computational Model 


Model. We now present our computational model, which we use when extract- 
ing OCaml programs. Unlike in EventML [71] where systems are first specified as 
event observers (abstract processes), and then later refined to executable code, 
we skip here event observers, and directly specify systems using executable state 
machines, which essentially consist of an update function and a current state. 
We define a system of distributed state machines as a function that maps names 
to state machines. Systems are parametrized by a function that associates state 
types with names in order to allow for different nodes to run different machines. 


Definition Update S I O := S → I → (option S * O).
Record StateMachine S I O := MkSM { halted : bool; update : Update S I O; state : S }.
Definition System (F : name → Type) I O := forall (i : name), StateMachine (F i) I O.


where S is the type of the machine’s state, I/O are the input/output types, and 
halted indicates whether the state machine is still running or not. 

Let us now discuss how we relate state machines and events. We define 
state_sm_before_event and state_sm_after_event that compute a machine’s state 
before and after a given event e. These states are computed by extracting the 
local history of events up to e using direct_pred, and then updating the state 
machine by running it on the triggering messages of those events. These func- 
tions return None if some arbitrary event occurs or the machine halts some- 
time along the way. Otherwise they return Some s, where s is the state of the 
machine updated according to the events. Therefore, assuming they return Some 
amounts to assuming that all events prior to e are correct, i.e., we can prove that 
if state_sm_after_event sm e = Some s then has_correct_trace_before e (loc e). 
As illustrated below, we use these functions to adopt a Hoare-like reasoning 
style by stating pre/post-conditions on the state of a process before and after
some event. 
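
The replay described above can be sketched in Python as follows. This is an informal illustration under simplifying assumptions, not the Coq definitions: a local history is given directly as the list of its triggers (None standing for an arbitrary event), a halted machine is represented by a None state, and the names state_after, state_before, and view_update are introduced here for the example.

```python
from typing import Any, Callable, List, Optional, Tuple

# update : S -> I -> (option S * O); a new state of None means the machine halted.
Update = Callable[[Any, Any], Tuple[Optional[Any], Any]]

def state_after(update: Update, init: Any, triggers: List[Optional[Any]]) -> Optional[Any]:
    """Replay the triggers of a local history (oldest first) from the initial state.

    Returns None if some event is arbitrary (its trigger is None) or if the
    machine halts along the way; otherwise returns the final state.
    """
    state = init
    for msg in triggers:
        if state is None or msg is None:
            return None
        state, _outputs = update(state, msg)
    return state

def state_before(update: Update, init: Any, triggers: List[Optional[Any]]) -> Optional[Any]:
    """State just before the last event of the history."""
    return state_after(update, init, triggers[:-1])

def view_update(view: int, msg: dict) -> Tuple[Optional[int], list]:
    """Toy update function: adopt the view carried by the message if it is larger."""
    return max(view, msg["view"]), []

history = [{"view": 1}, {"view": 0}, {"view": 3}]   # triggers, oldest first
s1 = state_before(view_update, 0, history)
s2 = state_after(view_update, 0, history)
```

With this toy machine, s1 and s2 play the role of the pre- and post-states of the last event, and a history containing an arbitrary event yields None, so obtaining a state at all already implies that the local trace was correct.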


PBFT. We implement PBFT replicas as state machines, which we derive from 
an update function that dispatches input messages to the corresponding handlers.
Finally, we define PBFTsys as the function that associates PBFTsm with
replicas and a halted machine with clients (because we do not reason here about 
clients). 


Definition PBFTupdate (i : Rep) := fun state msg => match msg with
  | REQUEST r => PBFThandle_request i state r
  | PREPARE p => PBFThandle_prepare i state p ...
Definition PBFTsm (i : Rep) := MkSM false (PBFTupdate i) (initial_state i).
Definition PBFTsys := fun name => match name with
  | PBFTreplica i => PBFTsm i | PBFTclient c => haltedSM end.


Let us illustrate how we reason about state machines through a simple example
that shows that they maintain a view that only increases over time. It shows
a local property, while Sect. 5 presents the distributed agreement property that
makes use of the assumptions presented in Sect. 3.6. As mentioned above we
prove such properties for all possible event orderings, which means that they are
true for all possible runs of the system. In this lemma, s1 is the state prior to
the event e, and s2 is the state after handling e. It does not have pre-conditions,
and its post-condition states that the view in s1 is no larger than the view in s2.


Lemma current_view_increases : forall (eo : EO) (e : Event) i s1 s2,
  state_sm_before_event (PBFTsm i) e = Some s1
  → state_sm_after_event (PBFTsm i) e = Some s2
  → current_view s1 ≤ current_view s2.


3.6 Assumptions 


Model. Let us now turn to the assumptions we make regarding the network 
and the behavior of correct and faulty nodes. 


Assumption 1. Proving safety properties of crash fault-tolerant protocols that 
only require reasoning about past events, such as agreement, does not require 
reasoning about faults and faulty replicas. To prove such properties, one merely 
has to follow the causal chains of events back in time, and if a message is received 
by a node then it must have been sent by some node that had not crashed at 
that time. The state of affairs is different when dealing with Byzantine faults. 

One issue is that Byzantine nodes can deviate from their specifications or
impersonate other nodes. However, BFT protocols are designed in such a way 
that nodes only react to collections of messages, called certificates, that are larger 
than the number of faults. This means that there is always at least one correct 
node that can be used to track down causal chains of events. 

A second issue is that, in general, we cannot assume that some received 
message was sent as such by the designated (correct) sender of the mes- 
sage because messages can be manipulated while in flight. As captured by 
the authenticated_messages_were_sent_or_byz predicate defined below,⁸ we can
only assume that the authenticated parts of the received message were actu- 
ally sent by the designated senders, possibly inside larger messages, provided 
the senders did not leak their keys. As usual, we assume that attackers cannot 
break the cryptographic primitives, i.e., that they cannot authenticate messages 
without the proper keys [14]. 


 1. Definition authenticated_messages_were_sent_or_byz (P : AbsProcess) :=
 2.   forall e (a : auth_data),
 3.     In a (bind_op_list get_contained_auth_data (trigger e))
 4.     → verify_auth_data (loc e) a (keys e) = true
 5.     → exists e', e' ≺ e ∧ am_auth a = authenticate (am_data a) (keys e')
 6.       ∧ ( (exists dst m,
 7.             In a (get_contained_auth_data m) ∧ In (m,dst) (P eo e')
 8.             ∧ data_sender (loc e) (am_data a) = Some (loc e'))
 9.           ∨
10.           (exists e'',
11.             e'' ≼ e' ∧ arbitrary e' ∧ arbitrary e'' ∧ got_key_for (loc e) (keys e'') (keys e')
12.             ∧ data_sender (loc e) (am_data a) = Some (loc e'')) ).


8 For readability, we show a slightly simplified version of this axiom. The full
axiom can be found in https://github.com/vrahli/Velisarios/blob/master/model/EventOrdering.v.


This assumption says that if the authenticated piece of data a is part of the
message that triggered some event e (L.3), and a is verified (L.4), then there
exists a prior event e' such that the data was authenticated while handling e'
using the keys available at that time (L.5). Moreover, (1) either the sender of
the data was correct while handling e' and sent the data as part of a message
following the process described by P (L.6-8); or (2) the node at which e' occurred
was Byzantine at that time, and either it generated the data itself (e.g. when
e''=e'), or it impersonated some other replica (by obtaining the keys that some
node leaked at event e'') (L.10-12).

We used a few undefined abstractions in this predicate: An AbsProcess 
is an abstraction of a process, i.e., a function that returns the collection 
of messages generated while handling a given event: (forall (eo : EO) 
(e : Event), list DirectedMsg). The bind_op_list function is wrapped around
get_contained_auth_data to handle the fact that trigger might return None, 
in which case bind_op_list returns nil. The verify_auth_data function takes an 
authenticated message a and some keys and: (1) invokes data_sender (defined 
in Sect.3.3) to extract the expected sender s of a; (2) searches among its keys 
for a receiving_key that it can use to verify that s indeed authenticated a; and 
(3) finally verifies the authenticity of a using that key and the verify function. 
The authenticate function simply calls create and uses the sending keys to create 
tokens. The got_key_for function takes a name i and two local_keys lk1 and lk2,
and states that the sending keys for i in lk1 are all included in lk2.

However, it turns out that because we never reason about faulty nodes, we 
never have to deal with the right disjunct of the above formula. Therefore, this 
assumption about received messages can be greatly simplified when we know 
that the sender is a correct replica, which is always the case when we use this 
assumption because BFT protocols are designed so that there is always a correct
node that can be used to track down causal chains of events. We now define 
the following simpler assumption, which we have proved to be a consequence of 
authenticated_messages_were_sent_or_byz: 


Definition authenticated_messages_were_sent_non_byz (P : AbsProcess) :=
  forall (e : Event) (a : auth_data) (c : name),
    In a (bind_op_list get_contained_auth_data (trigger e))
    → has_correct_trace_before e c
    → verify_auth_data (loc e) a (keys e) = true
    → data_sender (loc e) (am_data a) = Some c
    → exists e' dst m, e' ≺ e ∧ loc e' = c
      ∧ am_auth a = authenticate (am_data a) (keys e')
      ∧ In a (get_contained_auth_data m)
      ∧ In (m,dst) (P eo e').


As opposed to the previous formula, this one assumes that the authenticated data
was sent by a correct replica, which has a correct trace prior to the event e—the 
event when the message containing a was handled. 


Assumption 2. Because processes need to store their keys to sign and verify mes- 
sages, we must connect those keys to the ones in the model. We do this through 
the correct_keys assumption, which states that for each event e, if a process has 
a correct trace up to e, then the keys (keys e) from the model are the same as 
the ones stored in its state (which are computed using state_sm_before_event). 


Assumption 3. Finally, we present our assumption regarding the number of 
faulty nodes. There are several ways to state that there can be at most f faulty 
nodes. One simple definition is (where node is a subset of name as discussed in 
Sect. 4.2): 


Definition exists_at_most_f_faulty (E : list Event) (f : nat) :=
  exists (faulty : list node), length faulty ≤ f
    ∧ forall e1 e2, In e2 E → e1 ≼ e2 → ~ In (loc e1) faulty
      → has_correct_bounded_trace e1.


This assumption says that at most f nodes can be faulty by stating that the 
events happening at nodes that are not in the list of faulty nodes faulty, of
length at most f, are correct up to some point characterized by the partial cut E of a
given event ordering (i.e., the collection of events happening before those in E).
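
On a finite cut, this assumption can be checked directly, which the following Python sketch illustrates. It is only an approximation of the Coq definition under two assumptions made for the example: every location is a node, and the cut is given explicitly as a collection of (location, trigger) pairs, with a trigger of None standing for an arbitrary event. Under these assumptions the smallest candidate for faulty is the set of locations at which some arbitrary event occurs.

```python
def exists_at_most_f_faulty(cut_events, f):
    """Return True if at most f locations have an arbitrary event in the cut.

    cut_events: iterable of (location, trigger) pairs covering every event that
    happens before or at the events of the cut; trigger None marks an arbitrary event.
    """
    faulty = {loc for loc, trigger in cut_events if trigger is None}
    return len(faulty) <= f

cut = [("r1", "prepare"), ("r2", None), ("r2", None), ("r3", "commit"), ("r4", None)]
tolerates_one = exists_at_most_f_faulty(cut, 1)   # r2 and r4 both misbehave
tolerates_two = exists_at_most_f_faulty(cut, 2)
```

Here two locations (r2 and r4) have arbitrary events, so the cut satisfies the assumption for f = 2 but not for f = 1.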


PBFT Assumption 4. In addition to the ones above, we made further assump- 
tions about PBFT. Replicas sometimes send message hashes instead of sending 
the entire messages. For example, pre-prepare messages contain client requests, 
but prepare and commit messages simply contain digests of client requests. Con- 
sequently, our PBFT formalization is parametrized by the following create and 
verify functions, and we assume that the create function is collision resistant:⁹


Class PBFThash := MkPBFThash {
  create_hash : list PBFTmsg → digest;
  verify_hash : list PBFTmsg → digest → bool; }.
Class PBFThash_axioms := MkPBFThash_axioms {
  create_hash_collision_resistant :
    forall msgs1 msgs2, create_hash msgs1 = create_hash msgs2 → msgs1 = msgs2; }.


The version of PBFT, called PBFT-PK in [14], that we implemented relies 
on digital signatures. However, we did not have to make any more assumptions 
regarding the cryptographic primitives than the ones presented above, and in 
particular we did not assume anything that is true about digital signatures and 
false about MACs. Therefore, our safety proof works when using either digital 
signatures or MAC vectors. As discussed below, this is true because we adapted 
the way messages are verified (we have not verified the MAC version of PBFT 
but a slight variant of PBFT-PK) and because we do not deal with liveness. 


9 Note that our current collision resistance assumption is too strong because it is always
possible to find two distinct messages that are hashed to the same hash. We leave it 
to future work to turn it into a more realistic probabilistic assumption. 

As Castro showed [14, Chap. 3], PBFT-PK has to be adapted when digital
signatures are replaced by MAC vectors. Among other things, it requires “sig- 
nificant and subtle changes to the view change protocol” [14, Sect. 3.2]. Also, to 
the best of our knowledge, in PBFT-PK backups do not check the authenticity 
of requests upon receipt of pre-prepares. They only check the authenticity of 
requests before executing them [14, p. 42]. This works when using digital sig- 
natures but not when using MACs: one backup might not execute the request 
because its part of the MAC vector does not check out, while another backup 
executes the request because its part of the MAC vector checks out, which would 
lead to inconsistent states and break safety. Castro lists other problems related 
to liveness. 

Instead, as in the MAC version of PBFT [14, p. 42], in our implementation 
we always check requests’ validity when checking the validity of a pre-prepare. If 
we were to check the validity of requests only before executing them, we would 
have to assume that two correct replicas would either both be able to verify 
the data, or both would not be able to do so. This assumption holds for digital 
signatures but not for MAC vectors. 
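
The difference between the two schemes can be illustrated with a short Python sketch using HMAC-SHA256 from the standard library. It is a toy model, not the PBFT implementation: the key table REPLICA_KEYS, the request bytes, and the helper names are assumptions made for this example. A faulty client builds a MAC vector whose entry is valid for one replica and invalid for another, so the two replicas disagree on the validity of the same request.

```python
import hashlib
import hmac

# Hypothetical keys shared between the client and each replica.
REPLICA_KEYS = {"r1": b"k1", "r2": b"k2"}

def mac(key: bytes, request: bytes) -> bytes:
    """MAC of a request under one shared key."""
    return hmac.new(key, request, hashlib.sha256).digest()

def checks_out(replica: str, request: bytes, mac_vector: dict) -> bool:
    """A replica can only check its own entry of the MAC vector."""
    expected = mac(REPLICA_KEYS[replica], request)
    return hmac.compare_digest(expected, mac_vector[replica])

request = b"transfer 10"
# A faulty client authenticates the request correctly for r1 but not for r2.
mac_vector = {"r1": mac(REPLICA_KEYS["r1"], request), "r2": b"\x00" * 32}

r1_accepts = checks_out("r1", request, mac_vector)
r2_accepts = checks_out("r2", request, mac_vector)
```

With a digital signature, both replicas would run the same check on the same token and reach the same verdict; with the MAC vector above, r1 accepts and r2 rejects, which is why validity is checked together with the pre-prepare rather than only before execution.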


4 Methodology 


Because distributed systems are all about exchanging information among nodes, 
we have developed a theory that captures abstractions and reasoning patterns to 
deal with knowledge dissemination (see Sect. 4.4). In the presence of faulty nodes, 
one has to ensure that this knowledge is reliable. Fault-tolerant state-machine 
replication protocols provide such guarantees by relying on certificates, which 
ensure that we can always get hold of a correct node to trace back information 
through the system. This requires reasoning about the past, i.e., reasoning by 
induction on causal time using the happenedBefore relation. 


4.1 Automated Inductive Reasoning 


We use induction on causal time to prove both distributed and local proper- 
ties. As discussed here, we automated the typical reasoning pattern we use to 
prove local properties. As an example, in our PBFT formalization, we proved 
the following local property: if a replica has a prepare message in its log, then 
it either received or generated it. Moreover, as for any kind of program, using
Velisarios we prove local properties about processes by reasoning about all pos- 
sible paths they can take when reacting upon messages. Thus, a typical proof of 
such a lemma using Velisarios goes as follows: (1) we go by induction on events; 
(2) we split the code of a process into all possible execution paths; (3) we prune 
the paths that could not happen because they invalidate some hypotheses of the 
lemma being proved; and (4) we automatically prove some other cases by induc- 
tion hypothesis. We packaged this reasoning as a Coq tactic, which in practice 
can significantly reduce the number of cases to prove, and used this automation 
technique to prove local properties of PBFT, such as Castro’s A.1.2 local invariants
[14]. Because of PBFT’s complexity, our Coq tactic typically reduces the
number of cases to prove from between 50 and 60 cases down to around 7 cases,
sometimes fewer, as we show in this histogram of goals left to interactively prove
after automation:


# of goals left to prove|0|1|/2 > 3/4/5/6)/7 
# of lemmas 8/1)5 4/4/2/9/17)3 


4.2 Quorums 


As usual, we use quorum theory to trace back correct information between nodes. 
A (Byzantine) quorum w.r.t. a given set of nodes N, is a subset Q of N, such 
that f + 1 ≤ (2 * |Q|) − |N| (where |X| is the size of X), i.e. every two quorums
intersect [59,83] in sufficiently many replicas.¹⁰ Typically, a quorum corresponds
to a majority of nodes that agree on some property. In case of state machine 
replication, quorums are used to ensure that a majority of nodes agree to update 
the state using the same operation. If we know that two quorums intersect, then 
we know that both quorums agree, and therefore that the states cannot diverge. 
In order to reason about quorums, we have proved the following general lemma:¹¹


Lemma overlapping_quorums :
  forall (l1 l2 : NRlist node), exists Correct,
    (length l1 + length l2) - num_nodes ≤ length Correct
    ∧ subset Correct l1 ∧ subset Correct l2 ∧ no_repeats Correct.


This lemma implies that if we have two sets of nodes l1 and l2 (NRlist ensures
that the sets have no repeats), such that the sum of their lengths is greater than
the total number of nodes (num_nodes), there must exist an overlapping subset 
of nodes (Correct). We use this result below in Sect. 4.4. 
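
The bound stated by the lemma can be checked on concrete lists with a few lines of Python. This is an informal sanity check, not a substitute for the Coq proof; the functions overlap and check_overlapping_quorums are names introduced for this example, and the witness chosen for Correct is simply the intersection of the two lists.

```python
from itertools import combinations

def overlap(l1, l2):
    """Nodes common to two repeat-free lists, in the order of l1."""
    s2 = set(l2)
    return [n for n in l1 if n in s2]

def check_overlapping_quorums(l1, l2, num_nodes):
    """Check the bound of overlapping_quorums on concrete repeat-free lists."""
    correct = overlap(l1, l2)
    # Subtraction on natural numbers truncates at 0, as in Coq.
    bound = max(len(l1) + len(l2) - num_nodes, 0)
    return bound <= len(correct)

f = 1
n = 3 * f + 1
quorums = [list(q) for q in combinations(range(n), 2 * f + 1)]
all_ok = all(check_overlapping_quorums(a, b, n) for a in quorums for b in quorums)
min_overlap = min(len(overlap(a, b)) for a in quorums for b in quorums)
```

For f = 1, every pair of quorums of size 2f + 1 among 3f + 1 nodes satisfies the bound, and the smallest intersection has exactly f + 1 nodes.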

The node type parameter is the collection of nodes that can participate in 
quorums. For example, PBFT replicas can participate in quorums but clients 
cannot. This type comes with a node2name function to convert nodes into names. 


4.3 Certificates 


Lemmas that require reasoning about several replicas are much more complex
than local properties. They typically require reasoning about some information
computed by a collection of replicas (such as quorums) that vouch for the information.
In PBFT, a collection of 2f + 1 messages from different replicas is called
a strong (or quorum) certificate, and a collection of f + 1 messages from different
replicas is called a weak certificate.


10 We use here Castro’s notation where quorums are majority quorums [79] (also called
write quorums) that require intersections to be non-empty, as opposed to read quorums
that are only required to intersect with write quorums [36].

11 We present here a simplified version for readability.

When working with strong certificates, one typically reasons as follows: 
(1) Because PBFT requires 3f + 1 replicas, two certificates of size 2f +1 always 
intersect in f +1 replicas. (2) One message among those f +1 messages must be 
from a correct replica because at most f replicas can be faulty. (3) This correct 
replica can vouch for the information of both quorums—we use that replica to 
trace back the corresponding information to the point in space/time where/when 
it was generated. We will get back to this in Sect. 4.4. 

When working with weak certificates, one typically reasons as follows: 
Because the certificate has size f + 1 and there are at most f faulty nodes,
there must be one correct replica that can vouch for the information of the 
certificate. 
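
The counting argument behind both cases can be spelled out as follows, writing Q_1 and Q_2 for the sets of senders of two strong certificates (names introduced here only for illustration) and using the fact that their union contains at most 3f + 1 replicas:

```latex
\begin{align*}
|Q_1 \cap Q_2| &= |Q_1| + |Q_2| - |Q_1 \cup Q_2| \\
               &\ge (2f+1) + (2f+1) - (3f+1) \\
               &= f + 1 .
\end{align*}
```

Since at most f replicas are faulty, at least (f + 1) - f = 1 replica in the intersection is correct. The same subtraction applied to a weak certificate of size f + 1 yields its one correct member.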


4.4 Knowledge Theory 


Model. Let us now present an excerpt of our distributed epistemic knowledge 
library. Knowledge is a widely studied concept [10,30,31,37-39,70]. It is often 
captured using possible-worlds models, which rely on Kripke structures: an agent 
knows a fact if that fact is true in all possible worlds. For distributed systems, 
agents are nodes and a possible world at a given node is essentially one that has 
the same local history as the one of the current world, i.e., it captures the current 
state of the node. As Halpern stresses, e.g. in [37], such a definition of knowl- 
edge is external in the sense that it cannot necessarily be computed, though some 
work has been done towards deriving programs from knowledge-based specifica- 
tions [10]. We follow a different, more pragmatic and computational approach, 
and say that a node knows some piece of data if it is stored locally, as opposed to 
the external and logical notion of knowing facts mentioned above. This computa- 
tional notion of knowledge relies on exchanging messages to propagate it, which 
is what is required to derive programs from knowledge-based specifications (i.e., 
to compute that some knowledge is gained [20,37]). 

We now extend the model presented in Sect.3 with two epistemic modal 
operators know and learn that express what it means for a process to know 
and learn some information, and which bear some resemblance to the fact
discovery and fact publication notions discussed in [38]. Formally, we extend our 
model with the following parameters, which can be instantiated as many times 
as needed for all the pieces of known/learned data that one wants to reason 
about—see below for examples: 


Class LearnAndKnow := MkLearnAndKnow {
  lak_data : Type;
  lak_info : Type;
  lak_memory : Type;
  lak_data2info : lak_data → lak_info;
  lak_know : lak_data → lak_memory → Prop;
  lak_data2owner : lak_data → node;
  lak_data2auth : lak_data → auth_data; }.


The lak_data type is the type of “raw” data that we have knowledge of; while 
lak_info is some distinct information that might be shared by different pieces 
of data. For example, PBFT replicas collect batches of 2f + 1 (pre-)prepare
messages from different replicas, that share the same view, sequence number, and 
digest. In that case, the (pre-)prepare messages are the raw data that contain the 
common information consisting of a view, a sequence number, and a digest. The 
lak_memory type is the type of objects used to store one’s knowledge, such as a 
state machine state. One has to provide a lak_data2info function to extract the 
information embedded in some piece of data. The lak_know predicate explains 
what it means to know some piece of data. The lak_data2owner function extracts 
the “owner” of some piece of data, typically the node that generated the data. In 
order to authenticate pieces of data, the lak_data2auth function extracts some 
piece of authenticated data from some piece of raw data. For convenience, we 
define the following wrapper around lak_data2owner: 


Definition lak_data2node (d : lak_data) : name := node2name (lak_data2owner d). 
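The PBFT-style instantiation described above can be pictured with a small sketch. It is not the Coq development: the `Prepare` record, its `token` field, and the use of a set of stored messages as memory are assumptions chosen for illustration, and the function names only mirror the lak_* parameters.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Prepare:
    """Hypothetical raw data (lak_data): a prepare-like message."""
    sender: int   # replica that generated the message (its "owner")
    view: int
    seq: int
    digest: str
    token: str    # stand-in for an authentication token


Info = Tuple[int, int, str]       # lak_info: (view, sequence number, digest)
Memory = FrozenSet[Prepare]       # lak_memory: here, simply the stored messages


def data2info(d: Prepare) -> Info:
    """lak_data2info: extract the information shared by several messages."""
    return (d.view, d.seq, d.digest)


def know(d: Prepare, mem: Memory) -> bool:
    """lak_know: in this sketch, knowing means the message is stored locally."""
    return d in mem


def data2owner(d: Prepare) -> int:
    """lak_data2owner: the node that generated the data."""
    return d.sender


def data2auth(d: Prepare) -> Tuple[Info, int, str]:
    """lak_data2auth: the part of the data that is authenticated."""
    return (data2info(d), d.sender, d.token)
```

Two messages from different replicas can then carry the same information while remaining distinct pieces of data, which is the distinction between lak_data and lak_info.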


Let us now turn to the two main components of our theory, namely the 
know and learn epistemic modal operators. These operators provide an abstrac- 
tion barrier: they allow us to abstract away from how knowledge is stored and 
computed, in order to focus on the mere fact that we have that knowledge. 


Definition know (sm : node → StateMachine lak_memory) (e : Event) (d : lak_data) := 
  exists mem i, loc e = node2name i 
    ∧ state_sm_after_event (sm i) e = Some mem 
    ∧ lak_know d mem. 


where we simply write (StateMachine S) for a state machine with a state of 
type S, that takes messages as inputs, and outputs lists of directed messages. 
This states that the state machine (sm i) knows the data d at event e if its state 
is mem at e and (lak_know d mem) is true. We define learn as follows: 


Definition learn (e : Event) (d : lak_data) := 
  exists i, loc e = node2name i 
    ∧ In (lak_data2auth d) (bind_op_list get_contained_auth_data (trigger e)) 
    ∧ verify_auth_data (loc e) (lak_data2auth d) (keys e) = true. 


This states that a node learns d at some event e, if e was triggered by a message 
that contains the data d. Moreover, because we deal with Byzantine faults, we 
require that to learn some data one has to be able to verify its authenticity. 
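As a rough illustration of the two operators, the sketch below models an event as a record and represents data by plain strings. The `Event` fields and the `verify` check are assumptions made for this example only; the actual definitions rely on state machines, triggers, and keys as given above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical event: where it happens, what triggered it, state afterwards."""
    loc: int
    trigger: Tuple[str, ...]                 # authenticated data in the triggering message
    state_after: Optional[FrozenSet[str]]    # None if the state is undefined


def verify(auth: str, keys: FrozenSet[str]) -> bool:
    # Assumed stand-in for verify_auth_data: data is authentic if the
    # receiving node holds a matching entry in its key set.
    return auth in keys


def know(e: Event, d: str) -> bool:
    """d is known at e if the state after e exists and stores d."""
    return e.state_after is not None and d in e.state_after


def learn(e: Event, d: str, keys: FrozenSet[str]) -> bool:
    """d is learned at e if e's trigger contains d and d can be verified."""
    return d in e.trigger and verify(d, keys)
```

Note that, as in the text, learning requires successful verification: data that arrives but cannot be authenticated is not learned.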
Next, we define a few predicates that are useful to track down knowledge. 
The first one is a local predicate that says that for a state machine to know 
about a piece of information it has to either have learned it or generated it. 


Definition learn_or_know (sm : node → StateMachine lak_memory) := 
  forall (d : lak_data) (e : Event), 
    know sm e d → (exists e', e' ⊑ e ∧ learn e' d) ∨ lak_data2node d = loc e. 


The next one is a distributed predicate that states that if one learns some piece 
of information that is owned by a correct node, then that correct node must 
have known that piece of information: 


Definition learn_if_know (sm : node → StateMachine lak_memory) := 
  forall (d : lak_data) (e : Event), 
    (learn e d ∧ has_correct_trace_before e (lak_data2node d)) 
    → exists e', e' ≺ e ∧ loc e' = lak_data2node d ∧ know sm e' d. 


Using these two predicates, we have proved this general lemma about knowl- 
edge propagating through nodes: 


Lemma know_propagates : 
  forall (e : Event) (sm : node → StateMachine lak_memory) (d : lak_data), 
    (learn_or_know sm ∧ learn_if_know sm) 
    → (know sm e d ∧ has_correct_trace_before e (lak_data2node d)) 
    → exists e', e' ≺ e ∧ loc e' = lak_data2node d ∧ know sm e' d. 


This lemma says that, assuming learn_or_know and learn_if_know, if one knows 
at some event e some data d that is owned by a correct node, then that correct 
node must have known that data at a prior event e’. We use this lemma to track 
down information through correct nodes. 
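The way knowledge is traced back to its owner can be mimicked on a toy trace. The sketch below makes strong simplifying assumptions that are not part of the model: events are totally ordered by their list index, data are strings of the form "owner:payload", and all nodes are correct.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class Ev(NamedTuple):
    loc: str                  # node at which the event happens
    learned: FrozenSet[str]   # data learned at this event
    known: FrozenSet[str]     # data known after this event


def owner(d: str) -> str:
    # Hypothetical convention: the datum "a:x" is owned by node "a".
    return d.split(":")[0]


def learn_or_know(trace: List[Ev]) -> bool:
    """Each known datum was learned locally at or before the event, or is owned locally."""
    for i, e in enumerate(trace):
        for d in e.known:
            learned_locally = any(
                p.loc == e.loc and d in p.learned for p in trace[: i + 1]
            )
            if not (learned_locally or owner(d) == e.loc):
                return False
    return True


def find_origin(trace: List[Ev], i: int, d: str) -> Optional[int]:
    """Index of an earlier event at d's owner at which d was already known."""
    for j in range(i):
        if trace[j].loc == owner(d) and d in trace[j].known:
            return j
    return None
```

On a trace where node a generates "a:x" and node b later learns it, `find_origin` returns the index of a's event, which plays the role of the prior event e' in the lemma.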

As mentioned in Sect. 4.3, when reasoning about distributed systems, one 
often needs to reason about certificates, i.e., about collections of messages 
from different sources. In order to capture this, we introduce the following 
know_certificate predicate, which says that the state machine sm knows the 
information nfo at event e if there exists a list l of pieces of data of length at 
least k (the certificate size) that come from different sources, and such that sm 
knows each of these pieces of data, and each piece of data carries the common 
information nfo: 


Definition know_certificate (sm : node → StateMachine lak_memory) 
           (e : Event) (k : nat) (nfo : lak_info) (P : list lak_data → Prop) := 
  exists (l : list lak_data), 
    k ≤ length l ∧ no_repeats (map lak_data2owner l) ∧ P l 
    ∧ forall d, In d l → (know sm e d ∧ nfo = lak_data2info d). 


Using this predicate, we can then combine the quorum and knowledge 
theories to prove the following lemma, which captures the fact that if there are 
two quorums for information nfo1 (known at e1) and nfo2 (known at e2), and 
the intersection of the two quorums is guaranteed to contain a correct node, 
then there must be a correct node (at which e1’ and e2’ happen) that owns 
and knows both nfo1 and nfo2. This lemma follows from know_propagates and 
overlapping_quorums: 


Lemma know_in_intersection : 
  forall (sm : node → StateMachine lak_memory) (e1 e2 : Event) (nfo1 nfo2 : lak_info) 
         (k f : nat) (P : list lak_data → Prop) (E : list Event), 
    (learn_or_know sm ∧ learn_if_know sm) 
    → (k < num_nodes ∧ num_nodes + f < 2 * k) 
    → (exists_at_most_f_faulty E f ∧ In e1 E ∧ In e2 E) 
    → (know_certificate sm e1 k nfo1 P ∧ know_certificate sm e2 k nfo2 P) 
    → exists e1' e2' d1 d2, loc e1' = loc e2' ∧ e1' ≺ e1 ∧ e2' ≺ e2 
      ∧ loc e1' = lak_data2node d1 ∧ loc e2' = lak_data2node d2 
      ∧ know sm e1' d1 ∧ know sm e2' d2 
      ∧ nfo1 = lak_data2info d1 ∧ nfo2 = lak_data2info d2. 
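The arithmetic behind the hypothesis num_nodes + f < 2 * k is that two sets of k distinct nodes out of n share at least 2k - n nodes, so the overlap exceeds f and must contain a correct node. The sketch below checks this by enumeration and gives a simplified reading of the certificate predicate; the extra predicate P is omitted, and all function names are assumptions for this example.

```python
from itertools import combinations


def know_certificate(known, k, nfo, data2owner, data2info):
    """Simplified check: at least k distinct owners among known data carrying nfo."""
    owners = {data2owner(d) for d in known if data2info(d) == nfo}
    return len(owners) >= k


def min_quorum_overlap(n, k):
    """Lower bound on the overlap of two size-k subsets of n nodes."""
    return max(0, 2 * k - n)


def smallest_overlap_by_enumeration(n, k):
    """Brute-force the smallest intersection over all pairs of size-k subsets."""
    quorums = [set(q) for q in combinations(range(n), k)]
    return min(len(a & b) for a in quorums for b in quorums)
```

For PBFT with f = 1, n = 4 and k = 3, the bound gives an overlap of 2, which is larger than f, so at least one node in the overlap is correct.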
Similarly, we proved the following lemma, which captures the fact that there 
is always a correct replica that can vouch for the information of a weak certificate: 


Lemma know_weak_certificate : 
  forall (e : Event) (k f : nat) (nfo : lak_info) (P : list lak_data → Prop) (E : list Event), 
    (f < k ∧ exists_at_most_f_faulty E f ∧ In e E ∧ know_certificate e k nfo P) 
    → exists d, has_correct_trace_before e (lak_data2node d) ∧ know e d ∧ nfo = lak_data2info d. 
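The counting argument behind weak certificates is a pigeonhole step: if a certificate has k > f distinct sources and at most f nodes are faulty, at least one source is correct. A minimal sketch, with hypothetical function names and a brute-force check over small systems, assuming nodes are numbered 0 to n - 1:

```python
from itertools import combinations


def correct_vouchers(owners, faulty):
    """Sources of a certificate that are not in the faulty set."""
    return sorted(set(owners) - set(faulty))


def always_has_correct_voucher(n, f, k):
    """True if every size-k certificate has a correct source for every
    choice of at most f faulty nodes among n."""
    nodes = range(n)
    return all(
        correct_vouchers(cert, faulty)
        for cert in combinations(nodes, k)
        for size in range(f + 1)
        for faulty in combinations(nodes, size)
    )
```

With n = 4 and f = 1, certificates of size 2 always contain a correct source, while certificates of size 1 do not, matching the side condition f < k.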


PBFT. One of the key lemmas to prove PBFT’s safety says that if two correct 
replicas have prepared some requests with the same sequence and view 
numbers, then the requests must be the same [14, Inv.A.1.4]. As mentioned in 
Sect. 2.1, a replica has prepared a request if it received pre-prepare and 
prepare messages from a quorum of replicas. To prove this lemma, we instantiated 
LearnAndKnow as follows: lak_data is either a pre-prepare or a prepare 
message; lak_info is the type of view/sequence number/digest triples; lak_memory 
is the type of states maintained by replicas; lak_data2info extracts the view, 
sequence number, and digest contained in pre-prepare and prepare messages; 
lak_know states that the pre-prepare or prepare message is stored in the state; 
lak_data2owner extracts the sender of the message; and lak_data2auth is similar 
to the PBFTget_contained_auth_data function presented in Sect. 3.6. The two 
predicates learn_or_know and learn_if_know, which we proved using the tactic 
discussed in Sect. 4.1, are true about this instance of LearnAndKnow. Inv.A.1.4 is 
then a straightforward consequence of know_in_intersection applied to the two 
quorums. 


5 Verification of PBFT 


Agreement. Velisarios is designed as a general, reusable, and extensible 
framework that can be instantiated to prove the correctness of any BFT protocol. We 
demonstrated its usability by proving that our PBFT implementation satisfies 
the standard agreement property, which is the crux of linearizability (we leave 
linearizability for future work; see Sect. 2.2 for a high-level definition). 
Agreement states that, regardless of the view, any two replies sent by correct replicas 
i1 and i2 at events e1 and e2 for the same timestamp ts to the same client c 
contain the same replies. We proved that this is true in any event ordering that 
satisfies the assumptions from Sect. 3.6 (see footnote 12): 


Lemma agreement : 
  forall (eo : EventOrdering) (e1 e2 : Event) (v1 v2 : View) (ts : Timestamp) 
         (c : Client) (i1 i2 : Rep) (r1 r2 : Request) (a1 a2 : list Token), 
    authenticated_messages_were_sent_or_byz_sys eo PBFTsys ∧ correct_keys eo 
    → (exists_at_most_f_faulty [e1,e2] f ∧ loc e1 = PBFTreplica i1 ∧ loc e2 = PBFTreplica i2) 
    → In (send_reply v1 ts c i1 r1 a1) (output_system_on_event PBFTsys e1) 
    → In (send_reply v2 ts c i2 r2 a2) (output_system_on_event PBFTsys e2) 
    → r1 = r2. 


12 See agreement in https://github.com/vrahli/Velisarios/blob/master/PBFT/PBFTagreement.v. 

where Timestamps are nats; authenticated_messages_were_sent_or_byz_sys is 
defined on systems using authenticated_messages_were_sent_or_byz; the func- 
tion output_system_on_event is similar to state_sm_after_event (see Sect. 3.5) 
but returns the outputs of a given state machine at a given event instead of 
returning its state; and send_reply builds a reply message. To prove this lemma, 
we proved most of the invariants stated by Castro in [14, Appendix A]. In addi- 
tion, we proved that if the last executed sequence number of two correct replicas 
is the same, then these two replicas have, among other things, the same service 
state (see footnote 13). 
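Read operationally, the agreement property is a check over the replies emitted by correct replicas: replies for the same client and timestamp must carry the same result, whatever the view. The sketch below expresses that check over a hypothetical `Reply` record; it is a test oracle one could run on observed outputs, not a substitute for the proof.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Reply(NamedTuple):
    """Hypothetical reply record, mirroring the arguments of send_reply."""
    view: int
    ts: int
    client: str
    replica: int
    result: str


def agreement_holds(replies: Iterable[Reply]) -> bool:
    """True if all replies for the same (timestamp, client) carry the same result."""
    seen: Dict[Tuple[int, str], str] = {}
    for r in replies:
        key = (r.ts, r.client)
        # The view is deliberately not part of the key.
        if seen.setdefault(key, r.result) != r.result:
            return False
    return True
```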

As mentioned above, because our model is based on LoE, we only ever prove 
such properties by induction on causal time. Similarly, Castro proved most of his 
invariants by induction on the length of the executions. However, he used other 
induction principles to prove some lemmas, such as Inv.A.1.9, which he proved by 
induction on views [14, p. 151]. This invariant says that prepared requests have 
to be consistent with the requests sent in pre-prepare messages by the primary. 
A straightforward induction on causal time was more natural in our setting. 

Castro used a simulation method to prove PBFT’s safety: he first proved 
the safety of a version without garbage collection and then proved that the ver- 
sion with garbage collection implements the one without. This requires defining 
two versions of the protocol. Instead, we directly prove the safety of the one 
with garbage collection. This involved proving further invariants about stored, 
received and sent messages, essentially that they are always within the water 
marks. 


Proof Effort. In terms of proof effort, developing Velisarios and verifying PBFT’s 
agreement property took us around 1 person year. Our generic Velisarios frame- 
work consists of around 4000 lines of specifications and around 4000 lines of 
proofs. Our verified implementation of PBFT consists of around 20000 lines of 
specifications and around 22000 lines of proofs. 


6 Extraction and Evaluation 


Extraction. To evaluate our PBFT implementation (i.e., PBFTsys defined 
in Sect.3.5—a collection of state machines), we generate OCaml code using 
Coq’s extraction mechanism. Most parameters, such as the number of toler- 
ated faults, are instantiated before extraction. Note that not all parameters 
need to be instantiated. For example, as mentioned in Sect. 3.1, neither do we 
instantiate event orderings, nor do we instantiate our assumptions (such as 
exists_at_most_f_faulty), because they are not used in the code but are only 
used to prove that properties are true about all possible runs. Also, keys, signa- 
tures, and digests are only instantiated by stubs in Coq. We replace those stubs 
when extracting OCaml code by implementations provided by the nocrypto [66] 
library, which is the cryptographic library we use to hash, sign, and verify mes- 
sages (we use RSA). 


13 See same_states_if_same_next_to_execute in https://github.com/vrahli/Velisarios/blob/master/PBFT/PBFTsame_states.v. 


Evaluation. To run the extracted code in a real distributed environment, we 
implemented a small trusted runtime environment in OCaml that uses the Async 
library [5] to handle sender/receiver threads. We show among other things here 
that the average latency of our implementation is acceptable compared to the 
state-of-the-art BFT-SMaRt [8] library. Note that because we do not offer a 
new protocol, but essentially a re-implementation of PBFT, we expect that on 
average the scale will be similar in other execution scenarios such as the ones 
studied by Castro in [14]. We ran our experiments using desktops with 16 GB 
of memory, and 8 i7-6700 cores running at 3.40GHz. We report some of our 
experiments where we used a single client, and a simple state machine where the 
state is a number, and an operation is either adding or subtracting some value. 
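The replicated service used in these experiments is easy to picture. The sketch below is an assumed rendering of such a state machine (the operation encoding and the choice to reply with the new state are ours): because `step` is deterministic, replicas that execute the same operations in the same order end in the same state.

```python
from typing import Iterable, List, Tuple

Operation = Tuple[str, int]   # ("add", n) or ("sub", n); encoding assumed


def step(state: int, op: Operation) -> Tuple[int, int]:
    """Apply one operation and return (new state, reply)."""
    kind, value = op
    if kind == "add":
        new_state = state + value
    elif kind == "sub":
        new_state = state - value
    else:
        raise ValueError(f"unknown operation: {kind}")
    return new_state, new_state


def run(ops: Iterable[Operation], state: int = 0) -> Tuple[int, List[int]]:
    """Execute operations in order, collecting the reply to each one."""
    replies = []
    for op in ops:
        state, reply = step(state, op)
        replies.append(reply)
    return state, replies
```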

We ran a local simulation to measure the performance of our PBFT 
implementation without network and signatures: when 1 client sends 1 million 
requests, it takes on average 27.6 µs for the client to receive f + 1 (f = 1) 
replies. 
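The client-side rule measured here, waiting for f + 1 replies, can be sketched as follows. In PBFT the client accepts a result once f + 1 distinct replicas report the same value, since at least one of them is then correct. The function name and the (replica, result) encoding are assumptions for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def accepted_result(replies: Iterable[Tuple[int, str]], f: int) -> Optional[str]:
    """Return the first result reported by f + 1 distinct replicas, if any."""
    voters: Dict[str, Set[int]] = defaultdict(set)
    for replica, result in replies:
        voters[result].add(replica)   # duplicates from one replica count once
        if len(voters[result]) >= f + 1:
            return result
    return None
```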
Fig. 3. (1) Single machine (top/left); (2) several machines (top/right); (3) single 
machine using MACs (bottom/left); (4) view change response time (bottom/right) 


Top/left of Fig. 3 shows the experiment where we varied f from 1 to 3, 
and replicas sent messages, signed using RSA, through sockets, but on a 
single machine. As mentioned above, we implemented the digital signature-based 
version of PBFT, while BFT-SMaRt uses a more efficient MAC-based 
authentication scheme, which in part explains why BFT-SMaRt is around one order 
of magnitude faster than our implementation. As in [14, Table 8.9], we expect a 
similar improvement when using the more involved, and as yet not formally 
verified, MAC-based version of PBFT (bottom/left of Fig. 3 shows the average 
response time when replacing digital signatures by MACs, without adapting 
the rest of the protocol). Top/right of Fig. 3 presents results when running our 
version of PBFT and BFT-SMaRt on several machines, for f = 1. Finally, 
bottom/right of Fig. 3 shows the response time of our view-change protocol. In this 
experiment, we killed the primary after 16 s of execution, and it took around 7 s 
for the system to recover. 


Trusted Computing Base. The TCB of our system includes: (1) the fact that our 
LoE model faithfully reflects the behavior of distributed systems (see Sect. 3.4); 
(2) the validity of our assumptions: authenticated_messages_were_sent_or_byz; 
exists_at_most_f_faulty; correct_keys; and create_hash_collision_resistant 
(Sect. 3.6); (3) Coq’s logic and implementation; (4) OCaml and the nocrypto 
and Async libraries we use in our runtime environment, and the runtime envi- 
ronment itself (Sect. 6); (5) the hardware and software on which our framework 
is running. 


7 Related Work 


Our framework is not the first one for implementing and reasoning about the 
correctness of distributed systems (see Fig. 4). However, to the best of our knowl- 
edge, (1) it is the first theorem prover based tool for verifying the correctness of 
asynchronous Byzantine fault-tolerant protocols and their implementations; and 
(2) we provide the first mechanical proof of the safety of a PBFT implementa- 
tion. Velisarios has evolved from our earlier EventML framework [71], primarily 
to reason about Byzantine faults and distributed epistemic knowledge. 


                                      Running code   Byz. (synch.)   Byz. (asynch.) 
IronFleet/EventML/Verdi/Disel/PSync   ✓              ✗               ✗ 
HO-model/PVS                          ✗              ✓               ✗ 
Event-B                               ✓/✗            ✓               ✗ 
IOA/TLA+/ByMC                         ✗              ✓               ✓ 
Velisarios                            ✓              ✓               ✓ 


Fig. 4. Comparison with related work 


7.1 Logics and Models 


IOA [33-35,78] is the model used by Castro [14] to prove PBFT’s safety. It is 
a programming/specification language for describing asynchronous distributed 
systems as I/O automata [58] (labeled state transition systems) and stating 
their properties. While IOA is state-based, the logic we use in this paper is 
event-based. IOA can interact with a large range of tools such as type checkers, 
simulators, model checkers, theorem provers, and there is support for synthesis 
of Java code [78]. In contrast, our methodology allows us to both implement and 
verify protocols within the same tool, namely Coq. 


TLA+ [24,51] is a language for specifying and reasoning about systems. It 
combines: (1) TLA [52], which is a temporal logic for describing systems [51], and 
(2) set theory, to specify data structures. TLAPS [24] uses a collection of 
theorem provers, proof assistants, SMT solvers, and decision procedures to 
mechanically check TLA proofs. Model checker integration helps catch errors before 
verification attempts. TLA+ has been used in a large number of projects (e.g., 
[12,18,44,56,63,64]) including proofs of safety and liveness of Multi-Paxos [18], 
and safety of a variant of an abstract model of PBFT [13]. To the best of our 
knowledge, TLA+ does not perform program synthesis. 


The Heard-Of (HO) Model [23] requires processes to execute in lock-step 
through rounds into which the distributed algorithms are divided. Asynchronous 
fault-tolerant systems are treated as synchronous systems with adversarial envi- 
ronments that cause messages to be dropped. The HO-model was implemented 
in Isabelle/HOL [22] and used, for example, to verify the EIGByz [7] Byzantine 
agreement algorithm for synchronous systems with reliable links. This formaliza- 
tion uses the notion of global state of the system [19], while our approach relies on 
Lamport’s happened before relation [53], which does not require reasoning about 
a distributed system as a single entity (a global state). Model checking and the 
HO-model were also used in [21,80,81] for verifying the crash fault-tolerant con- 
sensus algorithms presented in [23]. To the best of our knowledge, there is no 
tool that allows generating code from algorithms specified using the HO-model. 


Event-B [1] is a set-theory-based language for modeling reactive systems and 
for refining high-level abstract specifications into low-level concrete ones. It sup- 
ports code generation [32,61], with some limitations (not all features are cov- 
ered). The Rodin [2] platform for Event-B provides support for refinement, and 
automated and interactive theorem proving. Both have been used in a number of 
projects, such as: to prove the safety and liveness of self-* systems [4]; to prove 
the agreement and validity properties of the synchronous crash-tolerant Floodset 
consensus algorithm [57]; and to prove the agreement and validity of synchronous 
Byzantine agreement algorithms [50]. In [50], the authors assume that messages 
cannot be forged (using PBFT, at most f nodes can forge messages), and do not 
verify implementations of these algorithms. 


7.2 Tools 


Verdi [85,86] is a framework to develop and reason about distributed systems 
using Coq. As in our framework, Verdi leaves no gaps between verified and 
running code. Instead, OCaml code is extracted directly from the verified Coq 
implementation. Verdi provides a compositional way of specifying distributed 
systems. This is done by applying verified system transformers. For example, 
Raft [67] (an alternative to Paxos) transforms a distributed system into a 
crash-tolerant one. One difference between our respective methods is that they verify 
a system by reasoning about the evolution of its global state, while we use 
Lamport’s happened before relation. Moreover, they do not deal with the full 
spectrum of arbitrary faults (e.g., malicious faults). 


Disel [75,84] is a verification framework that implements a separation-style 
program logic, and that enables compositional verification of distributed systems. 


IronFleet [40,41] is a framework for building and reasoning about distributed 
systems using Dafny [55] and the Z3 theorem prover [62]. Because systems are 
both implemented in and verified using Dafny, IronFleet also prevents gaps 
between running and verified code. It uses a combination of TLA-style state- 
machine refinements [51] to reason about the distributed aspects of protocols, 
and Floyd-Hoare-style imperative verification techniques to reason about local 
behavior. The authors have implemented, among other things, the Paxos-based 
state machine replication library IronRSL, and verified its safety and liveness. 


PSync [28] is a domain specific language embedded in Scala, that enables exe- 
cuting and verifying fault-tolerant distributed algorithms in synchronous and 
partially asynchronous networks. PSync is based on the HO-model, and has been 
used to implement several crash fault-tolerant algorithms. Similar to the Verdi 
framework, PSync makes use of a notion of global state and supports reason- 
ing based on the multi-sorted first-order Consensus verification logic (CL) [27]. 
To prove safety, users have to provide invariants, which CL checks for validity. 
Unlike Verdi, IronFleet and PSync, we focus on Byzantine faults. 


ByMC is a model checker for verifying safety and liveness of fault-tolerant 
distributed algorithms [47-49]. It applies an automated method for model checking 
parametrized threshold-guarded distributed algorithms (e.g., processes waiting 
for messages from a majority of distinct senders). ByMC is based on a short 
counter-example property, which says that if a distributed algorithm violates a 
temporal specification then there is a counterexample whose length is bounded 
and independent of the parameters (e.g. the number of tolerated faults). 


Ivy [69] allows debugging infinite-state systems using bounded verification, and 
formally verifying their safety by gradually building universally quantified induc- 
tive invariants. To the best of our knowledge, Ivy does not support faults. 


Actor Services [77] allows verifying the distributed and functional properties 
of programs communicating via asynchronous message passing at the level of 
the source code (they use a simple Java-like language). It supports modular 
reasoning and proving liveness. To the best of our knowledge, it does not deal 
with faults. 


PVS has been extensively used for verification of synchronous systems that tol- 
erate malicious faults such as in [74], to the extent that its design was influenced 
by these verification efforts [68]. 


8 Conclusions and Future Work 


We introduced Velisarios, a framework to implement and reason about BFT- 
SMR protocols using the Coq theorem prover, and described a methodology 
based on learn/know epistemic modal operators. We used this framework to 
prove the safety of a complex system, namely Castro’s PBFT protocol. In the 
future, we plan to also tackle liveness/timeliness. Indeed, proving the safety of 
a distributed system is far from being enough: a protocol that does not run 
(which is not live) is useless. Following the same line of reasoning, we want to 
tackle timeliness because, for real world systems, it is not enough to prove that 
a system will eventually reply. One often desires that the system replies in a 
timely fashion. 


Abstract. Numeric static analysis for Java has a broad range of 
potentially useful applications, including array bounds checking and resource 
usage estimation. However, designing a scalable numeric static analy- 
sis for real-world Java programs presents a multitude of design choices, 
each of which may interact with others. For example, an analysis could 
handle method calls via either a top-down or bottom-up interprocedu- 
ral analysis. Moreover, this choice could interact with how we choose 
to represent aliasing in the heap and/or whether we use a relational 
numeric domain, e.g., convex polyhedra. In this paper, we present a 
family of abstract interpretation-based numeric static analyses for Java 
and systematically evaluate the impact of 162 analysis configurations 
on the DaCapo benchmark suite. Our experiment considered the pre- 
cision and performance of the analyses for discharging array bounds 
checks. We found that top-down analysis is generally a better choice 
than bottom-up analysis, and that using access paths to describe heap 
objects is better than using summary objects corresponding to points- 
to analysis locations. Moreover, these two choices are the most signifi- 
cant, while choices about the numeric domain, representation of abstract 
objects, and context-sensitivity make much less difference to the preci- 
sion/performance tradeoff. 


1 Introduction 


Static analysis of numeric program properties has a broad range of useful appli- 
cations. Such analyses can potentially detect array bounds errors [50], analyze 
a program’s resource usage [28,30], detect side channels [8,11], and discover 
vectors for denial of service attacks [10, 26]. 

One of the major approaches to numeric static analysis is abstract inter- 
pretation [18], in which program statements are evaluated over an abstract 
domain until a fixed point is reached. Indeed, the first paper on abstract 
interpretation [18] used numeric intervals as one example abstract domain, 
and many subsequent researchers have explored abstract interpretation-based 
numeric static analysis [13,22-25,31]. 

Despite this long history, applying abstract interpretation to real-world Java 
programs remains a challenge. Such programs are large, have many interacting 
methods, and make heavy use of heap-allocated objects. In considering how to 
build an analysis that aims to be sound but also precise, prior work has explored 
some of these challenges, but not all of them together. For example, several works 
have considered the impact of the choice of numeric domain (e.g., intervals vs. 
convex polyhedra) in trading off precision for performance but not considered 
other tradeoffs [24,38]. Other works have considered how to integrate a numeric 
domain with analysis of the heap, but unsoundly model method calls [25] and/or 
focus on very precise properties that do not scale beyond small programs [23, 24]. 
Some scalability can be recovered by using programmer-specified pre- and post- 
conditions [22]. In all of these cases, there is a lack of consideration of the broader 
design space in which many implementation choices interact. (Sect. 7 considers 
prior work in detail.) 

In this paper, we describe and then systematically explore a large design 
space of fully automated, abstract interpretation-based numeric static analyses 
for Java. Each analysis is identified by a choice of five configurable options—the 
numeric domain, the heap abstraction, the object representation, the interpro- 
cedural analysis order, and the level of context sensitivity. In total, we study 162 
analysis configurations, both to assess how individual configuration options 
perform overall and to study interactions between different options. To our 
knowledge, our basic analysis is one of the few fully automated numeric static analyses 
for Java, and we do not know of any prior work that has studied such a large 
static analysis design space. 

We selected analysis configuration options that are well-known in the static 
analysis literature and that are key choices in designing a Java static analysis. For 
the numeric domain, we considered both intervals [17] and convex polyhedra [19], 
as these are popular and bookend the precision/performance spectrum. (See 
Sect. 2.) 

Modeling the flow of data through the heap requires handling pointers and 
aliasing. We consider three different choices of heap abstraction: using summary 
objects [25,27], which are weakly updated, to summarize multiple heap locations; 
access paths [21,52], which are strongly updated; and a combination of the two. 
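The difference between weak and strong updates can be shown on interval values. In the sketch below (names and the pair encoding of intervals are assumptions), a strong update overwrites the old value, as for an access path that denotes a single location, while a weak update joins old and new values, as for a summary object that may stand for several locations.

```python
from typing import Dict, Tuple

Interval = Tuple[int, int]   # (lower bound, upper bound)


def join(a: Interval, b: Interval) -> Interval:
    """Smallest interval containing both a and b."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def strong_update(env: Dict[str, Interval], name: str, value: Interval) -> Dict[str, Interval]:
    """Overwrite: sound when name denotes exactly one concrete location."""
    new_env = dict(env)
    new_env[name] = value
    return new_env


def weak_update(env: Dict[str, Interval], name: str, value: Interval) -> Dict[str, Interval]:
    """Join: needed when name may summarize several concrete locations."""
    new_env = dict(env)
    new_env[name] = join(env[name], value) if name in env else value
    return new_env
```

Writing 5 to a field previously known to be 0 yields [5, 5] under a strong update but [0, 5] under a weak one, which illustrates the precision lost by summarization.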

To implement these abstractions, we use an ahead-of-time, global points- 
to analysis [44], which maps static/local variables and heap-allocated fields to 
abstract objects. We explore three variants of abstract object representation: 
the standard allocation-site abstraction (the most precise) in which each syn- 
tactic new in the program represents an abstract object; class-based abstraction 
(the least precise) in which each class represents all instances of that class; 
and a smushed string abstraction (intermediate precision) which is the same 
as allocation-site abstraction except strings are modeled using a class-based 
abstraction [9]. (See Sect. 3.) 
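The three abstract object representations differ only in how an allocation is named. A minimal sketch, assuming allocation sites are identified by a string and using the setting names ALLO, CLAS, and SMUS as labels (the function itself is hypothetical and not part of WALA):

```python
def abstract_object(site: str, cls: str, representation: str) -> str:
    """Name of the abstract object for an allocation of class cls at site."""
    if representation == "ALLO":      # one abstract object per allocation site
        return f"site:{site}"
    if representation == "CLAS":      # one abstract object per class
        return f"class:{cls}"
    if representation == "SMUS":      # allocation-site, except strings are per class
        if cls == "java.lang.String":
            return f"class:{cls}"
        return f"site:{site}"
    raise ValueError(f"unknown representation: {representation}")
```

Two string allocations at different sites are thus distinguished under ALLO but merged under SMUS and CLAS, while non-string allocations are merged only under CLAS.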
We compare three choices in the interprocedural analysis order we use to 
model method calls: top-down analysis, which starts with main and analyzes 
callees as they are encountered; bottom-up analysis, which starts at the leaves 
of the call tree and instantiates method summaries at call sites; and a hybrid 
analysis that is bottom-up for library methods and top-down for application 
code. In general, top-down analysis explores fewer methods, but it may analyze 
callees multiple times. Bottom-up analysis explores each method once but needs 
to create summaries, which can be expensive. 
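The two orders can be contrasted on a small call graph. The sketch below only computes the order in which methods are visited, under the assumptions that the call graph is acyclic and given as a dictionary; it does not model re-analysis of callees or summary instantiation.

```python
from typing import Dict, List


def top_down_order(calls: Dict[str, List[str]], entry: str = "main") -> List[str]:
    """Methods reachable from entry, callers before callees."""
    order, seen = [], set()

    def visit(method: str) -> None:
        if method in seen:
            return
        seen.add(method)
        order.append(method)
        for callee in calls.get(method, []):
            visit(callee)

    visit(entry)
    return order


def bottom_up_order(calls: Dict[str, List[str]]) -> List[str]:
    """All methods, callees before callers (post-order)."""
    order, seen = [], set()

    def visit(method: str) -> None:
        if method in seen:
            return
        seen.add(method)
        for callee in calls.get(method, []):
            visit(callee)
        order.append(method)

    for method in calls:
        visit(method)
    return order
```

On a graph containing a method that main never reaches, the top-down order skips it while the bottom-up order still visits it, which reflects why top-down analysis explores fewer methods.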

Finally, we compare three kinds of context-sensitivity in the points-to analy- 
sis: context-insensitive analysis, 1-CFA analysis [46] in which one level of calling 
context is used to discriminate pointers, and type-sensitive analysis [49] in which 
the type of the receiver is the context. (See Sect. 4.) 

We implemented our analysis using WALA [2] for its intermediate represen- 
tation and points-to analyses and either APRON [33,41] or ELINA [47,48] for 
the interval or polyhedral, respectively, numeric domain. We then applied all 162 
analysis configurations to the DaCapo benchmark suite [6], using the numeric 
analysis to try to prove array accesses are within bounds. We measured the anal- 
yses’ performance and the number of array bounds checks they discharged. We 
analyzed our results by using a multiple linear regression over analysis features 
and outcomes, and by performing data visualizations. 

We studied three research questions. First, we examined how analysis config- 
uration affects performance. We found that using summary objects causes signif- 
icant slowdowns, e.g., the vast majority of the analysis runs that timed out used 
summary objects. We also found that polyhedral analysis incurs a significant 
slowdown, but only half as much as summary objects. Surprisingly, bottom-up 
analysis provided little performance advantage generally, though it did provide 
some benefit for particular object representations. Finally, context-insensitive 
analysis is faster than context-sensitive analysis, as might be expected, but the 
difference is not great when combined with more approximate (class-based and 
smushed string) abstract object representations. 

Second, we examined how analysis configuration affects precision. We found 
that using access paths is critical to precision. We also found that the bottom- 
up analysis has worse precision than top-down analysis, especially when using 
summary objects, and that using a more precise abstract object representation 
improves precision. But other traditional ways of improving precision do so only 
slightly (the polyhedral domain) or not significantly (context-sensitivity). 

Finally, we looked at the precision/performance tradeoff for all programs. 
We found that using access paths is always a good idea, both for precision 
and performance, and top-down analysis works better than bottom-up. While 
summary objects, originally proposed by Fu [25], do help precision for some 
programs, the benefits are often marginal when considered as a percentage of 
all checks, so they tend not to outweigh their large performance disadvantage. 
Lastly, we found that the precision gains for more precise object representations 
and polyhedra are modest, and performance costs can be magnified by other 
analysis features. 


Table 1. Analysis configuration options, and their possible settings. 

Config. option                          Setting   Description 
Numeric domain (ND)                     INT       Intervals 
                                        POL       Polyhedra 
Heap abstraction (HA)                   SO        Only summary objects 
                                        AP        Only access paths 
                                        AP+SO     Both access paths and summary objects 
Abstract object representation (OR)     ALLO      Alloc-site abstraction 
                                        CLAS      Class-based abstraction 
                                        SMUS      Alloc-site except Strings 
Inter-procedural analysis order (AO)    TD        Top-down 
                                        BU        Bottom-up 
                                        TD+BU     Hybrid top-down and bottom-up 
Context sensitivity (CS)                CI        Context-insensitive 
                                        1CFA      1-CFA 
                                        1TYP      Type-sensitive 


In summary, our empirical study provides a large, comprehensive evaluation 
of the effects of important numeric static analysis design choices on performance, 
precision, and their tradeoff; it is the first of its kind. Our code and data are 
available at https://github.com/plum-umd/JANA. 


2 Numeric Static Analysis 


A numeric static analysis is one that tracks numeric properties of memory loca- 
tions, e.g., that x < 5 or y > z. A natural starting point for a numeric static 
analysis for Java programs is numeric abstract interpretation over program vari- 
ables within a single procedure/method [18]. 

A standard abstract interpretation expresses numeric properties using a 
numeric abstract domain, of which the most common are intervals (also known as 
boxes) and convex polyhedra. Intervals [17] define abstract states using inequal- 
ities of the form p relop n where p is a variable, n is a constant integer, and 
relop is a relational operator such as <. A variable such as p is sometimes called 
a dimension, as it describes one axis of a numeric space. Convex polyhedra [19] 
define abstract states using linear relationships between variables and constants, 
e.g., of the form 3p1 - p2 < 5. Intervals are less precise but more efficient than 
polyhedra. Operations on intervals have time complexity linear in the number of 
dimensions whereas the time complexity for polyhedra operations is exponential 
in the number of dimensions.¹ 


¹ Further, the time complexity of join is O(d · c^(2^(d+1))) where c is the number of 
constraints, and d is the number of dimensions [47]. 

Numeric abstract interpretations, including our own analyses, are usually 
flow-sensitive, i.e., each program point has an associated abstract state 
characterizing properties that hold at that point. Variable assignments are strong updates, 
meaning information about the variable is replaced by information from the 
right-hand side of the assignment. At merge points (e.g., after the completion of 
a conditional), the abstract states of the possible prior states are joined to yield 
properties that hold regardless of the branch taken. Loop bodies are reanalyzed 
until their constituent statements’ abstract states reach a fixed point. Reaching 
a fixed point is accelerated by applying the numeric domain’s standard widening 
operator [4] in place of join after a fixed number of iterations. 
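The fixed-point iteration described above can be sketched for the interval domain. The following Python fragment is a minimal illustration, not the paper's implementation: it abstractly runs the loop `i = init; while (i < bound) i = i + 1`, and the names `join`, `widen`, `analyze_loop`, and the `delay` parameter are assumptions introduced here.

```python
import math

# An interval is a pair (lo, hi); bounds may be -inf or +inf.

def join(a, b):
    """Least upper bound: the smallest interval containing both a and b."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(old, new):
    """Standard interval widening: any bound that is still moving jumps to infinity."""
    lo = old[0] if new[0] >= old[0] else -math.inf
    hi = old[1] if new[1] <= old[1] else math.inf
    return (lo, hi)

def analyze_loop(init, bound, delay):
    """Abstractly run `i = init; while (i < bound) i = i + 1`.

    Returns the interval for i at the loop head and the number of
    iterations needed to stabilize. Join is used for the first `delay`
    iterations and widening afterwards. Assumes init < bound, so the
    guarded interval is never empty.
    """
    head = (init, init)
    iterations = 0
    while True:
        iterations += 1
        # Loop guard i < bound over integers: tighten the upper bound.
        guarded = (head[0], min(head[1], bound - 1))
        # Loop body i = i + 1.
        body = (guarded[0] + 1, guarded[1] + 1)
        merged = join(head, body)
        new_head = merged if iterations <= delay else widen(head, merged)
        if new_head == head:
            return head, iterations
        head = new_head
```

With `delay=3` the loop head stabilizes after a handful of iterations at the cost of losing the upper bound; with a very large delay the analysis recovers the bound 100 but needs on the order of 100 iterations.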

Scaling a basic numeric abstract interpreter to full Java requires making 
many design choices. Table 1 summarizes the key choices we study in this paper. 
Each configuration option has a range of settings that potentially offer different 
precision/performance tradeoffs. Different options may interact with each other 
to affect the tradeoff. In total, we study five options with two or three settings 
each. We have already discussed the first option, the numeric domain (ND), for 
which we consider intervals (INT) and polyhedra (POL). The next two options 
consider the heap, and are discussed in the next section, and the last two options 
consider method calls, and are discussed in Sect. 4. 

For space reasons, our paper presentation focuses on the high-level design 
and tradeoffs. Detailed algorithms are given formally in the technical report [51] 
for the heap and interprocedural analysis. 


3 The Heap 


The numeric analysis described so far is sufficient only for analyzing code with 
local, numeric variables. To analyze numeric properties of heap-manipulating 
programs, we must also consider heap locations x.f, where x is a reference to a 
heap-allocated object, and f is a numeric field.² To do so requires developing a 
heap abstraction (HA) that accounts for aliasing. In particular, when variables x 
and y may point to the same heap object, an assignment to x.f could affect y.f. 
Moreover, the referent of a pointer may be uncertain, e.g., the true branch of a 
conditional could assign location o1 to x, while the false branch could assign o2 
to x. This uncertainty must be reflected in subsequent reads of x.f. 

We use a points-to analysis to reason about aliasing. A points-to analysis 
computes a mapping Pt from variables x and access paths x.f to (one or more) 
abstract objects [44]. If Pt maps two variables/paths p1 and p2 to a common 
abstract object o then p1 and p2 may alias. We also use points-to analysis to 
determine the call graph, i.e., to determine what method may be called by an 
expression x.m(...) (discussed in Sect. 4). 


² In our implementation, statements such as z = x.f.g are decomposed so that paths 
are at most length one, e.g., w = x.f; z = w.g. 


3.1 Summary Objects (SO) 


The first heap abstraction we study is based on Fu [25]: use a summary object 
(SO) to abstract information about multiple heap locations as a single abstract 
state “variable” [27]. As an example, suppose that Pt(x) = {o} and we encounter 
the assignment x.f := 5. Then in this approach, we add a variable o_f to the 
abstract state, modeling the field f of object o, and we add constraint o_f = 5. 
Subsequent assignments to such summary objects must be weak updates, to 
respect the may-alias semantics of the points-to analysis. For example, suppose 
y.f may alias x.f, i.e., o ∈ Pt(x) ∩ Pt(y). Then after a later assignment y.f := 7 
the analysis would weakly update o_f with 7, producing constraints 5 ≤ o_f ≤ 7 
in the abstract state. These constraints conservatively model that either o_f = 5 
or o_f = 7, since the assignment to y.f may or may not affect x.f. 

In general, weak updates are more expensive than strong updates, and read- 
ing a summary object is more expensive than reading a variable. A strong update 
to x is implemented by forgetting x in the abstract state,³ and then re-adding it 
to be equal to the assigned-to value. Note that x cannot appear in the assigned-to 
value because programs are converted into static single assignment form (Sect. 5). 
A weak update—which is not directly supported in the numeric domain libraries 
we use—is implemented by copying the abstract state, strongly updating x in 
the copy, and then joining the two abstract states. Reading from a summary 
object requires “expanding” the abstract state with a copy o’_f of the summary 
object and its constraints, creating a constraint on o’_f, and then forgetting o’_f. 
Doing this ensures that operations on a variable into which a summary object is 
read do not affect prior reads. A normal read just references the read variable. 
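The strong and weak updates described above can be illustrated with a small sketch. It is written in Python over intervals as a stand-in for the numeric domain; the function names and the dictionary-based state are assumptions made for illustration and do not reflect the APRON or ELINA interfaces.

```python
def join(a, b):
    """Interval join: the smallest interval containing both a and b."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def strong_update(state, var, value):
    """Forget var, then re-add it bound to the assigned interval."""
    new_state = dict(state)
    new_state.pop(var, None)   # forget the old constraints on var
    new_state[var] = value     # re-add var, equal to the assigned value
    return new_state

def join_states(s1, s2):
    """Pointwise join; a variable missing from either side is unconstrained."""
    return {v: join(s1[v], s2[v]) for v in s1.keys() & s2.keys()}

def weak_update(state, var, value):
    """Copy the state, strongly update the copy, then join with the original."""
    return join_states(state, strong_update(state, var, value))

# x.f := 5 introduces the summary variable o_f with o_f = 5 ...
state = strong_update({}, "o_f", (5, 5))
# ... and a later y.f := 7, where y may alias x, weakly updates o_f.
state = weak_update(state, "o_f", (7, 7))
```

After the two assignments the state maps `o_f` to the interval [5, 7], matching the example in the text. The weak update costs a state copy plus a join, which is one reason it is more expensive than a strong update.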

Fu [25] argues that this basic approach is better than ignoring heap locations 
entirely by measuring how often field reads are not unconstrained, as would be 
the case for a heap-unaware analysis. However, it is unclear whether the approach 
is sufficiently precise for applications such as array-bounds check elimination. 
Using the polyhedra numeric domain should help. For example, a Buffer class 
might store an array in one field and a conservative bound on an array’s length 
in another. The polyhedral domain will permit relating the latter to the former 
while the interval domain will not. But the slowdown due to the many added 
summary objects may be prohibitive. 


3.2 Access Paths (AP) 


An alternative heap abstraction we study is to treat access paths (AP) as if 
they are normal variables, while still accounting for possible aliasing [21,52]. In 
particular, a path x.f is modeled as a variable x_f, and an assignment x.f := n 
strongly updates x_f to be n. At the same time, if there exists another path y.f 
and x and y may alias, then we must weakly update y_f as possibly containing n. 
In general, determining which paths must be weakly updated depends on the 
abstract object representation and context-sensitivity of the points-to analysis. 
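As a rough sketch of the access-path treatment, the fragment below models an assignment x.f := n over an interval state. The function `assign_field`, the naming scheme `x_f` for path variables, and the dictionary `pt` standing in for the points-to map Pt are all illustrative assumptions.

```python
def join(a, b):
    """Interval join: the smallest interval containing both a and b."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def assign_field(state, pt, x, f, value):
    """Model x.f := value under the access-path abstraction.

    state maps path variables such as "x_f" to intervals; pt maps each
    reference variable to its set of abstract objects.
    """
    new_state = dict(state)
    new_state[f"{x}_{f}"] = value                  # strong update of x_f
    for path, old in state.items():
        y, _, g = path.rpartition("_")
        if g == f and y != x and pt[x] & pt[y]:    # y may alias x
            new_state[path] = join(old, value)     # weak update of y_f
    return new_state

pt = {"x": {"o1"}, "y": {"o1", "o2"}, "z": {"o3"}}
before = {"x_f": (0, 0), "y_f": (1, 1), "z_f": (2, 2)}
after = assign_field(before, pt, "x", "f", (9, 9))
```

Here `x_f` becomes exactly 9, `y_f` is widened to [1, 9] because x and y share the abstract object o1, and `z_f` is untouched because z cannot alias x.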


³ Doing so has the effect of “connecting” constraints that are transitive via x. For 
example, given y < x < 5, forgetting x would yield constraint y < 5. 

Two key benefits of AP over SO are that (1) AP supports strong updates to 
paths x.f, which are more precise and less expensive than weak updates, and (2) 
AP may require fewer variables to be tracked, since, in our design, access paths 
are mostly local to a method whereas points-to sets are computed across the 
entire program. On the other hand, SO can do better at summarizing invariants 
about heap locations pointed to by other heap locations, i.e., not necessarily via 
an access path. Especially when performing an interprocedural analysis, such 
information can add useful precision. 


Combined (AP+SO). A natural third choice is to combine AP and SO. Doing 
so sums both the costs and benefits of the two approaches. An assignment 
x.f := n strongly updates x_f and weakly updates o_f for each o in Pt(x) 
and each y_f where Pt(x) ∩ Pt(y) ≠ ∅. Reading from x.f when it has not been 
previously assigned to is just a normal read, after first strongly updating x_f to 
be the join of the summary read of o_f for each o ∈ Pt(x). 


3.3 Abstract Object Representation (OR) 


Another key precision/performance tradeoff is the abstract object representation 
(OR) used by the points-to analysis. In particular, when Pt(x) = {o1, ..., on}, 
where do the names o1, ..., on come from? The answer impacts the naming of summary 
objects, the granularity of alias checks for assignments to access paths, and 
the precision of the call-graph, which requires aliasing information to determine 
which methods are targeted by a dynamic dispatch x.m(...). 

As shown in the third row of Table 1, we explore three representations for 
abstract objects. The first choice names abstract objects according to their allo- 
cation site (ALLO)—all objects allocated at the same program point have the 
same name. This is precise but potentially expensive, since there are many possi- 
ble allocation sites, and each path x.f could be mapped to many abstract objects. 
We also consider representing abstract objects using class names (CLAS), where 
all objects of the same class share the same abstract name, and a hybrid smushed 
string (SMUS) approach, where every String object has the same abstract name 
but objects of other types have allocation-site names [9]. The class name app- 
roach is the least precise but potentially more efficient since there are fewer 
names to consider. The smushed string analysis is somewhere in between. The 
question is whether the reduction in names helps performance enough, without 
overly compromising precision. 
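The three naming schemes can be contrasted with a small sketch. The function `abstract_name` and the sample allocation sites below are hypothetical and serve only to show how the number of abstract names shrinks from ALLO to SMUS to CLAS.

```python
def abstract_name(alloc_site, class_name, representation):
    """Return the abstract object name for one allocation.

    alloc_site is a program-point label and class_name the allocated
    class; representation is one of "ALLO", "CLAS", or "SMUS".
    """
    if representation == "ALLO":
        return alloc_site                      # one name per allocation site
    if representation == "CLAS":
        return class_name                      # one name per class
    if representation == "SMUS":
        # All Strings share one name; everything else is per allocation site.
        if class_name == "java.lang.String":
            return class_name
        return alloc_site
    raise ValueError(f"unknown representation: {representation}")

# Hypothetical allocations: (allocation site, class).
allocations = [
    ("A.java:10", "java.lang.String"),
    ("A.java:20", "java.lang.String"),
    ("B.java:5", "Buffer"),
    ("B.java:9", "Buffer"),
]

def count_names(representation):
    return len({abstract_name(s, c, representation) for s, c in allocations})
```

For these four allocations, ALLO yields four abstract names, SMUS three, and CLAS two.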


4 Method Calls 


So far we have considered the first three options of Table 1, which handle 
integer variables and the heap. This section considers the last two options: 
interprocedural analysis order (AO) and context sensitivity (CS). 


4.1 Interprocedural Analysis Order (AO) 


We implement three styles of interprocedural analysis: top-down (TD), bottom- 
up (BU), and their combination (TD+BU). The TD analysis starts at the pro- 
gram entry point and, as it encounters method calls, analyzes the body of the 
callee (memoizing duplicate calls). The BU analysis starts at the leaves of the 
call graph and analyzes each method in isolation, producing a summary of its 
behavior [29,53]. (We discuss call graph construction in the next subsection.) 
This summary is then instantiated at each method call. The hybrid analysis 
works top-down for application code but bottom-up for any code from the Java 
standard library. 


Top-Down (TD). Assuming the analyzer knows the method being called, a 
simple approach to top-down analysis would be to transfer the caller’s state to 
the beginning of the callee, analyze the callee in that state, and then transfer the 
state at the end of the callee back to the caller. Unfortunately, this approach 
is prohibitively expensive because the abstract state would accumulate all local 
variables and access paths across all methods along the call-chain. 

We avoid this blowup by analyzing a call to method m while considering only 
relevant local variables and heap abstractions. Ignoring the heap for the moment, 
the basic approach is as follows. First, we make a copy C_m of the caller’s abstract 
state C. In C_m, we set variables for m’s formal numeric arguments to the actual 
arguments and then forget (as defined in Sect. 3.1) the caller’s local variables. 
Thus C_m will only contain the portion of C relevant to m. We analyze m’s body, 
starting in C_m, to yield the final state C'_m. Lastly, we merge C and C'_m, strongly 
update the variable that receives the returned result, and forget the callee’s local 
variables, thus avoiding adding the callee’s locals to the caller’s state. 
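The heap-free top-down call handling can be sketched as follows. This is a simplified illustration over interval states, where forgetting a variable amounts to dropping it from a dictionary; `call_top_down`, `inc_body`, and the convention of storing the return value under the key `"ret"` are assumptions made here.

```python
def call_top_down(caller, formals, actuals, analyze_body, result_var):
    """Analyze `result_var = m(actuals)` top-down over interval states.

    caller maps the caller's variables to intervals; analyze_body is a
    hypothetical function mapping m's entry state to its exit state,
    with the returned interval stored under the key "ret".
    """
    # Copy C to C_m and bind m's formals to the actual arguments.
    callee_in = dict(caller)
    for formal, actual in zip(formals, actuals):
        callee_in[formal] = caller[actual]
    # Forget the caller's locals so C_m holds only what is relevant to m.
    for var in caller:
        if var not in formals:
            del callee_in[var]
    callee_out = analyze_body(callee_in)
    # Merge: keep the caller's state, strongly update the result variable,
    # and drop (forget) every callee-local variable.
    merged = dict(caller)
    merged[result_var] = callee_out["ret"]
    return merged

def inc_body(state):
    """Hypothetical callee `int inc(int n) { int t = n + 1; return t; }`."""
    lo, hi = state["n"]
    out = dict(state)
    out["t"] = (lo + 1, hi + 1)
    out["ret"] = out["t"]
    return out

result = call_top_down({"a": (0, 10), "b": (3, 3)}, ["n"], ["a"], inc_body, "r")
```

The caller's state gains only `r`, bounded by [1, 11]; the callee's `n` and `t` never enter the caller's state, which is the blowup the text is avoiding.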

Now consider the heap. If we are using summary objects, when we copy C 
to C_m we do not forget those objects that might be used by m (according to 
the points-to analysis). As m is analyzed, the summary objects will be weakly 
updated, ultimately yielding state C'_m at m’s return. To merge C'_m with C, we 
first forget the summary objects in C not forgotten in C_m and then concatenate 
C'_m with C. The result is that updated summary objects from C'_m replace those 
that were in the original C. 

If we are using access paths, then at the call we forget access paths in C 
because assignments in m’s code might invalidate them. But if we have an access 
path x.f in the caller and we pass x to m, then we retain x.f in the callee but 
rename it to use m’s parameter’s name. For example, x.f becomes y.f if m’s 
parameter is y. If y is never assigned to in m, we can map y.f back to x.f (in 
the caller) once m returns.⁴ All other access paths in C_m are forgotten prior to 
concatenating with the caller’s state. 

Note that the above reasoning is only for numeric values. We take no partic- 
ular steps for pointer values as the points-to analysis already tracks those across 
all methods. 


⁴ Assignments to y.f in the callee are fine; only assignments to y are problematic. 


Bottom-Up (BU). In the BU analysis, we analyze a method m’s body to 
produce a method summary and then instantiate the summary at calls to m. 
Ignoring the heap, producing a method summary for m is straightforward: start 
analyzing m in a state C_m in which its (numeric) parameters are unconstrained 
variables. When m returns, forget all variables in the final state except the 
parameters and return value, yielding a state C'_m that is the method summary. 
Then, when m is called, we concatenate C'_m with the current abstract state; add 
constraints between the parameters and their actual arguments; strongly update 
the variable receiving the result with the summary’s returned value; and then 
forget the summary’s parameter and return variables. 

When using the polyhedral numeric domain, C'_m can express relationships 
between input and output parameters, e.g., ret < x or ret = x + y. For the 
interval domain, which is non-relational, summaries are more limited, e.g., they 
can express ret < 100 but not ret < x. As such, we expect bottom-up analysis 
to be far more useful with the polyhedral domain than the interval domain. 
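The limitation of interval summaries can be seen in a tiny example. The sketch below analyzes a hypothetical method `int inc(int x) { return x + 1; }` once bottom-up, with an unconstrained parameter, and once top-down, with the caller's bounds; the helper names are assumptions.

```python
import math

TOP = (-math.inf, math.inf)   # the unconstrained interval

def add_const(interval, k):
    """Abstract addition of a constant to an interval."""
    return (interval[0] + k, interval[1] + k)

def analyze_inc(state):
    """Abstract body of the hypothetical `int inc(int x) { return x + 1; }`."""
    return {"ret": add_const(state["x"], 1)}

# Bottom-up: the parameter is unconstrained, so the interval summary
# cannot record that ret = x + 1 and says nothing about ret.
summary = analyze_inc({"x": TOP})

# Top-down: the caller's bounds on the argument flow into the callee.
top_down = analyze_inc({"x": (0, 10)})
```

The bottom-up summary leaves `ret` unconstrained, while the top-down analysis bounds it by [1, 11]. A relational domain such as polyhedra could keep ret = x + 1 in the summary and recover the same bound at the call site.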


Summary Objects. Now consider the heap. Recall that when using summary 
objects in the TD analysis, reading a path x.f into z “expands” each summary 
object o_f when o ∈ Pt(x) and strongly updates z with the join of these 
expanded objects, before forgetting them. This expansion makes a copy of each 
summary object’s constraints so that later use of z does not incorrectly impact 
the summary. However, when analyzing a method bottom-up, we may not yet 
know all of a summary object’s constraints. For example, if x is passed into the 
current method, we will not (yet) know if o_f is assigned to a particular numeric 
range in the caller. 

We solve this problem by allocating a fresh, unconstrained placeholder object 
at each read of x.f and including it in the initialization of the assigned-to variable z. 
The placeholder is also retained in m’s method summary. Then at a call to m, 
we instantiate each placeholder with the constraints in the caller involving the 
placeholder’s summary location. We also create a fresh placeholder in the caller 
and weakly update it to the placeholder in the callee; doing so allows for further 
constraints to be added from calls further up the call chain. 


Access Paths. If we are using access paths, we treat them just as in TD: each x.f 
is allocated a special variable that is strongly updated when possible, according 
to the points-to analysis. These are not kept in method summaries. When also 
using summary objects, at the first read of x.f we initialize it from the summary 
objects derived from x’s points-to set, following the above expansion procedure. 
Otherwise x.f will be unconstrained. 


Hybrid (TD+BU). In addition to TD or BU analysis (only), we implemented 
a hybrid strategy that performs TD analysis for the application, but BU analysis 
for code from the Java standard library. Library methods are analyzed first, 
bottom-up. Application method calls are analyzed top-down. When an application 
method calls a library method, it applies the BU method call approach. 
TD+BU could potentially be better than TD because library methods, which are 
likely called many times, only need to be analyzed once. TD+BU could similarly 
be better than BU because application methods, which are likely not called as 
many times as library methods, can use the lower-overhead TD analysis. 

Now, consider the interaction between the heap abstraction and the analysis 
order. The use of access paths (only) does not greatly affect the normal TD/BU 
tradeoff: TD may yield greater precision by adding constraints from the caller 
when analyzing the callee, while BU’s lower precision comes with the benefit of 
analyzing method bodies less often. Use of summary objects complicates this 
tradeoff. In the TD analysis, the use of summary objects adds a relatively stable 
overhead to all methods, since they are included in every method’s abstract 
state. For the BU analysis, methods further down in the call chain will see fewer 
summary objects used, and method bodies may end up being analyzed less often 
than in the TD case. On the other hand, placeholder objects add more dimensions 
overall (one per read) and more work at call sites (to instantiate them). But, 
instantiating a summary may be cheaper than reanalyzing the method. 


4.2 Context Sensitivity (CS) 


The last design choice we considered was context-sensitivity. A context-insensitive 
(CI) analysis conflates information from different call sites of the 
same method. For example, two calls to method m in which the first passes 
x1, y1 and the second passes x2, y2 will be conflated such that within m we will 
only know that either x1 or x2 is the first parameter, and either y1 or y2 is 
the second; we will miss the correlation between parameters. A context-sensitive 
analysis provides some distinction among different call sites. A 1-CFA analy- 
sis [46] (1CFA) distinguishes based on one level of calling context, i.e., two calls 
originating from different program points will be distinguished, but two calls 
from the same point, but in a method called from two different points will not. 
A type-sensitive analysis [49] (1TYP) uses the type of the receiver as the context. 
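The conflation of call sites can be made concrete with a short sketch. It joins the argument intervals of all calls that share a context; the function `param_facts`, the call-site labels, and the use of the call site itself as the 1-CFA context are simplifying assumptions.

```python
def join(a, b):
    """Interval join: the smallest interval containing both a and b."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def param_facts(calls, context_of):
    """Join the argument intervals of all calls that share a context.

    calls is a list of (call_site, argument_intervals) pairs and
    context_of maps a call site to its analysis context.
    """
    facts = {}
    for site, args in calls:
        ctx = context_of(site)
        if ctx in facts:
            facts[ctx] = tuple(join(o, n) for o, n in zip(facts[ctx], args))
        else:
            facts[ctx] = tuple(args)
    return facts

# Two calls to m: m(0, 1) at site1 and m(10, 11) at site2.
calls = [("site1", ((0, 0), (1, 1))), ("site2", ((10, 10), (11, 11)))]

insensitive = param_facts(calls, lambda site: None)   # one shared context
one_cfa = param_facts(calls, lambda site: site)       # one context per call site
```

Under the single shared context, m sees the first parameter in [0, 10] and the second in [1, 11], so the pairing of 0 with 1 and 10 with 11 is lost. With one context per call site, each pairing is kept, at the cost of analyzing m once per context.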
Context sensitivity in the points-to analysis affects alias checks, e.g., when 
determining whether an assignment to x.f might affect y.f. It also affects the 
abstract object representation and call graph construction. Due to the latter, 
context sensitivity also affects our interprocedural numeric analysis. In a context- 
sensitive analysis, a single method is essentially treated as a family of methods 
indexed by a calling context. In particular, our analysis keeps track of the current 
context as a frame, and when considering a call to method x.m(), the target 
methods to which m may refer differ depending on the frame. This provides more 
precision than a context-insensitive (i.e., frame-less) approach, but the analysis 
may consider the same method code many times, which adds greater expense. 
This is true both for TD and BU, but is perhaps more 
detrimental to the latter since it reduces potential method summary reuse. On 
the other hand, more precise analysis may reduce unnecessary work by pruning 
infeasible call graph edges. For example, when a call might dynamically dispatch 
to several different methods, the analysis must consider them all, joining their 
abstract states. A more precise analysis may consider fewer target methods. 


5 Implementation 


We have implemented an analysis for Java with all of the options described 
in the previous two sections. Our implementation is based on the intermedi- 
ate representation in the T. J. Watson Libraries for Analysis (WALA) version 
1.3.10 [2], which converts a Java bytecode program into static single assignment 
(SSA) form [20], which is then analyzed. We use the APRON [33,41] implementation 
of intervals (trunk revision 1096, published on 2016/05/31) and the ELINA [47,48] 
implementation of convex polyhedra (snapshot as of October 4, 2017). Our current 
implementation supports all non-floating point numeric Java values and comprises 
14K lines of Scala code. 

Next we discuss a few additional implementation details. 


Preallocating Dimensions. In both APRON and ELINA, it is very expensive to 
perform join operations that combine abstract states with different variables. 
Thus, rather than add dimensions as they arise during abstract interpretation, 
we instead preallocate all necessary dimensions—including for local variables, 
access paths, and summary objects, when enabled—at the start of a method 
body. This ensures the abstract states have the same dimensions at each join 
point. We found that, even though this approach makes some states larger than 
they need to be, the overall performance savings is still substantial. 


Arrays. Our analysis encodes an array as an object with two fields, contents, 
which represents the contents of the array, and len, representing the array’s 
length. Each read/write from a[i] is modeled as a weak read/write of contents 
(because all array elements are represented with the same field), with an added 
check that i is between 0 and len. We treat Strings as a special kind of array. 
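A bounds check over intervals might look like the sketch below. The function `in_bounds` is an assumption for illustration, and it reads "between 0 and len" as 0 <= i < len.

```python
def in_bounds(index, length):
    """Return True if the access is provably in bounds.

    index and length are intervals (lo, hi). The access is safe only if
    the smallest possible index is non-negative and the largest possible
    index is below the smallest possible length.
    """
    return index[0] >= 0 and index[1] < length[0]
```

For example, an index in [0, 9] is verified against a length of exactly 10, while an index in [0, 10] is not, because the analysis cannot rule out i = 10.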


Widening. As is standard in abstract interpretation, our implementation per- 
forms widening to ensure termination when analyzing loops. In a pilot study, we 
compared widening after between one and ten iterations. We found that there 
was little added precision when applying widening after more than three iter- 
ations when trying to prove array indexes in bounds (our target application, 
discussed next). Thus we widen at that point in our implementation. 


Limitations. Our implementation is sound with a few exceptions. In particular, 
it ignores calls to native methods and uses of reflection. It is also unsound in its 
handling of recursive method calls. If the return value of a recursive method is 
numeric, it is regarded as unconstrained. Potential side effects of the recursive 
calls are not modeled. 


6 Evaluation 


In this section, we present an empirical study of our family of analyses, focusing 
on the following research questions: 

RQ1: Performance. How does the configuration affect analysis running time? 
RQ2: Precision. How does the configuration affect analysis precision? 
RQ3: Tradeoffs. How does the configuration affect precision and performance? 


To answer these questions, we chose an important analysis client, array index 
out-of-bound analysis, and ran it on the DaCapo benchmark suite [6]. We vary 
each of the analysis features listed in Table 1, yielding 162 total configurations. To 
understand the impact of analysis features, we used multiple linear regression and 
logistic regression to model precision and performance (the dependent variables) 
in terms of analysis features and across programs (the independent variables). 
We also studied per-program data directly. 
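The count of 162 configurations follows from the option settings in Table 1 (3 x 3 x 3 x 3 x 2). A small sketch that enumerates them is shown below; the dictionary `OPTIONS` and the AO-HA-CS-OR-ND naming order are assumptions, the latter chosen to match the configuration names used in Table 2.

```python
from itertools import product

# Settings per option, ordered as in configuration names such as
# "BU-AP-CI-CLAS-INT" (AO-HA-CS-OR-ND).
OPTIONS = {
    "AO": ["TD", "BU", "TD+BU"],
    "HA": ["AP", "SO", "AP+SO"],
    "CS": ["CI", "1CFA", "1TYP"],
    "OR": ["ALLO", "CLAS", "SMUS"],
    "ND": ["INT", "POL"],
}

# Every combination of one setting per option.
CONFIGS = ["-".join(choice) for choice in product(*OPTIONS.values())]
```

Running all of `CONFIGS` on the eleven benchmarks gives 11 x 162 = 1782 benchmark/configuration pairs.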

Overall, we found that using access paths is a significant boon to precision 
but costs little in performance, while using summary objects is the reverse, 
to the point that use of summary objects is a significant source of timeouts. 
Polyhedra add precision compared to intervals, and impose some performance 
cost, though only half as much as summary objects. Interestingly, when using both 
summary objects and polyhedra together would result in a timeout, choosing 
the former tends to provide better precision than the latter. Finally, bottom-up 
analysis harms precision compared to top-down analysis, especially when only 
summary objects are enabled, but yields little gain in performance. 


6.1 Experimental Setup 


We evaluated our analyses by using them to perform array index out of bounds 
analysis. More specifically, for each benchmark program, we counted how many 
array access instructions (x[i]=y, y=x[i], etc.) an analysis configuration could 
verify were in bounds (i.e., i<x.length), and measured the time taken to per- 
form the analysis. 


Benchmarks. We analyzed all eleven programs from the DaCapo benchmark 
suite [6] version 2006-10-MR2. The first three columns of Table 2 list the pro- 
grams’ names, their size (number of IR instructions), and the number of array 
bounds checks they contain. The rest of the table indicates the fastest and 
most precise analysis configuration for each program; we discuss these results 
in Sect. 6.4. We ran each benchmark three times under each of the 162 analysis 
configurations. The experiments were performed on two 2.4 GHz single processor 
(with four logical cores) Intel Xeon E5-2609 servers, each with 128GB memory 
running Ubuntu 16.04 (LTS). On each server, we ran three analysis configura- 
tions in parallel, binding each process to a designated core. 

Since many analysis configurations are time-intensive, we set a limit of 1 hour 
for running a benchmark under a particular configuration. All performance 
results reported are the median of the three runs. We also use the median preci- 
sion result, though note the analyses are deterministic, so the precision does not 
vary except in the case of timeouts. Thus, we treat an analysis as not timing out 
as long as either two or three of the three runs completed, and otherwise it is a 
timeout. Among the 1782 median results (11 benchmarks, 162 configurations), 
667 of them (37%) timed out. The percentage of the configurations that timed 
out analyzing a program ranged from 0% (xalan) to 90% (chart). 


Table 2. Benchmarks and overall results. 

                            Best Performance                                  Best Precision
Prog.     Size    # Checks  Configuration      Time (min)  # Checks  Percent  Configuration           Time (min)  # Checks  Percent
antlr     55734   1526      BU-AP-CI-CLAS-INT  0.6         1176      77.1%    TD-AP+SO-1TYP-CLAS-INT  18.5        1306      85.6%
bloat     150197  4621      BU-AP-CI-CLAS-INT  4.0         2538      54.9%    TD-AP-1TYP-SMUS-POL     17.2        2795      60.5%
chart     167621  7965      BU-AP-CI-CLAS-INT  3.3         5593      70.2%    TD-AP-1TYP-SMUS-INT     7.7         5654      71.0%
eclipse   18938   1043      BU-AP-CI-ALLO-INT  0.2         896       85.9%    TD-AP+SO-1TYP-SMUS-POL  3.3         977       93.7%
fop       33243   1337      BU-AP-CI-CLAS-INT  0.4         998       74.6%    TD-AP+SO-1CFA-SMUS-INT  2.6         1137      85.0%
hsqldb    19497   1020      BU-AP-CI-SMUS-INT  0.3         911       89.3%    TD-AP+SO-CI-SMUS-INT    1.4         975       95.6%
jython    127661  4232      BU-AP-CI-SMUS-INT  1.3         2667      63.0%    TD-AP-1CFA-CLAS-POL     33.6        2919      69.0%
luindex   69027   2764      BU-AP-CI-SMUS-INT  1.8         1682      60.9%    TD-AP+SO-1TYP-ALLO-INT  46.8        2015      72.9%
lusearch  20242   1062      BU-AP-CI-CLAS-INT  0.2         912       85.9%    TD-AP+SO-1CFA-ALLO-POL  54.2        979       92.2%
pmd       116422  4402      BU-AP-CI-CLAS-INT  1.7         3153      71.6%    TD-AP+SO-CI-CLAS-INT    49.5        3301      75.0%
xalan     20315   1043      BU-AP-CI-CLAS-INT  0.2         912       87.4%    TD-AP+SO-1CFA-SMUS-POL  3.8         981       94.1%
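The rule for combining three runs into one median result can be sketched as follows. The function `median_result` and the use of `None` to mark a timed-out run are assumptions; counting a timed-out run as the one-hour limit is also an assumption, although it does not change the median when two runs completed.

```python
from statistics import median

TIMEOUT_MIN = 60.0   # the one-hour limit, in minutes

def median_result(run_times):
    """Combine three runs of one benchmark/configuration pair.

    run_times holds three running times in minutes, with None marking a
    run that hit the time limit. Returns the median time, or None if
    fewer than two runs completed (the pair counts as a timeout).
    """
    completed = [t for t in run_times if t is not None]
    if len(completed) < 2:
        return None
    # A timed-out run is counted as the time limit when taking the median.
    return median(t if t is not None else TIMEOUT_MIN for t in run_times)
```

With one timed-out run, the median is the slower of the two completed runs; with two or more timed-out runs, the pair is recorded as a timeout.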


Statistical Analysis. To answer RQ1 and RQ2, we constructed a model for each 
question using multiple linear regression. Roughly put, we attempt to produce a 
model of performance (RQ1) and precision (RQ2)—the dependent variables—in 
terms of a linear combination of analysis configuration options (i.e., one choice 
from each of the five categories given in Table 1) and the benchmark program 
(i.e., one of the eleven subjects from DaCapo)—the independent variables. We 
include the programs themselves as independent variables, which allows us to 
roughly factor out program-specific sources of performance or precision gain/loss 
(which might include size, complexity, etc.); this is standard in this sort of regres- 
sion [45]. Our models also consider all two-way interactions among analysis 
options. In our scenario, a significant interaction between two option settings 
suggests that the combination of them has a different impact on the analysis 
precision and/or performance compared to their independent impact. 

To obtain a model that best fits the data, we performed variable selection 
via the Akaike Information Criterion (AIC) [12], a standard measure of model 
quality. AIC drops insignificant independent variables to better estimate the 
impact of analysis options. The R² values for the models are good, with the 
lowest of any model being 0.71. 

After performing the regression, we examine the results to discover potential 
trends. Then we draw plots to examine how those trends manifest in the different 
programs. This lets us study the whole distribution, including outliers and any 
non-linear behavior, in a way that would be difficult if we just looked at the 
regression model. At the same time, if we only looked at plots it would be hard 
to see general trends because there is so much data. 


Threats to Validity. There are several potential threats to the validity of our 
study. First, the benchmark programs may not be representative of programs 
that analysis users are interested in. That said, the programs were drawn from 
a well-studied benchmark suite, so they should provide useful insights. 

Second, the insights drawn from the results of the array index out-of-bound 
analysis may not reflect the trends of other analysis clients. We note that array 
bounds checking is a standard, widely used analysis. 

Third, we examined a design space of 162 analysis configurations, but there 
are other design choices we did not explore. Thus, there may be other indepen- 
dent variables that have important effects. In addition, there may be limitations 
specific to our implementation, e.g., due to precisely how WALA implements 
points-to analysis. Even so, we relied on time-tested implementations as much 
as possible, and arrived at our choices of analysis features by studying the liter- 
ature and conversing with experts. Thus, we believe our study has value even if 
further variables are worth studying. 

Fourth, for our experiments we ran each analysis configuration three times, 
and thus performance variation may not be fully accounted for. While more trials 
would add greater statistical assurance, each trial takes about a week to run on 
our benchmark machines, and we observed no variation in precision across the 
trials. We did observe variations in performance, but they were small and did 
not affect the broader trends. In more detail, we measured the variation in 
running time across the three runs of a configuration as (max - min)/median. 
The average variation across all configurations is only 
4.2%. The maximum total time difference (max - min) is 32 min, an outlier from 
eclipse. All the other time differences are within 4 min. 
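
As a minimal sketch of the (max - min)/median measure, the following computes the relative spread of three running times. The timing values are hypothetical and only illustrate the arithmetic; they are not measurements from the study.

```python
from statistics import median

def relative_spread(times_min):
    """Spread of repeated runs, computed as (max - min) / median."""
    return (max(times_min) - min(times_min)) / median(times_min)

# Hypothetical running times (in minutes) of three trials of one configuration.
trial_times = [14.8, 15.0, 15.4]
spread = relative_spread(trial_times)
print(f"{spread:.1%}")
```

Normalizing by the median makes the measure comparable across configurations whose absolute running times differ widely.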


6.2 RQ1: Performance 


Table 3 summarizes our regression model for performance. We measure perfor- 
mance as the time to run both the core analysis and perform array index out- 
of-bounds checking. If a configuration timed out while analyzing a program, we 
set its running time as one hour, the time limit (characterizing a lower bound 
on the configuration’s performance impact). Another option would have been to 
leave the configuration out of the regression, but doing so would underrepresent 
the important negative contribution to performance. 

Table 3. Model of run-time performance in terms of analysis configuration options 
(Table 1), including two-way interactions. Independent variables for individual 
programs not shown. R² of 0.72. 

Option   Setting       Est. (min)   CI                   p-value
AO       TD            -            -                    -
         BU            -1.98        [-6.3, 1.76]         0.336
         TD+BU         1.97         […, 6.87]            0.364
HA       AP+SO         -            -                    -
         AP            -37.6        [-42.36, -32.84]     <0.001
         SO            0.15         [-4.60, 4.91]        0.949
CS       1TYP          -            -                    -
         CI            -7.1         …                    …
         1CFA          1.62         [-2.19, 5.42]        0.405
OR       ALLO          -            -                    -
         CLAS          -11.00       …                    …
         SMUS          -7.15        …                    …
ND       POL           -            -                    -
         INT           -16.51       [-19.56, -13.46]     <0.001
AO:HA    TD:AP+SO      -            -                    -
         BU:AP         -5.31        [-9.35, -1.27]       0.01
         TD+BU:AP      -3.13        [-7.38, 1.12]        0.15
         BU:SO         0.11         [-3.92, 4.15]        0.956
         TD+BU:SO      -0.08        [-4.33, 4.17]        0.97
AO:OR    TD:ALLO       -            -                    -
         BU:CLAS       -8.87        …                    …
         TD+BU:CLAS    -4.07        [-8.32, 0.19]        0.06
         TD+BU:SMUS    -2.52        [-6.77, 1.74]        0.247
AO:ND    TD:POL        -            -                    -
         BU:INT        …            …                    <0.001
         TD+BU:INT     2.35         [-1.12, 5.82]        0.185
HA:CS    AP+SO:1TYP    -            -                    -
         AP:1CFA       7.01         [2.83, 11.17]        <0.001
         AP:CI         3.38         [-0.79, 7.54]        0.112
         SO:CI         -0.20        [-4.37, 3.96]        0.924
         SO:1CFA       -0.21        [-4.37, 3.95]        0.921
HA:OR    AP+SO:ALLO    -            -                    -
         AP:CLAS       9.55         [5.37, 13.71]        <0.001
         AP:SMUS       6.25         [2.08, 10.42]        <0.001
         SO:SMUS       0.07         [-4.09, 4.24]        0.973
         SO:CLAS       -0.43        [-4.59, 3.73]        0.839
HA:ND    AP+SO:POL     -            -                    -
         AP:INT        …            [3.53, 10.30]        <0.001
         SO:INT        0.08         [-3.32, 3.48]        0.964
CS:OR    1TYP:ALLO     -            -                    -
         CI:CLAS       4.76         [0.59, 8.93]         0.025
         CI:SMUS       4.02         [-0.15, 8.18]        0.05
         1CFA:CLAS     -3.09        [-7.25, 1.08]        0.147
         1CFA:SMUS     -0.52        [-4.68, 3.64]        0.807

In the top part of the table, the first column shows the independent vari- 
ables and the second column shows a setting. One of the settings, identified by 
dashes in the remaining columns, is the baseline in the regression. We use the 
following settings as baselines: TD, AP+SO, 1TYP, ALLO, and POL. We chose 
the baseline according to what we expected to be the most precise settings. For 
the other settings, the third column shows the estimated effect of that setting 
with all other settings (including the choice of program, each an independent 
variable) held fixed. For example, the fifth row of the table shows that AP (only) 
decreases overall analysis time by 37.6 min compared to AP+SO (and the other 
baseline settings). The fourth column shows the 95% confidence interval around 
the estimate, and the last column shows the p-value. As is standard, we consider 
p-values less than 0.05 (5%) significant; such rows are highlighted green. 

Table 4. Model of timeout in terms of analysis configuration options (Table 1). 
Independent variables for individual programs not shown. R² of 0.77. 

Option   Setting   Coef.    CI                 Exp(coef.)   p-value
AO       TD        -        -                  -            -
         BU        …        [-2.04, -0.93]     0.228        <0.001
         TD+BU     0.09     [-0.46, 0.65]      1.09         0.73
HA       AP+SO     -        -                  -            -
         AP        -10.6    [-12.29, -9.05]    2.49E-5      <0.001
         SO        0.03     [-0.46, 0.53]      1.03         0.899
CS       1TYP      -        -                  -            -

The bottom part of the table shows the additional effects of two-way combinations 
of options compared to the baseline effects of each option. For example, 
the BU:CLAS row shows a coefficient of -8.87. We add this to the individual 
effects of BU (-1.98) and CLAS (-11.00) to compute that BU:CLAS is 21.9 min 
faster (since the number is negative) than the baseline pair of TD:ALLO. Not 
all interactions are shown, e.g., AO:CS is not in the table. Any interactions not 
included were deemed not to have meaningful effect and thus were dropped by 
the model generation process [12]. 
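
The way main effects and an interaction term combine can be sketched as a simple sum. The coefficients below are the ones quoted in this section (BU -1.98, CLAS -11.00, AP -37.6, BU:CLAS -8.87, AP:CLAS 9.55); the dictionary layout and function name are illustrative assumptions, and an interaction absent from the dictionary is treated as contributing zero, mirroring the fact that dropped interactions do not appear in the model.

```python
# Estimated effects in minutes relative to the baseline settings,
# taken from the values quoted in the text.
MAIN_EFFECTS = {"BU": -1.98, "CLAS": -11.00, "AP": -37.6}
INTERACTIONS = {("BU", "CLAS"): -8.87, ("AP", "CLAS"): 9.55}

def pair_effect(a, b):
    """Estimated change (min) when settings a and b are both enabled,
    compared to the corresponding pair of baseline settings."""
    return MAIN_EFFECTS[a] + MAIN_EFFECTS[b] + INTERACTIONS.get((a, b), 0.0)

bu_clas = pair_effect("BU", "CLAS")
ap_clas = pair_effect("AP", "CLAS")
print(round(bu_clas, 2), round(ap_clas, 2))
```

A positive interaction such as AP:CLAS partially cancels the individual speedups, which is why the benefit of CLAS shrinks once summary objects are disabled.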

Setting the running time of a timed-out configuration as one hour in Table 3 
may under-report a configuration’s (negative) performance impact. For a more 
complete view, we follow the suggestion of Arcuri and Briand [3], and construct a 
model of success/failure using logistic regression. We consider “if a configuration 
timed out” as the categorical dependent variable, and the analysis configuration 
options and the benchmark programs as independent variables. 
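
The two dependent variables can be derived from the same raw measurement. The sketch below assumes a run is recorded as its elapsed minutes, or as None when it hit the one-hour limit; this encoding and the configuration names are hypothetical.

```python
TIME_LIMIT_MIN = 60.0  # one-hour limit, as stated in the text

def dependent_variables(elapsed_min):
    """Return (recorded_time, timed_out) for one run.

    elapsed_min is None when the run was stopped at the time limit.
    recorded_time feeds the linear model; timed_out feeds the logistic model.
    """
    timed_out = elapsed_min is None or elapsed_min >= TIME_LIMIT_MIN
    recorded_time = TIME_LIMIT_MIN if timed_out else elapsed_min
    return recorded_time, timed_out

# Hypothetical runs: one finishes quickly, one barely finishes, one times out.
runs = {"cfg_a": 15.0, "cfg_b": 59.0, "cfg_c": None}
table = {name: dependent_variables(t) for name, t in runs.items()}
for name, (minutes, timed_out) in table.items():
    print(name, minutes, timed_out)
```

In the capped-time view, cfg_b and cfg_c differ by only one minute even though cfg_c might have needed far longer, which is the information the success/failure model recovers.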

Table 4 summarizes our logistic regression model for timeout. The coefficients 
in the third column represent the change in the log-odds of a timeout associated 
with each configuration setting, compared to the baseline setting. Negative 
coefficients indicate lower likelihood of timeout. The exponential of the coefficient, Exp(coef.) in 
the fifth column, indicates roughly how strongly that configuration setting being 
turned on affects the likelihood relative to the baseline setting. For example, the 
third row of the table shows that BU is roughly 5 times less likely to time out 
compared to TD, a significant factor in the model. 
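
The Exp(coef.) column can be reproduced by exponentiating each coefficient. The sketch below uses the legible coefficients from Table 4 and reads each coefficient as a change in log-odds, which is the standard interpretation for logistic regression; the function name is hypothetical.

```python
import math

def odds_ratio(coef):
    """Multiplicative change in the odds of a timeout for a given coefficient."""
    return math.exp(coef)

# Coefficients as reported in Table 4.
coefficients = {"AP": -10.6, "TD+BU": 0.09, "SO": 0.03}
ratios = {setting: odds_ratio(c) for setting, c in coefficients.items()}
for setting, ratio in ratios.items():
    print(f"{setting}: {ratio:.3g}")

# Reading a ratio below 1 as "k times less": the reported value 0.228 for BU
# corresponds to a factor of 1 / 0.228.
bu_factor = 1 / 0.228
```

The reciprocal of 0.228 is about 4.4, consistent with the statement that BU is roughly 5 times less likely to time out than TD.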

Tables 3 and 4 present several interesting performance trends. 


Summary Objects Incur a Significant Slowdown. Use of summary objects results 
in a very large slowdown, with high significance. We can see this in the AP 
row in Table 3. It indicates that using only AP results in an average 37.6-min 
speedup compared to the baseline AP+SO (while SO only had no significant 
difference from the baseline). We observed a similar trend in Table 4; use of 
summary objects has the largest effect, with high significance, on the likelihood 
of timeout. Indeed, 624 out of the 667 analyses that timed out had summary 
objects enabled (i.e., SO or AP+SO). We investigated further and found the 
slowdown from summary objects is mostly due to a significantly larger number of 
dimensions included in the abstract state. For example, analyzing jython with 
AP-TD-CI-ALLO-INT has, on average, 11 numeric variables when analyzing a 
method, and the whole analysis finished in 15 min. Switching AP to SO resulted 
in, on average, 1473 variables per analyzed method and the analysis ultimately 
timed out. 


The Polyhedral Domain is Slow, But Not as Slow as Summary Objects. Choosing 
INT over baseline POL nets a speedup of 16.51 min. This is the second-largest 
performance effect with high significance, though it is half as large as the effect 
of SO. Moreover, per Table 4, turning on POL is more likely to result in timeout; 
409 out of 667 analyses that timed out used POL. 


Heavyweight CS and OR Settings Hurt Performance, Particularly When Using 
Summary Objects. For CS settings, CI is faster than baseline 1TYP by 7.1 min, 
while there is not a statistically significant difference with 1CFA. For the OR 
settings, we see that the more lightweight representations CLAS and SMUS are 
faster than baseline ALLO by 11.00 and 7.15 min, respectively, when using base- 
line AP+SO. This makes sense because these representations have a direct effect 
on reducing the number of summary objects. Indeed, when summary objects are 
disabled, the performance benefit disappears: AP:CLAS and AP:SMUS add back 
9.55 and 6.25 min, respectively. 


Bottom-up Analysis Provides No Substantial Performance Advantage. Table 4 
indicates that a BU analysis is less likely to time out than a TD analysis. However, 
the performance model in Table 3 does not show a performance advantage 
of bottom-up analysis: neither BU nor TD+BU provides a statistically significant 
impact on running time over baseline TD. Setting one hour for the configurations 
that timed out in the performance model might fail to capture the negative 
performance of top-down analysis. This observation underpins the utility of 
constructing a success/failure analysis to complement the performance model. In any 
case, we might have expected bottom-up analysis to provide a real performance 
advantage (Sect. 4.1), but that is not what we have observed. 


6.3 RQ2: Precision 


Table 5 summarizes our regression model for precision, using the same format as 
Table 3. We measure precision as the number of array indexes proven to be in 
bounds. As recommended by Arcuri and Briand [3], we omit from the regression 
those configurations that timed out.⁵ We see several interesting trends. 

Table 5. Model of precision, measured as # of array indexes proved in bounds, 
in terms of analysis configuration options (Table 1), including two-way interactions. 
Independent variables for individual programs not shown. R² of 0.98. 

Option   Setting       Est. (#)   CI                  p-value
AO       TD            -          -                   -
…        …             …          [-66.47, 55.99]     …
OR       ALLO          -          -                   -
         CLAS          -90.15     …                   …
         SMUS          35.47      [-14.72, 85.67]     0.166
ND       POL           -          -                   -
         INT           5.11       [-28.77, 38.99]     0.767
AO:HA    TD:AP+SO      -          -                   -
         …
AO:OR    TD:ALLO       -          -                   -
         BU:SMUS       …          [-77.69, 19.37]     …
         TD+BU:SMUS    -29.25     [-79.23, 20.72]     0.251
HA:OR    AP+SO:ALLO    -          -                   -
         AP:SMUS       …          [-67.20, 33.44]     …
         AP:CLAS       -8.81      [-57.84, 40.20]     0.724
HA:ND    AP+SO:POL     -          -                   -
         …


Access Paths are Critical to Precision. Removing access paths from the config- 
uration, by switching from AP+SO to SO, yields significantly lower precision. 
We see this in the SO (only) row in the table, and in all of its interactions (i.e., 
SO:opt and opt:SO rows). In contrast, AP on its own is not statistically worse 
than AP+SO, indicating that summary objects often add little precision. This 
is unfortunate, given their high performance cost. 


Bottom-up Analysis Harms Precision Overall, Especially for SO (Only). BU has 
a strongly negative effect on precision: 129.98 fewer checks compared to TD. 
Coupled with SO it fares even worse: BU:SO nets 686.79 fewer checks, and 
TD+BU:SO nets 630.99 fewer. For example, for xalan the most precise configuration, 
which uses TD and AP+SO, discharges 981 checks, while all configurations 
that instead use BU and SO on xalan discharge close to zero checks. The same 
basic trend holds for just about every program. 

5 The alternative of setting precision to be 0 would misrepresent the general power of 
a configuration, particularly when combined with runs that did not time out. Fewer 
runs might reduce statistical power, however, which is captured in the model. 


The Relational Domain Only Slightly Improves Precision. The row for INT is 
not statistically different from the baseline POL. This is a bit of a surprise, since 
by itself POL is strictly more precise than INT. In fact, POL does improve 
precision empirically when coupled with either AP or SO: the interactions AP:INT 
and SO:INT reduce the number of checks discharged. This sets up an interesting 
performance tradeoff that we explore in Sect. 6.4: using AP+SO with INT vs. using AP 
with POL. 


More Precise Abstract Object Representation Improves Precision, But Context 
Sensitivity Does Not. The table shows CLAS discharges 90.15 fewer checks com- 
pared to ALLO. Examining the data in detail, we found this occurred because 
CLAS conflates all arrays of the same type as one abstract object, thus impre- 
cisely approximating those arrays’ lengths, in turn causing some checks to fail. 
Also notice that context sensitivity (CS) does not appear in the model, mean- 
ing it does not significantly increase or decrease the precision of array bounds 
checking. This is interesting, because context-sensitivity is known to reduce 
points-to set size [35,49] (thus yielding more precise alias checks and dispatch 
targets). However, for our application this improvement has minimal impact. 
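
The loss of precision under CLAS can be illustrated with a deliberately simplified model in which each abstract object carries an interval of possible array lengths. This is only a sketch of the effect described above, not the analysis implementation; the allocation records, site labels, and function names are hypothetical.

```python
def join(a, b):
    """Least upper bound of two intervals (lo, hi)."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def abstract_lengths(allocations, name_of):
    """Map each abstract object to an interval covering the lengths of
    all concrete arrays that it represents."""
    lengths = {}
    for alloc in allocations:
        key = name_of(alloc)
        exact = (alloc["length"], alloc["length"])
        lengths[key] = join(lengths[key], exact) if key in lengths else exact
    return lengths

def in_bounds(index, length_interval):
    """An access is proven safe only if the index is below the smallest
    possible length."""
    return 0 <= index < length_interval[0]

# Two hypothetical arrays of the same type, allocated at different sites.
allocations = [
    {"site": "A.java:10", "type": "int[]", "length": 10},
    {"site": "B.java:7", "type": "int[]", "length": 3},
]
by_site = abstract_lengths(allocations, lambda a: a["site"])   # ALLO-like naming
by_class = abstract_lengths(allocations, lambda a: a["type"])  # CLAS-like naming

proved_with_site = in_bounds(5, by_site["A.java:10"])
proved_with_class = in_bounds(5, by_class["int[]"])
print(proved_with_site, proved_with_class)
```

Naming objects by allocation site keeps the length of the first array exact, so an access at index 5 is proven safe; naming by class merges both arrays into a single object whose length may be as small as 3, so the same check fails.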


6.4 RQ3: Tradeoffs 


Finally, we examine how analysis settings affect the tradeoff between precision 
and performance. To begin our discussion, recall Table 2, which shows 
the fastest configuration and the most precise configuration for each benchmark. 
Further, the table shows the configurations’ running time, number of checks 
discharged, and percentage of checks discharged. 

We see several interesting patterns in this table, though note the table shows 
just two data points and not the full distribution. First, the configurations in 
each column are remarkably consistent. The fastest configurations are all of 
the form BU-AP-CI-*-INT, only varying in the abstract object representation. 
The most precise configurations are more variable, but all include TD and some 
form of AP. The rest of the options differ somewhat, with different forms of 
precision benefiting different benchmarks. Finally, notice that, overall, the fastest 
configurations are much faster than the most precise configurations (often by 
an order of magnitude), but they are not that much less precise (typically by 
5-10 percentage points). 

To delve further into the tradeoff, we examine, for each program, the overall 
performance and precision distribution for the analysis configurations, focusing 
on particular options (HA, AO, etc.). As settings of option HA have come up 
prominently in our discussion so far, we start with it and then move through 
the other options. Figure 1 gives per-benchmark scatter plots of this data. Each 
plotted point corresponds to one configuration, with its performance on the x- 
axis and number of discharged array bounds checks on the y-axis. We regard a 
configuration that times out as discharging no checks, so it is plotted at (60, 0). 
The shape of a point indicates the HA setting of the corresponding configuration: 
black circle for AP, red triangle for AP+SO, and blue cross for SO. 

Fig. 1. Tradeoffs: AP vs. SO vs. AP+SO. 

Fig. 2. Tradeoffs: TD vs. BU vs. TD+BU. 

Fig. 3. Tradeoffs: ALLO vs. SMUS vs. CLAS. 
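
The plotting convention, and the selection of the fastest and most precise configurations per benchmark, can be sketched as follows. The run records, field names, and configuration labels are hypothetical and stand in for the measured data.

```python
TIMEOUT_POINT = (60.0, 0)  # (minutes, checks) used for timed-out runs

def plot_point(run):
    """Scatter-plot coordinates: running time on x, discharged checks on y."""
    if run["timed_out"]:
        return TIMEOUT_POINT
    return (run["minutes"], run["checks"])

def fastest_and_most_precise(runs):
    """Pick the fastest and the most precise configuration among finished runs."""
    finished = [r for r in runs if not r["timed_out"]]
    fastest = min(finished, key=lambda r: r["minutes"])
    most_precise = max(finished, key=lambda r: r["checks"])
    return fastest["config"], most_precise["config"]

# Hypothetical results for one benchmark.
runs = [
    {"config": "cfg_fast", "minutes": 2.0, "checks": 900, "timed_out": False},
    {"config": "cfg_precise", "minutes": 25.0, "checks": 980, "timed_out": False},
    {"config": "cfg_timeout", "minutes": None, "checks": None, "timed_out": True},
]
points = [plot_point(r) for r in runs]
best = fastest_and_most_precise(runs)
print(points, best)
```

Placing timed-out runs at the lower right corner keeps them visible on the plot while excluding them from the fastest and most precise selections.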


As a general trend, we see that access paths improve precision and do little to 
harm performance; they should always be enabled. More specifically, configura- 
tions using AP and AP+SO (when they do not time out) are always toward the 
top of the graph, meaning good precision. Moreover, the performance profile of 
SO and AP+SO is quite similar, as evidenced by related clusters in the graphs 
differing in the y-axis, but not the x-axis. In only one case did AP+SO time out 
when SO alone did not.⁶ 

On the flip side, summary objects are a significant performance bottleneck for 
a small boost in precision. On the graphs, we can see that the black AP circles 
are often among the most precise, while AP+SO configurations tend to be the best (8/11 cases 
in Table 2). But AP configurations are much faster. For example, for bloat, chart, and jython, 
only AP configurations complete before the timeout, and for pmd, all but four 
of the configurations that completed use AP. 


Top-Down Analysis is Preferred: Bottom-up is less precise and does little to 
improve performance. Figure 2 shows a scatter plot of the precision/performance 
behavior of all configurations, distinguishing those with BU (black circles), TD 
(red triangles), and TD+BU (blue crosses). Here the trend is not as stark as 
with HA, but we can see that the mass of TD points is towards the upper- 
left of the plots, except for some timeouts, while BU and TD+BU have more 
configurations at the bottom, with low precision. By comparing the same (x,y) 
coordinate on a graph in this figure with the corresponding graph in the previous 
one, we can see options interacting. Observe that the cluster of black circles 
at the lower left for antlr in Fig. 2(a) corresponds to SO-only configurations in 
Fig. 1(a), thus illustrating the strong negative interaction on precision of BU:SO 
we discussed in the previous subsection. The figures (and Table 2) also show that 
the best-performing configurations involve bottom-up analysis, but usually the 
benefit is inconsistent and very small. And TD+BU does not seem to balance 
the precision/performance tradeoff particularly well. 

6 In particular, for eclipse, configuration TD+BU-SO-1CFA-ALLO-POL finished at 
59 min, while TD+BU-AP+SO-1CFA-ALLO-POL timed out. 

Fig. 4. Tradeoffs: INT vs. POL. 


Precise Object Representation Often Helps with Precision at a Modest Cost to 
Performance. Figure 3 shows a representative sample of scatter plots illustrating 
the tradeoff between ALLO, CLAS, and SMUS. In general, we see that the highest 
points tend to be ALLO, and these are more to the right of CLAS and SMUS. On 
the other hand, the precision gain of ALLO tends to be modest, and these usu- 
ally occur (examining individual runs) when combining with AP+SO. However, 
summary objects and ALLO together greatly increase the risk of timeouts and 
low performance. For example, for eclipse the row of circles across the bottom 
are all SO-only. 


The Precision Gains of POL are More Modest than Gains Due to Using 
AP+SO (over AP). Figure 4 shows scatter plots comparing INT and POL. We 
investigated several groupings in more detail and found an interesting interaction 
between the numeric domain and the heap abstraction: POL is often better 
than INT for AP (only). For example, the points in the upper left of bloat use AP, 
and POL is slightly better than INT. The same phenomenon occurs in luindex in 
the cluster of triangles and circles to the upper left. But INT does better further 
up and to the right in luindex. This is because these configurations use AP+SO, 
which times out when POL is enabled. A similar phenomenon occurs for the two 
points in the upper right of pmd, and the most precise points for hsqldb. Indeed, 
when a configuration with AP+SO-INT terminates, it will be more precise than 
those with AP-POL, but is likely slower. We manually inspected the cases where 
AP+SO-INT is more precise than AP-POL, and found that it mostly is because 
of the limitation that access paths are dropped through method calls. AP+SO 
rarely terminates when coupled with POL because of the very large number of 
dimensions added by summary objects. 


7 Related Work 


Our numeric analysis is novel in its focus on fully automatically identifying 
numeric invariants in real (heap-manipulating, method-calling) Java programs, 
while aiming to be sound. We know of no prior work that carefully studies 
precision and performance tradeoffs in this setting. Prior work tends to be either 
much more imprecise and/or intentionally unsound but more scalable, or more precise 
but unable to scale to programs as large as those in the DaCapo benchmark suite. 


Numeric vs. Heap Analysis. Many abstract interpretation-based analyses focus 
on numeric properties or heap properties, but not both. For example, Calcagno 
et al. [13] use separation logic to create a compositional, bottom-up heap analysis. 
Their client analysis for Java checks for NULL pointers [1], but not out-of-bounds 
array indexes. Conversely, the PAGAI analyzer [31] for LLVM explores 
abstract interpretation algorithms for precise invariants of numeric variables, but 
ignores the heap (soundly treating heap locations as ⊤). 


Numeric Analysis in Heap-Manipulating Programs. Fu [25] first proposed the 
basic summary object heap abstraction we explore in this paper. The approach 
uses a points-to analysis [44] as the basis of generating abstract names for sum- 
mary objects that are weakly updated [27]. The approach does not support 
strong updates to heap objects and ignores procedure calls, making unsound 
assumptions about effects of calls to or from the procedure being analyzed. Fu’s 
evaluation on DaCapo only considered how often the analysis yields a non-⊤ 
field, while ours considers how often the analysis can prove that an array index 
is in bounds, which is a more direct measure of utility. Our experiments strongly 
suggest that when modeled soundly and at scale, summary objects add enormous 
performance overhead while doing much less to assist precision when compared 
to strongly updatable access paths alone [21,52]. 

Some prior work focuses on inferring precise invariants about heap-allocated 
objects, e.g., relating the presence of an object in a collection to the value of 
one of the object’s fields. Ferrara et al. [23,24] also propose a composed analysis 
for numeric properties of heap-manipulating programs. Their approach is 
amenable to both points-to and shape analyses (e.g., TVLA [34]), supporting 
strong updates for the latter. DESKCHECK [39] and Chang and Rival [14, 15] also 
aim to combine shape analysis and numeric analysis, in both cases requiring the 
analyst to specify predicates about the data structures of interest. Magill [37] 
automatically converts heap-manipulating programs into integer programs such 
that proving a numeric property of the latter implies a numeric shape property 
(e.g., a list’s length) of the former. The systems just described support more 
precise invariants than our approach, but are less general or scalable: they tend 
to focus on much smaller programs, they do not support important language fea- 
tures (e.g., Ferrara’s approach lacks procedures, DESKCHECK lacks loops), and 
may require manual annotation. 

Clousot [22] also aims to check numeric invariants on real programs that use 
the heap. Methods are analyzed in isolation but require programmer-specified 
pre/post conditions and object invariants. In contrast, our interprocedural anal- 
ysis is fully automated, requiring no annotations. Clousot’s heap analysis makes 
local, optimistic (and unsound) assumptions about aliasing,⁷ while our approach 
aims to be sound by using a global points-to analysis. 


Measuring Analysis Parameter Tradeoffs. We are not aware of work explor- 
ing performance/precision tradeoffs of features in realistic abstract interpreters. 
Oftentimes, papers leave out important algorithmic details. The initial ASTREE 
paper [7] contains a wealth of ideas, but does not evaluate them systemati- 
cally, instead reporting anecdotal observations about their particular analysis 
targets. More often, papers focus on one element of an analysis to evaluate, e.g., 
Logozzo [36] examines precision and performance tradeoffs useful for certain 
kinds of numeric analyses, and Ferrara [24] evaluates his technique using both 
intervals and octagons as the numeric domain. Regarding the latter, our paper 
shows that interactions with the heap abstraction can have a strong impact on 
the numeric domain precision/performance tradeoff. Prior work by Smaragdakis 
et al. [49] investigates the performance/precision tradeoffs of various implementation 
decisions in points-to analysis. PADDLE [35] evaluates tradeoffs among 
different abstractions of heap allocation sites in a points-to analysis, but specifically 
only evaluates the heap analysis and not other analyses that use it. 

7 Interestingly, Clousot’s assumptions often, but not always, lead to sound results [16]. 


8 Conclusion and Future Work 


We presented a family of static numeric analyses for Java. These analyses imple- 
ment a novel combination of techniques to handle method calls, heap-allocated 
objects, and numeric analysis. We ran the 162 resulting analysis configurations 
on the DaCapo benchmark suite, and measured performance and precision in 
proving array indexes in bounds. Using a combination of multiple linear regression 
and data visualization, we found several trends. Among others, we discovered 
that strongly updatable access paths are always a good idea, adding significant 
precision at very little performance cost. We also found that top-down 
analysis tended to improve precision at little cost, compared to bottom-up 
analysis. On the other hand, while summary objects did add precision when 
combined with access paths, they also added significant performance overhead, 
often resulting in timeouts. The polyhedral numeric domain improved precision, 
but would time out when using a richer heap abstraction; intervals and a richer 
heap would work better. 

The results of our study suggest several directions for future work. For 
example, for many programs, a much more expensive analysis often did not 
add much more in terms of precision; a pre-analysis that identifies the tradeoff 
would be worthwhile. Another direction is to investigate a more sparse repre- 
sentation of summary objects that retains their modest precision benefits, but 
avoids the overall blowup. We also plan to consider other analysis configuration 
options. Our current implementation uses an ahead-of-time points-to analysis to 
model the heap; an alternative solution is to analyze the heap along with the 
numeric analysis [43]. Concerning abstract object representation and context 
sensitivity, there are other potentially interesting choices, e.g., recency abstrac- 
tion [5] and object sensitivity [40]. Other interesting dimensions to consider are 
field sensitivity [32] and widening, notably widening with thresholds. Finally, we 
plan to explore other effective ways to design hybrid top-down and bottom-up 
analysis [54], and investigate sparse inter-procedural analysis for better 
performance [42]. 


Abstract. Data science software plays an increasingly important role 
in critical decision making in fields ranging from economy and finance 
to biology and medicine. As a result, errors in data science applications 
can have severe consequences, especially when they lead to results that 
look plausible, but are incorrect. A common cause of such errors is when 
applications erroneously ignore some of their input data, for instance due 
to bugs in the code that reads, filters, or clusters it. 

In this paper, we propose an abstract interpretation framework to 
automatically detect unused input data. We derive a program semantics 
that precisely captures data usage by abstraction of the program’s oper- 
ational trace semantics and express it in a constructive fixpoint form. 
Based on this semantics, we systematically derive static analyses that 
automatically detect unused input data by fixpoint approximation. 

This clear design principle provides a framework that subsumes exist- 
ing analyses; we show that secure information flow analyses and a form of 
live variables analysis can be used for data usage, with varying degrees 
of precision. Additionally, we derive a static analysis to detect single 
unused data inputs, which is similar to dependency analyses used in the 
context of backward program slicing. Finally, we demonstrate the value 
of expressing such analyses as abstract interpretation by combining them 
with an existing abstraction of compound data structures such as arrays 
and lists to detect unused chunks of the data. 


1 Introduction 


In the past few years, data science has grown considerably in importance and 
now heavily influences many domains, ranging from economy and finance to 
biology and medicine. As we rely more and more on data science for making 
decisions, we become increasingly vulnerable to programming errors. 
Programming errors can cause frustration, especially when they lead to a 
program failure after hours of computation. However, programming errors that 
do not cause failures can have more serious consequences as code that produces 
an erroneous but plausible result gives no indication that something went wrong. 
A notable example is the paper “Growth in a Time of Debt” published in 2010 by 
economists Reinhart and Rogoff, which was widely cited in political debates and 
was later demonstrated to be flawed. Notably, one of the flaws was a programming 
error, which entirely excluded some data from the analysis [23]. Its critics 
hold that this paper led to unjustified adoption of austerity policies for countries 
with various levels of public debt [30]. Programming errors in data analysis 
code for medical applications are even more critical [27]. It is thus paramount 
to achieve a high level of confidence in the correctness of data science code. 

Fig. 1. Overview of the program semantics presented in the paper. The dependency 
semantics, derived by abstraction of the trace semantics, is sound and complete for 
data usage. Further sound but not complete abstractions are shown on the right. 

The likelihood that a programming error causes some input data to remain 
unused is particularly high for data science applications, where data goes through 
long pipelines of modules that acquire, filter, merge, and manipulate it. In this 
paper, we propose an abstract interpretation [14] framework to automatically 
detect unused input data. We characterize when a program uses (some of) its 
input data using the notion of dependency between the input data and the out- 
come of the program. Our notion of dependency accounts for non-determinism 
and non-termination. Thus, it encompasses notions of dependency that arise in 
many different contexts, such as secure information flow and program slicing [1], 
as well as provenance or lineage analysis [9], to name a few. 

Following the theory of abstract interpretation [12], we systematically derive 
a new program semantics that captures exactly the information needed 
to reason about input data usage, abstracting away from irrelevant details about 
the program behavior. Figure 1 gives an overview of our approach. The semantics 
is first expressed in a constructive fixpoint form over sets of sets of traces, 
by partitioning the operational trace semantics of a program based on its outcome 
(cf. outcome semantics in Fig. 1), and a further abstraction ignores intermediate 
state computations (cf. dependency semantics in Fig. 1). Starting the 
development of the semantics from the operational trace semantics enables a 
uniform mathematical reasoning about program semantics and program properties 
(Sect. 3). In particular, since input data usage is not a trace property or a 
subset-closed property [11] (Sect. 4), we show that a formulation of the semantics 
using sets of sets of traces is necessary for a sound validation of input data usage 
via fixpoint approximation [28]. 

This clear design principle provides a unifying framework for reasoning about 
existing analyses based on dependencies. We survey existing analyses and iden- 
tify key design decisions that limit or facilitate their applicability to input data 
usage, and we assess their precision. We show that non-interference analyses [6] 
are sound for proving that a terminating program does not use any of its input 
data, although this is too strong a property in general. We prove that strongly 
live variable analysis [20] is sound for data usage even for non-terminating programs, 
although it is imprecise with respect to implicit dependencies between program 
variables. We then derive a more precise static analysis similar to dependency 
analyses used in the context of backward program slicing [37]. Finally, we 
demonstrate the value of expressing these analyses as abstract interpretations 
by combining them with an existing abstraction of compound data structures 
such as arrays and lists [16]. This allows us to detect unused chunks of the input 
data, and thus apply our work to realistic data science applications. 


2 Trace Semantics 


The semantics of a program is a mathematical characterization of its behavior 
when executed for all possible input data. We model the operational semantics 
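of a program as a transition system $\langle \Sigma, \tau \rangle$ where $\Sigma$ is a (potentially infinite) set
of program states and the transition relation $\tau \subseteq \Sigma \times \Sigma$ describes the possible
transitions between states [12,14]. Note that this model allows representing
programs with (possibly unbounded) non-determinism. The set
$\Omega \stackrel{\text{def}}{=} \{ s \in \Sigma \mid \forall s' \in \Sigma : \langle s, s' \rangle \notin \tau \}$ is the set of final states of the program.

In the following, let $\Sigma^n \stackrel{\text{def}}{=} \{ s_0 \cdots s_{n-1} \mid \forall i < n : s_i \in \Sigma \}$ be the set of
all sequences of exactly $n$ program states. We write $\varepsilon$ to denote the empty
sequence, i.e., $\Sigma^0 \stackrel{\text{def}}{=} \{ \varepsilon \}$. Let $\Sigma^* \stackrel{\text{def}}{=} \bigcup_{n \in \mathbb{N}} \Sigma^n$ be the set of all finite sequences,
$\Sigma^+ \stackrel{\text{def}}{=} \Sigma^* \setminus \Sigma^0$ be the set of all non-empty finite sequences, $\Sigma^\omega$ be the set
of all infinite sequences, $\Sigma^{+\infty} \stackrel{\text{def}}{=} \Sigma^+ \cup \Sigma^\omega$ be the set of all non-empty finite
or infinite sequences, and $\Sigma^{*\infty} \stackrel{\text{def}}{=} \Sigma^* \cup \Sigma^\omega$ be the set of all finite or
infinite sequences of program states. In the following, we write $\sigma\sigma'$ for the
concatenation of two sequences $\sigma, \sigma' \in \Sigma^{*\infty}$ (with $\sigma\varepsilon = \varepsilon\sigma = \sigma$, and $\sigma\sigma' = \sigma$
when $\sigma \in \Sigma^\omega$), $T^+ \stackrel{\text{def}}{=} T \cap \Sigma^+$ and $T^\omega \stackrel{\text{def}}{=} T \cap \Sigma^\omega$ for the selection of the
non-empty finite sequences and the infinite sequences of $T \in \mathcal{P}(\Sigma^{+\infty})$, and
$T \mathbin{;} T' \stackrel{\text{def}}{=} \{ \sigma s \sigma' \mid s \in \Sigma \land \sigma s \in T \land s \sigma' \in T' \}$ for the merging of two sets of
sequences $T \in \mathcal{P}(\Sigma^+)$ and $T' \in \mathcal{P}(\Sigma^{+\infty})$, when a finite sequence in $T$
terminates with the initial state of a sequence in $T'$.

Given a transition system $\langle \Sigma, \tau \rangle$, a trace is a non-empty sequence of program
states described by the transition relation $\tau$, that is, $\langle s, s' \rangle \in \tau$ for each pair of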
Fig. 2. First fixpoint iterates of the trace semantics A. 


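consecutive states $s, s' \in \Sigma$ in the sequence. The set of final states $\Omega$ and the
transition relation $\tau$ can be understood as sets of traces of length one and length
two, respectively. The trace semantics $\Lambda \in \mathcal{P}(\Sigma^{+\infty})$ generated by a transition
system $\langle \Sigma, \tau \rangle$ is the union of all finite traces that terminate in a final
state in $\Omega$, and all infinite traces. It can be expressed as a least fixpoint in the
complete lattice $\langle \mathcal{P}(\Sigma^{+\infty}), \sqsubseteq, \sqcup, \sqcap, \Sigma^\omega, \Sigma^+ \rangle$ [12]:

$$\Lambda = \mathrm{lfp}^{\sqsubseteq}\, \Theta^{+\infty} \qquad \Theta^{+\infty}(T) \stackrel{\text{def}}{=} \Omega \cup (\tau \mathbin{;} T) \tag{1}$$

where the computational order is $T_1 \sqsubseteq T_2 \stackrel{\text{def}}{\iff} T_1^+ \subseteq T_2^+ \land T_1^\omega \supseteq T_2^\omega$. Figure 2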
illustrates the first fixpoint iterates. The fixpoint iteration starts from the set 
of all infinite sequences of program states. At each iteration, the final program
states in $\Omega$ are added to the set, and sequences already in the set are extended
by prepending transitions to them. In this way, we add increasingly longer finite
traces, and we remove infinite sequences of states with increasingly longer
prefixes not forming traces. In particular, the $i$-th iterate builds all finite traces of
length less than or equal to $i$, and selects all infinite sequences whose prefixes
of length $i$ form traces. At the limit we obtain all infinite traces and all finite
traces that terminate in a final state in $\Omega$. Note that $\Lambda$ is suffix-closed.
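The finite part of this iteration can be sketched in Python for a small finite transition system. This is only an illustration under stated assumptions: the function name `finite_traces`, the tuple encoding of traces, and the three-state example system are hypothetical, and infinite traces (which the real semantics also tracks) are not represented.

```python
def finite_traces(states, tau, rounds):
    """Finite part of the iterates of Theta(T) = Omega U (tau ; T).

    states: finite set of states; tau: set of (s, s') pairs.
    Traces are tuples of states. Infinite traces are not represented.
    """
    # Final states: states without any outgoing transition.
    omega = {s for s in states if all((s, t) not in tau for t in states)}
    traces = set()  # finite part of the initial iterate (all infinite sequences)
    for _ in range(rounds):
        # tau ; T prepends a transition (s, s2) to every trace starting in s2.
        extended = {(s,) + t for (s, s2) in tau for t in traces if t[0] == s2}
        new = {(s,) for s in omega} | extended
        if new == traces:  # fixpoint reached
            break
        traces = new
    return traces


# Hypothetical system: a -> b -> c and a -> c, where c is the only final state.
example = finite_traces({"a", "b", "c"}, {("a", "b"), ("b", "c"), ("a", "c")}, rounds=5)
```

After one round only the final state is present; each further round prepends one more transition, mirroring how the iterates build increasingly longer finite traces.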

The trace semantics $\Lambda$ fully describes the behavior of a program. However, to
reason about a particular property of a program, it is not necessary to consider 
all aspects of its behavior. In fact, reasoning is facilitated by the design of a 
semantics that abstracts away from irrelevant details about program executions. 
In the next sections, we define our property of interest and use abstract inter- 
pretation [14] to systematically derive, by successive abstractions of the trace 
semantics, a semantics that precisely captures such a property. 


3 Input Data Usage 


A property is specified by its extension, that is, the set of elements having such a
property [14,15]. Thus, properties of program traces in $\Sigma^{+\infty}$ are sets of traces in
$\mathcal{P}(\Sigma^{+\infty})$, and properties of programs with trace semantics in $\mathcal{P}(\Sigma^{+\infty})$ are sets
of sets of traces in $\mathcal{P}(\mathcal{P}(\Sigma^{+\infty}))$. Accordingly, a program $P$ satisfies a property
$H \in \mathcal{P}(\mathcal{P}(\Sigma^{+\infty}))$ if and only if its semantics $[\![P]\!] \in \mathcal{P}(\Sigma^{+\infty})$ belongs to $H$:

$$P \models H \iff [\![P]\!] \in H \tag{2}$$

Some program properties are defined in terms of individual program traces
and can be equivalently expressed as trace properties. This is the case for the
traditional safety [26] and liveness [4] properties of programs. In such a case, a
program $P$ satisfies a trace property $T$ if and only if all traces in its semantics
$[\![P]\!]$ belong to the property: $P \models T \iff [\![P]\!] \subseteq T$.

Program properties that establish a relation between different program traces 
cannot be expressed as trace properties [11]. Examples are security properties 
such as non-interference [21,35]. In this paper, we consider a closely related but 
more general property called input data usage, which expresses that the outcome 
of a program does not depend on (some of) its input data. The notion of outcome 
accounts for non-determinism as well as non-termination. Thus, our notion of 
dependency encompasses non-interference as well as notions of dependency that 
arise in many other contexts [1,9]. We further explore this in Sects. 8 to 10. 

Let each program $P$ with trace semantics $[\![P]\!]$ have a set $I_P$ of input variables
and a set $O_P$ of output variables¹. For simplicity, we can assume that these
variables are all of the same type (e.g., boolean variables) and their values are
all in a set $V$ of possible values (e.g., $V = \{T, F\}$ where T is the boolean value
true and F is the boolean value false). Given a trace $\sigma \in [\![P]\!]$, we write $\sigma[0]$ to
denote its initial state and $\sigma[\omega]$ to denote its outcome, that is, its final state if
the trace is finite or $\bot$ if the trace is infinite. The input variables at the initial
states of the traces of a program store the values of its input data: we write
$\sigma[0](i)$ to denote the value of the input data stored in the input variable $i$ at the
initial state of the trace $\sigma$, and $\sigma_1[0] \neq_i \sigma_2[0]$ to denote that the initial states
of two traces $\sigma_1$ and $\sigma_2$ disagree on the value of the input variable $i$ but agree
on the values of all other variables. The output variables at the final states of
the finite traces of a program store its result: we write $\sigma[\omega](o)$ to denote the
result stored in the output variable $o$ at the final state of a finite trace $\sigma$. We
can now formally define when an input variable $i \in I_P$ is unused with respect to
a program with trace semantics $[\![P]\!] \in \mathcal{P}(\Sigma^{+\infty})$:

$$\mathrm{UNUSED}_i([\![P]\!]) \stackrel{\text{def}}{=} \forall \sigma \in [\![P]\!], v \in V : \sigma[0](i) \neq v \Rightarrow \exists \sigma' \in [\![P]\!] : \sigma'[0] \neq_i \sigma[0] \land \sigma'[0](i) = v \land \sigma[\omega] = \sigma'[\omega] \tag{3}$$


Intuitively, an input variable $i$ is unused if all feasible program outcomes (e.g.,
the outcome $\sigma[\omega]$ of a trace $\sigma$) are feasible from all possible initial values of $i$
(i.e., for all possible initial values $v$ of $i$ that differ from the initial value of $i$
in $\sigma$, there exists a trace with initial value $v$ for $i$ that has the same outcome
$\sigma[\omega]$). In other words, the outcome of the program is the same independently of


1 The approach can be easily extended to infinite inputs and/or outputs via abstrac- 
tions such as the one later presented in Sect. 11. 
 1  english = input()
 2  math = input()
 3  science = input()
 4  bonus = input()
 5
 6  passing = True
 7  if not english: english = False
 8  if not math: passing = bonus
 9  if not math: passing = bonus
10
11  print(passing)


Fig. 3. Simple program to check if a student has passed three school subjects. The 
programmer has made two mistakes at line 7 and at line 9, which cause the input data 
stored in the variables english and science to be unused. 


the initial value of the input variable $i$. Note that this definition accounts for
non-determinism (since it considers each program outcome independently) and
non-termination (since a program outcome can be $\bot$).


Example 1. Let us consider the simple program P in Fig. 3. Based on the input 
variables english, math, and science (cf. lines 1-3), the program is supposed 
to check if a student has passed all three considered school subjects and store 
the result in the output variable passing (cf. line 11). For mathematics and 
science, the student is allowed a bonus based on the input variable bonus (cf.
lines 8 and 9). However, the programmer has made two mistakes at line 7 and at
line 9, which cause the input variables english and science to be unused.

Let us now consider the input variable science. The trace semantics of the
program (simplified to consider only the variables science and passing) is:

$$[\![P]\!]_{\text{science}} = \{ (T\_) \cdots (TT),\ (T\_) \cdots (TF),\ (F\_) \cdots (FT),\ (F\_) \cdots (FF) \}$$

where each state $(v_1 v_2)$ shows the boolean value $v_1$ of science and $v_2$ of passing,
and _ denotes any boolean value. We omitted the trace suffixes for brevity. The
input variable science is unused, since each result value (T or F) for passing is
feasible from all possible initial values of science. Note that all other outcomes
of the program (i.e., non-termination) are not feasible.

Let us now consider the input variable math. The trace semantics of the
program (now simplified to only consider math and passing) is the following:

$$[\![P]\!]_{\text{math}} = \{ (T\_) \cdots (TT),\ (F\_) \cdots (FT),\ (F\_) \cdots (FF) \}$$

In this case, the input variable math is used since only the initial state (F_) yields
the result value F for passing (in the final state (FF)). ■
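For a deterministic, terminating program over boolean inputs, the definition of an unused input variable (Eq. 3) can be checked by brute force. The sketch below is an illustration only: `program` re-encodes the code of Fig. 3 as a Python function, `unused` is a hypothetical helper name, and initial states are restricted to the four input variables.

```python
from itertools import product

INPUTS = ("english", "math", "science", "bonus")


def program(english, math, science, bonus):
    """The program of Fig. 3 (mistakes kept); returns the printed value."""
    passing = True
    if not english:
        english = False  # mistake at line 7, kept on purpose
    if not math:
        passing = bonus
    if not math:         # mistake at line 9, kept on purpose
        passing = bonus
    return passing


def unused(var, run=program, inputs=INPUTS, values=(True, False)):
    """Brute-force check of Eq. 3 for a deterministic, terminating program."""
    runs = [(dict(zip(inputs, vs)), run(*vs))
            for vs in product(values, repeat=len(inputs))]
    for init, outcome in runs:
        for v in values:
            if init[var] == v:
                continue
            # Look for a run that differs only on `var`, starts with value v
            # for `var`, and reaches the same outcome.
            witness = any(
                init2[var] == v and out2 == outcome
                and all(init2[x] == init[x] for x in inputs if x != var)
                for init2, out2 in runs)
            if not witness:
                return False
    return True


unused_inputs = [v for v in INPUTS if unused(v)]
```

On this encoding, `unused_inputs` contains exactly `english` and `science`, matching the discussion in Example 1.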


The input data usage property $\mathcal{N}$ can now be formally defined as follows:

$$\mathcal{N} \stackrel{\text{def}}{=} \{ [\![P]\!] \in \mathcal{P}(\Sigma^{+\infty}) \mid \forall i \in I_P : \mathrm{UNUSED}_i([\![P]\!]) \} \tag{4}$$

which states that the outcome of a program does not depend on any input data.
In practice one is interested in weaker input data usage properties for a subset
$J$ of the input variables, i.e., $\mathcal{N}_J \stackrel{\text{def}}{=} \{ [\![P]\!] \in \mathcal{P}(\Sigma^{+\infty}) \mid \forall i \in J \subseteq I_P : \mathrm{UNUSED}_i([\![P]\!]) \}$.

In the following, we use abstract interpretation to reason about input data 
usage. In the next section, we discuss the challenges to the application of the 
standard abstract interpretation framework that emerge from the fact that input 
data usage cannot be expressed as a trace property. 


4 Sound Input Data Usage Validation 


In the standard framework of abstract interpretation, one defines a semantics
that precisely captures a property $S$ of interest by abstraction of the trace
semantics $\Lambda$ [12]. Then, further abstractions $\Lambda^\natural$ provide sound over-approximations
$\gamma(\Lambda^\natural)$ of $\Lambda$ (by means of a concretization function $\gamma$): $\Lambda \subseteq \gamma(\Lambda^\natural)$. For a trace
property, an over-approximation $\gamma([\![P]\!]^\natural)$ of the semantics $[\![P]\!]$ of a program $P$
allows a sound validation of the property: since $[\![P]\!] \subseteq \gamma([\![P]\!]^\natural)$, we have that
$\gamma([\![P]\!]^\natural) \subseteq S \Rightarrow [\![P]\!] \subseteq S$ and so, if $\gamma([\![P]\!]^\natural) \subseteq S$, we can conclude that $P \models S$
(cf. Sect. 3). This conclusion is also valid for all other subset-closed properties
[11]: since by definition $\gamma([\![P]\!]^\natural) \in S \Rightarrow \forall T \subseteq \gamma([\![P]\!]^\natural) : T \in S$, we have that
$\gamma([\![P]\!]^\natural) \in S \Rightarrow [\![P]\!] \in S$ (and so we can conclude that $P \models S$). However, for
program properties that are not subset-closed, we have that $\gamma([\![P]\!]^\natural) \in S \not\Rightarrow [\![P]\!] \in S$
[28] and so we cannot conclude that $P \models S$, even if $\gamma([\![P]\!]^\natural) \in S$ (cf. Eq. 2).

We have seen in the previous section that input data usage is not a trace 
property. The example below shows that it is not a subset-closed property either. 


Example 2. Let us consider again the program $P$ and its semantics $[\![P]\!]_{\text{science}}$
and $[\![P]\!]_{\text{math}}$ shown in Example 1. We have seen in Example 1 that the semantics
$[\![P]\!]_{\text{science}}$ belongs to the data usage property $\mathcal{N}$: $[\![P]\!]_{\text{science}} \in \mathcal{N}$. Let us consider
now the following subset $T$ of $[\![P]\!]_{\text{science}}$:

$$T = \{ (T\_) \cdots (TT),\ (F\_) \cdots (FT),\ (F\_) \cdots (FF) \}$$

In this case, the input variable science is used. Indeed, we can observe that $T$
coincides with $[\![P]\!]_{\text{math}}$ (except for the considered input variable). Thus $T \notin \mathcal{N}$
even though $T \subseteq [\![P]\!]_{\text{science}}$. ■
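The failure of subset-closure in Example 2 can be replayed on a simplified encoding, sketched below. The encoding is an assumption made for illustration: each trace is reduced to a pair (initial value of the input variable, outcome), and `unused_in` is a hypothetical helper implementing Eq. 3 for a single input variable.

```python
def unused_in(traces, values=(True, False)):
    """Eq. 3 on traces reduced to (initial input value, outcome) pairs:
    every feasible outcome must be reachable from every initial value."""
    return all((v, out) in traces for (_, out) in traces for v in values)


# Semantics of Example 1 restricted to science (all four combinations occur).
P_science = {(True, True), (True, False), (False, True), (False, False)}

# The subset T of Example 2: the outcome F is only reachable from initial value F.
T = {(True, True), (False, True), (False, False)}
```

Here `unused_in(P_science)` holds while `unused_in(T)` does not, although `T` is a strict subset of `P_science`, which is exactly why an over-approximation of a set of traces cannot be used directly to validate the property.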


Since input data usage is not subset-closed, we are in the unfortunate
situation that we cannot use the standard abstract interpretation framework to
soundly prove that a program does not use (some of) its input data using an
over-approximation of the semantics of the program: $\gamma([\![P]\!]^\natural) \in \mathcal{N}_J \not\Rightarrow [\![P]\!] \in \mathcal{N}_J$.

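We solve this problem in the next section, by lifting the trace semantics
$[\![P]\!] \in \mathcal{P}(\Sigma^{+\infty})$ of a program $P$ (i.e., a set of traces) to a set of sets of traces
$\langle\!\langle P \rangle\!\rangle \in \mathcal{P}(\mathcal{P}(\Sigma^{+\infty}))$ [28]. In this setting, a program $P$ satisfies a property $H$ if
and only if its semantics $\langle\!\langle P \rangle\!\rangle$ is a subset of $H$:

$$P \models H \iff \langle\!\langle P \rangle\!\rangle \subseteq H \tag{5}$$

As we will explain in the next section, now an over-approximation $\gamma(\langle\!\langle P \rangle\!\rangle^\natural)$ of
$\langle\!\langle P \rangle\!\rangle$ allows again a sound validation of the property: since $\langle\!\langle P \rangle\!\rangle \subseteq \gamma(\langle\!\langle P \rangle\!\rangle^\natural)$, we
have that $\gamma(\langle\!\langle P \rangle\!\rangle^\natural) \subseteq H \Rightarrow \langle\!\langle P \rangle\!\rangle \subseteq H$ (and so we can conclude that $P \models H$).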

More specifically, in the next section, we define a program semantics $\langle\!\langle P \rangle\!\rangle$ that
precisely captures which subset $J$ of the input variables is unused by a program
$P$. In later sections, we present further abstractions $\langle\!\langle P \rangle\!\rangle^\natural$ that over-approximate
the subset of the input variables that may be used by $P$, and thus allow a sound
validation of an under-approximation $J^\natural$ of $J$: $\gamma(\langle\!\langle P \rangle\!\rangle^\natural) \subseteq \mathcal{N}_{J^\natural} \Rightarrow \langle\!\langle P \rangle\!\rangle \subseteq \mathcal{N}_{J^\natural}$. In
other words, this means that every input variable reported as unused by an
abstraction is indeed not used by the program.


5 Outcome Semantics 


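We lift the trace semantics $\Lambda$ to a set of sets of traces by partitioning. The
partitioning abstraction $\alpha_Q : \mathcal{P}(\Sigma^{+\infty}) \to \mathcal{P}(\mathcal{P}(\Sigma^{+\infty}))$ of a set of traces $T$ is:

$$\alpha_Q(T) \stackrel{\text{def}}{=} \{ T \cap C \mid C \in Q \} \tag{6}$$

where $Q \in \mathcal{P}(\mathcal{P}(\Sigma^{+\infty}))$ is a partition of sequences of program states.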

More specifically, to reason about input data usage of a program $P$, we lift
the trace semantics $[\![P]\!]$ to $\langle\!\langle P \rangle\!\rangle$ by partitioning it into sets of traces that yield the
same program outcome. The key insight behind this idea is that, given an input
variable $i$, the initial states of all traces in a partition give all initial values for $i$
that yield a program outcome; the variable $i$ is unused if and only if these initial
values are all the possible values for $i$ (or the set of values is empty because the
outcome is unfeasible, cf. Eq. 3). Thus, if the trace semantics $[\![P]\!]$ of a program
$P$ belongs to the input data usage property $\mathcal{N}_J$, then each partition in $\langle\!\langle P \rangle\!\rangle$ must
also belong to $\mathcal{N}_J$, and vice versa: we have that $[\![P]\!] \in \mathcal{N}_J \iff \langle\!\langle P \rangle\!\rangle \subseteq \mathcal{N}_J$, which
is precisely what we want (cf. Eq. 5).

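Let $T^+_{o=v}$ denote the subset of the finite sequences of program states in
$T \in \mathcal{P}(\Sigma^{+\infty})$ with value $v$ for the output variable $o$ in their outcome (i.e., their
final state): $T^+_{o=v} \stackrel{\text{def}}{=} \{ \sigma \in T^+ \mid \sigma[\omega](o) = v \}$. We define the outcome partition
$O \in \mathcal{P}(\mathcal{P}(\Sigma^{+\infty}))$ of sequences of program states:

$$O \stackrel{\text{def}}{=} \{ \Sigma^+_{o_1=v_1,\dots,o_k=v_k} \mid v_1,\dots,v_k \in V \} \cup \{ \Sigma^\omega \}$$

where $V$ is the set of possible values of the output variables $o_1,\dots,o_k$ (cf. Sect. 3).
The partition contains all sets of finite sequences that agree on the values of the
output variables in their outcome, and all infinite sequences of program states
(i.e., all sequences with outcome $\bot$). We instantiate $\alpha_Q$ above with the outcome
partition to obtain the outcome abstraction $\alpha_\bullet : \mathcal{P}(\Sigma^{+\infty}) \to \mathcal{P}(\mathcal{P}(\Sigma^{+\infty}))$:

$$\alpha_\bullet(T) \stackrel{\text{def}}{=} \{ T^+_{o_1=v_1,\dots,o_k=v_k} \mid v_1,\dots,v_k \in V \} \cup \{ T^\omega \} \tag{7}$$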
Example 3. The program $P$ of Example 1 has only one output variable passing
with boolean value T or F. Let us consider again the trace semantics $[\![P]\!]_{\text{math}}$
shown in Example 1. Its outcome abstraction $\alpha_\bullet([\![P]\!]_{\text{math}})$ is:

$$\alpha_\bullet([\![P]\!]_{\text{math}}) = \{ \emptyset,\ \{ (F\_) \cdots (FF) \},\ \{ (T\_) \cdots (TT),\ (F\_) \cdots (FT) \} \}$$

Note that all traces with different result values for the output variable passing
belong to different sets of traces (i.e., partitions) in $\alpha_\bullet([\![P]\!]_{\text{math}})$. The empty set
corresponds to the (unfeasible) non-terminating outcome of the program. ■
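The partitioning of Example 3 can be sketched in Python as follows. The encoding is hypothetical: a finite trace is a tuple of state strings such as `"F_"` or `"FF"` (first character: value of math, second: value of passing), infinite traces are passed separately because they cannot be written as finite tuples, and `outcome_abstraction` is an illustrative name for Eq. 7 with a single output variable.

```python
def outcome_abstraction(finite_traces, infinite_traces=frozenset(), values=("T", "F")):
    """Eq. 7 for one output variable: one block of finite traces per result
    value (read from the last character of the final state), plus the block
    of infinite traces."""
    blocks = {frozenset(t for t in finite_traces if t[-1][-1] == v) for v in values}
    blocks.add(frozenset(infinite_traces))
    return blocks


# Traces of Example 1 restricted to (math, passing); intermediate states omitted.
P_math = {("T_", "TT"), ("F_", "FT"), ("F_", "FF")}
partitions = outcome_abstraction(P_math)
```

The resulting `partitions` contains the empty block for the non-terminating outcome, one block for result F, and one block for result T, as in Example 3.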


We can now use the outcome abstraction $\alpha_\bullet$ to define the outcome semantics
$\Lambda_\bullet \in \mathcal{P}(\mathcal{P}(\Sigma^{+\infty}))$ as an abstraction of the trace semantics $\Lambda$:


Definition 1. The outcome semantics $\Lambda_\bullet \in \mathcal{P}(\mathcal{P}(\Sigma^{+\infty}))$ is defined as:

$$\Lambda_\bullet \stackrel{\text{def}}{=} \alpha_\bullet(\Lambda) \tag{8}$$

where $\alpha_\bullet$ is the outcome abstraction (cf. Eq. 7) and $\Lambda \in \mathcal{P}(\Sigma^{+\infty})$ is the trace
semantics (cf. Eq. 1).


The outcome semantics contains the set of all infinite traces and all sets of finite
traces that agree on the value of the output variables in their outcome.

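In the following, we express the outcome semantics $\Lambda_\bullet$ in a constructive
fixpoint form. This allows us to later derive further abstractions of $\Lambda_\bullet$ by fixpoint
transfer and fixpoint approximation [12]. Given a set of sets of traces $S$, we
write $S^+_{o=v} \stackrel{\text{def}}{=} \{ T \in S \mid T = T^+_{o=v} \}$ for the selection of the sets of traces in $S$
that agree on the value $v$ of the output variable $o$ in their outcome, and
$S^\omega \stackrel{\text{def}}{=} \{ T \in S \mid T = T^\omega \}$ for the selection of the sets of infinite traces in $S$. When $S^+_{o=v}$
(resp. $S^\omega$) contains a single set of traces $T$, we abuse notation and write $S^+_{o=v}$
(resp. $S^\omega$) to also denote $T$. The following result gives a fixpoint definition of
$\Lambda_\bullet$ in the complete lattice $\langle \mathcal{P}(\mathcal{P}(\Sigma^{+\infty})), \mathrel{\dot\sqsubseteq}, \mathbin{\dot\sqcup}, \mathbin{\dot\sqcap}, \{\Sigma^\omega, \emptyset\}, \{\emptyset, \Sigma^+\} \rangle$, where the
computational order $\dot\sqsubseteq$ is defined (similarly to $\sqsubseteq$, cf. Sect. 2) as:

$$S_1 \mathrel{\dot\sqsubseteq} S_2 \stackrel{\text{def}}{\iff} \bigwedge_{v_1,\dots,v_k \in V} S^+_{1\,o_1=v_1,\dots,o_k=v_k} \subseteq S^+_{2\,o_1=v_1,\dots,o_k=v_k} \land S_1^\omega \supseteq S_2^\omega$$


Theorem 1. The outcome semantics $\Lambda_\bullet \in \mathcal{P}(\mathcal{P}(\Sigma^{+\infty}))$ can be expressed as a
least fixpoint in $\langle \mathcal{P}(\mathcal{P}(\Sigma^{+\infty})), \mathrel{\dot\sqsubseteq}, \mathbin{\dot\sqcup}, \mathbin{\dot\sqcap}, \{\Sigma^\omega, \emptyset\}, \{\emptyset, \Sigma^+\} \rangle$ as:

$$\Lambda_\bullet = \mathrm{lfp}^{\dot\sqsubseteq}\, \Theta_\bullet \qquad \Theta_\bullet(S) \stackrel{\text{def}}{=} \{ \Omega_{o_1=v_1,\dots,o_k=v_k} \mid v_1,\dots,v_k \in V \} \mathbin{\dot\sqcup} \{ \tau \mathbin{;} T \mid T \in S \} \tag{9}$$

where $S_1 \mathbin{\dot\sqcup} S_2 \stackrel{\text{def}}{=} \{ S^+_{1\,o_1=v_1,\dots,o_k=v_k} \cup S^+_{2\,o_1=v_1,\dots,o_k=v_k} \mid v_1,\dots,v_k \in V \} \cup S_1^\omega \cup S_2^\omega$.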


Figure 4 illustrates the first fixpoint iterates of the outcome semantics for a
single output variable $o$. The fixpoint iteration starts from the partition
containing the set of all infinite sequences of program states and the empty set (which
represents an empty set of finite traces). At the first iteration, the empty set is
replaced with a partition of the final states $\Omega$ based on the value $v$ of the output
variable $o$, while the infinite sequences are extended by prepending transitions
to them (similarly to the trace semantics, cf. Eq. 1). At the next iterations, all
sequences contained in each partition are further extended, and the final states
that agree on the value $v$ of $o$ are again added to the matching set of traces that
agree on $v$ in their outcome. At the limit, we obtain a partition containing the
set of all infinite traces and all sets of finite traces that agree on the value $v$ of
the output variable $o$ in their outcome.

Fig. 4. First iterates of the outcome semantics $\Lambda_\bullet$ for a single output variable $o$.

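To prove Theorem 1 we first need to show that the outcome abstraction $\alpha_\bullet$
preserves least upper bounds of non-empty sets of sets of traces.


Lemma 1. The outcome abstraction $\alpha_\bullet$ is Scott-continuous.


Proof. We need to show that for any non-empty ascending chain $C$ of sets of
traces with least upper bound $\sqcup C$, we have that $\alpha_\bullet(\sqcup C) = \dot\sqcup \{ \alpha_\bullet(T) \mid T \in C \}$,
that is, $\alpha_\bullet(\sqcup C)$ is the least upper bound of $\alpha_\bullet(C)$, the image of $C$ via $\alpha_\bullet$.

First, we know that $\alpha_\bullet$ is monotonic, i.e., for any two sets of traces $T_1$ and
$T_2$ we have $T_1 \sqsubseteq T_2 \Rightarrow \alpha_\bullet(T_1) \mathrel{\dot\sqsubseteq} \alpha_\bullet(T_2)$. Since $\sqcup C$ is the least upper bound of
$C$, for any set $T$ in $C$ we have that $T \sqsubseteq \sqcup C$ and, since $\alpha_\bullet$ is monotonic, we have
that $\alpha_\bullet(T) \mathrel{\dot\sqsubseteq} \alpha_\bullet(\sqcup C)$. Thus $\alpha_\bullet(\sqcup C)$ is an upper bound of $\{ \alpha_\bullet(T) \mid T \in C \}$.

To show that $\alpha_\bullet(\sqcup C)$ is the least upper bound of $\alpha_\bullet(C)$, we need to show that
for any other upper bound $U$ of $\alpha_\bullet(C)$ we have $\alpha_\bullet(\sqcup C) \mathrel{\dot\sqsubseteq} U$. Assume, for the
sake of contradiction, that $\alpha_\bullet(\sqcup C) \mathrel{\dot{\not\sqsubseteq}} U$. Then, there exist $T_1 \in \alpha_\bullet(\sqcup C)$ and $T_2 \in U$ such that
$T_1 \not\sqsubseteq T_2$: $T_1^+ \supset T_2^+$ or $T_1^\omega \subset T_2^\omega$. Let us assume that $T_1^+ \supset T_2^+$. By definition of
$\alpha_\bullet$, we observe that $T_1$ is a partition of $\sqcup C$ and, since $\sqcup C$ is the least upper bound
of $C$, $U$ cannot be an upper bound of $\alpha_\bullet(C)$ (since $T_2$ does not contain enough
finite traces). Similarly, if $T_1^\omega \subset T_2^\omega$, then $U$ cannot be an upper bound of $\alpha_\bullet(C)$
(since $T_2$ contains too many infinite traces). Thus, we must have $\alpha_\bullet(\sqcup C) \mathrel{\dot\sqsubseteq} U$
and we can conclude that $\alpha_\bullet(\sqcup C)$ is the least upper bound of $\alpha_\bullet(C)$.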


We can now prove Theorem 1 by Kleenian fixpoint transfer [12]. 




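Proof (Sketch). The proof follows by Kleenian fixpoint transfer. We have that
$\langle \mathcal{P}(\mathcal{P}(\Sigma^{+\infty})), \mathrel{\dot\sqsubseteq}, \mathbin{\dot\sqcup}, \mathbin{\dot\sqcap}, \{\Sigma^\omega, \emptyset\}, \{\emptyset, \Sigma^+\} \rangle$ is a complete lattice and that $\Theta^{+\infty}$ (cf.
Eq. 1) and $\Theta_\bullet$ (cf. Eq. 9) are monotonic functions. Additionally, we have that the
outcome abstraction $\alpha_\bullet$ (cf. Eq. 7) is Scott-continuous (cf. Lemma 1) and such
that $\alpha_\bullet(\Sigma^\omega) = \{\Sigma^\omega, \emptyset\}$ and $\alpha_\bullet \circ \Theta^{+\infty} = \Theta_\bullet \circ \alpha_\bullet$. Then, by Kleenian fixpoint
transfer, we have that $\alpha_\bullet(\Lambda) = \alpha_\bullet(\mathrm{lfp}^{\sqsubseteq}\, \Theta^{+\infty}) = \mathrm{lfp}^{\dot\sqsubseteq}\, \Theta_\bullet$. Thus, we can conclude
that $\Lambda_\bullet = \mathrm{lfp}^{\dot\sqsubseteq}\, \Theta_\bullet$.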


Finally, we show that the outcome semantics $\Lambda_\bullet$ is sound and complete for
proving that a program does not use (a subset of) its input variables.


Theorem 2. A program does not use a subset $J$ of its input variables if and
only if its outcome semantics $\Lambda_\bullet$ is a subset of $\mathcal{N}_J$:

$$P \models \mathcal{N}_J \iff \Lambda_\bullet \subseteq \mathcal{N}_J$$


Proof (Sketch). The proof follows immediately from the definition of $\mathcal{N}_J$ (cf.
Eq. 3 and Sect. 4) and the definition of $\Lambda_\bullet$ (cf. Eq. 8).


Example 4. Let us consider again the program $P$ and its semantics $[\![P]\!]_{\text{science}}$
shown in Example 1. The corresponding outcome semantics $\alpha_\bullet([\![P]\!]_{\text{science}})$ is:

$$\alpha_\bullet([\![P]\!]_{\text{science}}) = \{ \emptyset,\ \{ (T\_) \cdots (TF),\ (F\_) \cdots (FF) \},\ \{ (T\_) \cdots (TT),\ (F\_) \cdots (FT) \} \}$$

Note that all sets of traces in $\alpha_\bullet([\![P]\!]_{\text{science}})$ belong to $\mathcal{N}_{\{\text{science}\}}$: the initial
states of all traces in a non-empty partition contain all possible initial values (T
or F) for the input variable science. Thus, $P$ satisfies $\mathcal{N}_{\{\text{science}\}}$ and, indeed,
the input variable science is unused by $P$. ■


As discussed in Sect. 4, we now can again use the standard framework of
abstract interpretation to soundly over-approximate $\Lambda_\bullet$ and prove that a
program does not use (some of) its input data. In the next section, we propose
an abstraction that remains sound and complete for input data usage. Further 
sound but not complete abstractions are presented in later sections. 


6 Dependency Semantics 


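We observe that, to reason about input data usage, it is not necessary to consider
all intermediate state computations between the initial state of a trace and its
outcome. Thus, we can further abstract the outcome semantics $\Lambda_\bullet$ into a set $\Lambda_\rightsquigarrow$
of (dependency) relations between initial states and outcomes of a set of traces.

We lift the abstraction defined for this purpose on sets of traces [12] to
$\alpha_\rightsquigarrow : \mathcal{P}(\mathcal{P}(\Sigma^{+\infty})) \to \mathcal{P}(\mathcal{P}(\Sigma \times \Sigma_\bot))$ on sets of sets of traces:

$$\alpha_\rightsquigarrow(S) \stackrel{\text{def}}{=} \{ \{ \langle \sigma[0], \sigma[\omega] \rangle \in \Sigma \times \Sigma_\bot \mid \sigma \in T \} \mid T \in S \} \tag{10}$$

where $\Sigma_\bot \stackrel{\text{def}}{=} \Sigma \cup \{\bot\}$. The dependency abstraction $\alpha_\rightsquigarrow$ ignores all intermediate
states between the initial state $\sigma[0]$ and the outcome $\sigma[\omega]$ of all traces $\sigma$ in
all partitions $T$ of $S$. Observe that a trace $\sigma$ that consists of a single state
$s$ is abstracted as a pair $\langle s, s \rangle$. The corresponding dependency concretization
function $\gamma_\rightsquigarrow : \mathcal{P}(\mathcal{P}(\Sigma \times \Sigma_\bot)) \to \mathcal{P}(\mathcal{P}(\Sigma^{+\infty}))$ over-approximates the original
sets of traces by inserting arbitrary intermediate states:

$$\gamma_\rightsquigarrow(S) \stackrel{\text{def}}{=} \{ T \in \mathcal{P}(\Sigma^{+\infty}) \mid \{ \langle \sigma[0], \sigma[\omega] \rangle \in \Sigma \times \Sigma_\bot \mid \sigma \in T \} \in S \} \tag{11}$$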


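Example 5. Let us consider again the program of Example 1 and its outcome
semantics $\alpha_\bullet([\![P]\!]_{\text{math}})$ shown in Example 3. Its dependency abstraction is:

$$\alpha_\rightsquigarrow(\alpha_\bullet([\![P]\!]_{\text{math}})) = \{ \emptyset,\ \{ \langle F\_, FF \rangle \},\ \{ \langle T\_, TT \rangle,\ \langle F\_, FT \rangle \} \}$$

which explicitly ignores intermediate program states. ■

Using $\alpha_\rightsquigarrow$, we now define the dependency semantics $\Lambda_\rightsquigarrow \in \mathcal{P}(\mathcal{P}(\Sigma \times \Sigma_\bot))$ as
an abstraction of the outcome semantics $\Lambda_\bullet$.

Definition 2. The dependency semantics $\Lambda_\rightsquigarrow \in \mathcal{P}(\mathcal{P}(\Sigma \times \Sigma_\bot))$ is defined as:

$$\Lambda_\rightsquigarrow \stackrel{\text{def}}{=} \alpha_\rightsquigarrow(\Lambda_\bullet) \tag{12}$$

where $\Lambda_\bullet \in \mathcal{P}(\mathcal{P}(\Sigma^{+\infty}))$ is the outcome semantics (cf. Eq. 8) and $\alpha_\rightsquigarrow$ is the
dependency abstraction (cf. Eq. 10).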


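Neither the Kleenian fixpoint transfer nor the Tarskian fixpoint transfer can
be used to obtain a fixpoint definition for the dependency semantics, but we
have to proceed by union of disjoint fixpoints [12]. To this end, we observe that
the outcome semantics $\Lambda_\bullet$ can be equivalently expressed as follows:

$$\begin{aligned}
\Lambda_\bullet &= \Lambda_\bullet^+ \cup \Lambda_\bullet^\omega = \mathrm{lfp}^{\dot\sqsubseteq}_{\{\emptyset\}}\, \Theta_\bullet^+ \cup \mathrm{lfp}^{\dot\sqsubseteq}_{\{\Sigma^\omega\}}\, \Theta_\bullet^\omega \\
\Theta_\bullet^+(S) &\stackrel{\text{def}}{=} \{ \Omega_{o_1=v_1,\dots,o_k=v_k} \mid v_1,\dots,v_k \in V \} \mathbin{\dot\sqcup} \{ \tau \mathbin{;} T \mid T \in S \} \\
\Theta_\bullet^\omega(S) &\stackrel{\text{def}}{=} \{ \tau \mathbin{;} T \mid T \in S \}
\end{aligned} \tag{13}$$

where $\Lambda_\bullet^+$ and $\Lambda_\bullet^\omega$ separately compute the set of all sets of finite traces that agree
on their outcome, and the set of all infinite traces, respectively.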

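In the following, given a set of traces $T \in \mathcal{P}(\Sigma^{+\infty})$ and its dependency
abstraction $\alpha_\rightsquigarrow(T)$, we abuse notation and write $T^+$ (resp. $T^\omega$) to also denote
$\alpha_\rightsquigarrow(T)^+ \stackrel{\text{def}}{=} \alpha_\rightsquigarrow(T) \cap (\Sigma \times \Sigma)$ (resp. $\alpha_\rightsquigarrow(T)^\omega \stackrel{\text{def}}{=} \alpha_\rightsquigarrow(T) \cap (\Sigma \times \{\bot\})$). Similarly, we reuse the
symbols for the computational order $\dot\sqsubseteq$, least upper bound $\dot\sqcup$, and greatest lower
bound $\dot\sqcap$, instead of their abstractions. We can now use the Kleenian and Tarskian
fixpoint transfer to separately derive fixpoint definitions of $\alpha_\rightsquigarrow(\Lambda_\bullet^+)$ and $\alpha_\rightsquigarrow(\Lambda_\bullet^\omega)$
in $\langle \mathcal{P}(\mathcal{P}(\Sigma \times \Sigma_\bot)), \mathrel{\dot\sqsubseteq}, \mathbin{\dot\sqcup}, \mathbin{\dot\sqcap}, \{\Sigma \times \{\bot\}, \emptyset\}, \{\emptyset, \Sigma \times \Sigma\} \rangle$.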


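Lemma 2. The abstraction $\Lambda_\rightsquigarrow^+ \stackrel{\text{def}}{=} \alpha_\rightsquigarrow(\Lambda_\bullet^+) \in \mathcal{P}(\mathcal{P}(\Sigma \times \Sigma))$ can be expressed
as a least fixpoint in $\langle \mathcal{P}(\mathcal{P}(\Sigma \times \Sigma_\bot)), \mathrel{\dot\sqsubseteq}, \mathbin{\dot\sqcup}, \mathbin{\dot\sqcap}, \{\Sigma \times \{\bot\}, \emptyset\}, \{\emptyset, \Sigma \times \Sigma\} \rangle$ as:

$$\Lambda_\rightsquigarrow^+ = \mathrm{lfp}^{\dot\sqsubseteq}_{\{\emptyset\}}\, \Theta_\rightsquigarrow^+ \qquad \Theta_\rightsquigarrow^+(S) \stackrel{\text{def}}{=} \{ \Omega_{o_1=v_1,\dots,o_k=v_k} \times \Omega_{o_1=v_1,\dots,o_k=v_k} \mid v_1,\dots,v_k \in V \} \mathbin{\dot\sqcup} \{ \tau \circ R \mid R \in S \} \tag{14}$$

Proof (Sketch). By Kleenian fixpoint transfer (cf. Theorem 17 in [12]).


Lemma 3. The abstraction $\Lambda_\rightsquigarrow^\omega \stackrel{\text{def}}{=} \alpha_\rightsquigarrow(\Lambda_\bullet^\omega) \in \mathcal{P}(\mathcal{P}(\Sigma \times \Sigma_\bot))$ can be expressed
as a least fixpoint in $\langle \mathcal{P}(\mathcal{P}(\Sigma \times \Sigma_\bot)), \mathrel{\dot\sqsubseteq}, \mathbin{\dot\sqcup}, \mathbin{\dot\sqcap}, \{\Sigma \times \{\bot\}, \emptyset\}, \{\emptyset, \Sigma \times \Sigma\} \rangle$ as:

$$\Lambda_\rightsquigarrow^\omega = \mathrm{lfp}^{\dot\sqsubseteq}_{\{\Sigma \times \{\bot\}\}}\, \Theta_\rightsquigarrow^\omega \qquad \Theta_\rightsquigarrow^\omega(S) \stackrel{\text{def}}{=} \{ \tau \circ R \mid R \in S \} \tag{15}$$

Proof (Sketch). By Tarskian fixpoint transfer (cf. Theorem 18 in [12]).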


The fixpoint iteration for $\Lambda_\rightsquigarrow^+$ starts from the set containing only the empty
relation. At the first iteration, the empty relation is replaced by all relations
between pairs of final states that agree on the values of the output variables.
At each next iteration, all relations are combined with the transition relation
to obtain relations between initial and final states of increasingly longer traces.
At the limit, we obtain the set of all relations between the initial and the final
states of a program that agree on the final value of the output variables. The
fixpoint iteration for $\Lambda_\rightsquigarrow^\omega$ starts from the set containing (the set of) all pairs of
states and the $\bot$ outcome, and each iteration discards more and more pairs with
initial states that do not belong to infinite traces of the program.

Now we can use Lemmas 2 and 3 to express the dependency semantics $\Lambda_\rightsquigarrow$
in a constructive fixpoint form (as the union of $\Lambda_\rightsquigarrow^+$ and $\Lambda_\rightsquigarrow^\omega$).


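Theorem 3. The dependency semantics $\Lambda_\rightsquigarrow \in \mathcal{P}(\mathcal{P}(\Sigma \times \Sigma_\bot))$ can be expressed
as a least fixpoint in $\langle \mathcal{P}(\mathcal{P}(\Sigma \times \Sigma_\bot)), \mathrel{\dot\sqsubseteq}, \mathbin{\dot\sqcup}, \mathbin{\dot\sqcap}, \{\Sigma \times \{\bot\}, \emptyset\}, \{\emptyset, \Sigma \times \Sigma\} \rangle$ as:

$$\Lambda_\rightsquigarrow = \Lambda_\rightsquigarrow^+ \cup \Lambda_\rightsquigarrow^\omega = \mathrm{lfp}^{\dot\sqsubseteq}\, \Theta_\rightsquigarrow \qquad \Theta_\rightsquigarrow(S) \stackrel{\text{def}}{=} \{ \Omega_{o_1=v_1,\dots,o_k=v_k} \times \Omega_{o_1=v_1,\dots,o_k=v_k} \mid v_1,\dots,v_k \in V \} \mathbin{\dot\sqcup} \{ \tau \circ R \mid R \in S \} \tag{16}$$


Proof (Sketch). The proof follows immediately from Lemmas 2 and 3.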


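Finally, we show that the dependency semantics $\Lambda_\rightsquigarrow$ is sound and complete
for proving that a program does not use (a subset of) its input variables.


Theorem 4. A program does not use a subset $J$ of its input variables if and
only if the image via $\gamma_\rightsquigarrow$ of its dependency semantics $\Lambda_\rightsquigarrow$ is a subset of $\mathcal{N}_J$:

$$P \models \mathcal{N}_J \iff \gamma_\rightsquigarrow(\Lambda_\rightsquigarrow) \subseteq \mathcal{N}_J$$


Proof (Sketch). The proof follows from the definition of $\Lambda_\rightsquigarrow$ (cf. Eq. 12) and $\gamma_\rightsquigarrow$
(cf. Eq. 11), and from Theorem 2.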


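Example 6. Let us consider again the program $P$ and its outcome semantics
$\alpha_\bullet([\![P]\!]_{\text{science}})$ from Example 4. The corresponding dependency semantics is:

$$\alpha_\rightsquigarrow(\alpha_\bullet([\![P]\!]_{\text{science}})) = \{ \emptyset,\ \{ \langle T\_, TF \rangle,\ \langle F\_, FF \rangle \},\ \{ \langle T\_, TT \rangle,\ \langle F\_, FT \rangle \} \}$$

and, by definition of $\gamma_\rightsquigarrow$, we have that its concretization $\gamma_\rightsquigarrow(\alpha_\rightsquigarrow(\alpha_\bullet([\![P]\!]_{\text{science}})))$
is an over-approximation of $\alpha_\bullet([\![P]\!]_{\text{science}})$. In particular, since intermediate state
computations are irrelevant for deciding the input data usage property, all sets
of traces in $\gamma_\rightsquigarrow(\alpha_\rightsquigarrow(\alpha_\bullet([\![P]\!]_{\text{science}})))$ are over-approximations of exactly one set
in $\alpha_\bullet([\![P]\!]_{\text{science}})$ with the same set of initial states and outcome. Thus, in this
case, we can observe that all sets of traces in $\gamma_\rightsquigarrow(\alpha_\rightsquigarrow(\alpha_\bullet([\![P]\!]_{\text{science}})))$ belong to
$\mathcal{N}_{\{\text{science}\}}$ and correctly conclude that $P$ does not use the variable science. ■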
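The dependency abstraction of Eq. 10 and the resulting usage check can be sketched on the same kind of encoding. Everything here is illustrative: traces are tuples of state strings, the intermediate states in the example are invented to show that they are discarded, only finite traces are handled, and the helper names `dependency_abstraction` and `unused_in_all` are hypothetical.

```python
def dependency_abstraction(partitions):
    """Eq. 10 restricted to finite traces: keep only the pair
    (initial state, final state) of every trace in every partition."""
    return {frozenset((t[0], t[-1]) for t in block) for block in partitions}


def unused_in_all(relations, values=("T", "F")):
    """The input (first character of the initial state) is unused iff every
    non-empty relation contains every possible initial value."""
    return all({init[0] for init, _ in r} == set(values) for r in relations if r)


# Outcome partitions for science (cf. Example 4), with made-up intermediate states.
outcome_science = {
    frozenset(),
    frozenset({("T_", "TT", "TF"), ("F_", "FT", "FF")}),
    frozenset({("T_", "TT", "TT"), ("F_", "FT", "FT")}),
}
dep_science = dependency_abstraction(outcome_science)

# Dependency abstraction for math, as given in Example 5.
dep_math = {
    frozenset(),
    frozenset({("F_", "FF")}),
    frozenset({("T_", "TT"), ("F_", "FT")}),
}
```

The check succeeds for `dep_science` and fails for `dep_math`, in line with Examples 5 and 6: dropping intermediate states loses nothing that matters for deciding input data usage.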


At this point we have a sound and complete program semantics that captures 
only the minimal information needed to decide which input variables are unused 
by a program. In the rest of the paper, we present various static analyses for 
input data usage by means of sound abstractions of this semantics, which under- 
approximate (resp. over-approximate) the subset of the input variables that are 
unused (resp. used) by a program. 


7 Input Data Usage Abstractions 


We introduce a simple sequential programming language with boolean variables, 
which we use for illustration throughout the rest of the paper: 


e ::= v | x | not e | e and e | e or e (expressions)

s ::= skip | x = e | if e: s else: s | while e: s | s s (statements)

where v ranges over boolean values, and x ranges over program variables. The 
skip statement, which does nothing, is a placeholder useful, for instance, for 
writing a conditional if statement without an else branch: if e: s else: skip. 
In the following, we often simply write if e: s instead of if e: s else: skip. 
Note that our work is not limited by the choice of a particular programming 
language, as the formal treatment in previous sections is language independent. 

In Sects. 8 and 9, we show that existing static analyses based on dependencies
[6,20] are abstractions of the dependency semantics $\Lambda_\rightsquigarrow$. We define each
abstraction $\Lambda^\natural$ over a partially ordered set $\langle \mathcal{A}, \sqsubseteq_{\mathcal{A}} \rangle$ called abstract domain. More
specifically, for each program statement $s$, we define a transfer function $\Theta^\natural[\![s]\!] : \mathcal{A} \to \mathcal{A}$,
and the abstraction $\Lambda^\natural$ is the composition of the transfer functions of all
statements in a program. We derive a more precise static analysis similar to
dependency analyses used for program slicing [37] in Sect. 10. Finally, Sect. 11
demonstrates the value of expressing such analyses as abstract domains by combining
them with an existing abstraction of compound data structures such as arrays 
and lists [16] to detect unused chunks of input data. 


8 Secure Information Flow Abstractions 


Secure information flow analysis [18] aims at proving that a program will not leak
sensitive information. Most analyses focus on proving non-interference [35] by
classifying program variables into different security levels [17], and ensuring the
absence of information flow from variables with higher security level to variables
with lower security level. The most basic classification comprises a low security
level L, and a high security level H: program variables classified as L are public
information, while variables classified as H are private information.

In our context, if we classify input variables as H and all other variables as L,
possibilistic non-interference [21] coincides with the input data usage property $\mathcal{N}$
(cf. Eq. 4) restricted to consider only terminating programs. However, in general,
(possibilistic) non-interference is too strong for our purposes as it requires that
none of the input variables is used by a program. We illustrate this using as
an example a non-interference analysis recently proposed by Assaf et al. [6]
that is conveniently formalized in the framework of abstract interpretation. We
briefly present here a version of the originally proposed analysis, simplified to
consider only the security levels L and H, and we point out the significance of
the definitions for input data usage.


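Let $\mathcal{L} \stackrel{\text{def}}{=} \{L, H\}$ be the set of security levels, and let the set $X$ of all program
variables be partitioned into a set $X_L$ of variables classified as L and a set $X_H$
of variables classified as H (i.e., the input variables). A dependency constraint
$L \rightsquigarrow x$ expresses that the current value of the variable $x$ depends only on the
initial values of variables having at most security level L (i.e., it does not depend
on the initial value of any of the input variables). The non-interference analysis
$\Lambda_F$ proposed by Assaf et al. is a forward analysis in the lattice $\langle \mathcal{P}(F), \sqsubseteq_F, \sqcup_F \rangle$
where $F \stackrel{\text{def}}{=} \{ L \rightsquigarrow x \mid x \in X \}$ is the set of all dependency constraints,
$S_1 \sqsubseteq_F S_2 \stackrel{\text{def}}{\iff} S_1 \supseteq S_2$, and $S_1 \sqcup_F S_2 \stackrel{\text{def}}{=} S_1 \cap S_2$. The transfer function
$\Theta_F[\![s]\!] : \mathcal{P}(F) \to \mathcal{P}(F)$ for each statement $s$ in our simple programming language is defined as
follows:

$$\begin{aligned}
\Theta_F[\![\texttt{skip}]\!](S) &\stackrel{\text{def}}{=} S \\
\Theta_F[\![x = e]\!](S) &\stackrel{\text{def}}{=} \{ L \rightsquigarrow y \in S \mid y \neq x \} \cup \{ L \rightsquigarrow x \mid V_F[\![e]\!]S \} \\
\Theta_F[\![\texttt{if } e\texttt{: } s_1 \texttt{ else: } s_2]\!](S) &\stackrel{\text{def}}{=} \begin{cases} \Theta_F[\![s_1]\!](S) \sqcup_F \Theta_F[\![s_2]\!](S) & \text{if } V_F[\![e]\!]S \\ \{ L \rightsquigarrow x \in S \mid x \notin w(s_1) \cup w(s_2) \} & \text{otherwise} \end{cases} \\
\Theta_F[\![\texttt{while } e\texttt{: } s]\!](S) &\stackrel{\text{def}}{=} \mathrm{lfp}^{\sqsubseteq_F}_{S}\, \Theta_F[\![\texttt{if } e\texttt{: } s \texttt{ else: skip}]\!] \\
\Theta_F[\![s_1\ s_2]\!](S) &\stackrel{\text{def}}{=} \Theta_F[\![s_2]\!] \circ \Theta_F[\![s_1]\!](S)
\end{aligned}$$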

where $w(s)$ denotes the set of variables modified by the statement $s$, and $V_F[\![e]\!]S$
determines whether a set of dependencies $S$ guarantees that the expression $e$ has
a unique value independently of the initial value of the input variables. For a
variable $x$, $V_F[\![x]\!]S$ is true if and only if $L \leadsto x \in S$. Otherwise, $V_F[\![e]\!]S$ is
defined recursively on the structure of $e$, and it is always true for a boolean
value $v$ [6]. An assignment x = e discards all dependency constraints related
to the assigned variable $x$, and adds the constraint $L \leadsto x$ if $e$ has a unique value
independently of the initial values of the input variables. This captures an explicit
flow of information between $e$ and $x$. A conditional statement if e: s1 else: s2
joins the dependency constraints obtained from s1 and s2, if $e$ does not depend
on the initial values of the input variables (i.e., $V_F[\![e]\!]S$ is true). Otherwise, it
discards all dependency constraints related to the variables modified in either of
its branches. This captures an implicit flow of information from $e$. The initial
set of dependencies contains a constraint $L \leadsto x$ for each variable $x$ that is not
an input variable. We exemplify the analysis below.


Example 7. Let us consider again the program P from Example 1 (stripped of
the input and print statements, which are not present in our simple language):

1 passing = True
2 if not english: english = False # english should be passing
3 if not math: passing = bonus
4 if not math: passing = bonus # math should be science

The analysis begins from the set of dependency constraints $\{L \leadsto passing\}$,
which classifies input variables as H and all other variables as L. The assignment
at line 1 leaves the set unchanged as the value of the expression True on the
right-hand side of the assignment does not depend on the initial value of the
input variables. The set remains unchanged by the conditional statement at line
2, even though the boolean condition depends on the input variable english,
because the variable passing is not modified. Finally, at lines 3 and 4, the
analysis captures an explicit flow of information from the input variable bonus and
an implicit flow of information from the input variable math. Thus, the set of
dependency constraints becomes empty at line 3, and remains empty at line 4.

Observe that, in this case, non-interference does not hold since the result of
the program depends on some of the input variables. Therefore, the analysis is
only able to conclude that at least one of the input variables may be used by
the program, but it cannot determine which input variables are unused.


The example shows that non-interference is too strong a property in general.
Of course, one could determine which input variables are unused by running
multiple instances of the non-interference analysis $\Lambda_F$, each one of them
classifying a single different input variable as H and all other variables as L. However,
this becomes cumbersome in a data science application where a program reads
and manipulates a large amount of input data.
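The analysis and the one-input-at-a-time workaround just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of Assaf et al.: statements are encoded as tuples (a hypothetical encoding), an expression is represented solely by the set of variables it mentions, so $V_F[\![e]\!]S$ is approximated by "every variable of $e$ has a constraint in $S$", and the loop fixpoint is computed by plain iteration.

```python
# Statements are tuples; an expression is represented only by the set of
# variables it mentions (a simplification of V_F, see the lead-in).
SKIP = ("skip",)

def written(s):
    """Variables modified by statement s, i.e. w(s)."""
    kind = s[0]
    if kind == "assign":
        return {s[1]}
    if kind == "if":
        return written(s[2]) | written(s[3])
    if kind == "while":
        return written(s[2])
    if kind == "seq":
        return set().union(*(written(t) for t in s[1:]))
    return set()

def transfer(s, low):
    """Forward transfer; `low` holds every x with a constraint L ~> x."""
    kind = s[0]
    if kind == "skip":
        return low
    if kind == "assign":
        _, x, evars = s
        return (low - {x}) | ({x} if evars <= low else set())
    if kind == "if":
        _, cvars, s1, s2 = s
        if cvars <= low:                                  # guard independent of inputs
            return transfer(s1, low) & transfer(s2, low)  # join is intersection
        return low - (written(s1) | written(s2))          # implicit flow
    if kind == "while":
        _, cvars, body = s
        while True:                                       # iterate to a fixpoint from `low`
            nxt = low & transfer(("if", cvars, body, SKIP), low)
            if nxt == low:
                return low
            low = nxt
    if kind == "seq":
        for t in s[1:]:
            low = transfer(t, low)
        return low
    raise ValueError(kind)

VARIABLES = {"passing", "english", "math", "science", "bonus"}
INPUTS = VARIABLES - {"passing"}
P = ("seq",
     ("assign", "passing", set()),
     ("if", {"english"}, ("assign", "english", set()), SKIP),
     ("if", {"math"}, ("assign", "passing", {"bonus"}), SKIP),
     ("if", {"math"}, ("assign", "passing", {"bonus"}), SKIP))

# One run with every input classified as H: nothing can be concluded.
print(transfer(P, VARIABLES - INPUTS))   # set()

# One run per input, classifying only that input as H.
unused = {i for i in INPUTS if "passing" in transfer(P, VARIABLES - {i})}
print(sorted(unused))                    # ['english', 'science']
```

The second loop makes the cost of the workaround visible: it needs one full analysis run per input variable.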

Moreover, we emphasize that our input data usage property is more general 
than (possibilistic) non-interference since it also considers non-termination. We 
are not aware of any work on termination-sensitive possibilistic non-interference. 


Example 8. Let us modify the program P shown in Example 7 as follows:

1 passing = True
2 while not english: english = False

In this case, since the loop at line 2 does not modify the output variable passing,
the non-interference analysis $\Lambda_F$ will leave the initial set of dependency
constraints $\{L \leadsto passing\}$ unchanged, meaning that the result of the program does
not depend on any of its input variables. However, the input variable english
is used since its value influences the outcome of the program: the program
terminates if english is true, and does not terminate otherwise.

The example demonstrates that the analysis is unsound for a non-terminating
program.² We show that the non-interference analysis $\Lambda_F$ is sound for proving
that a program does not use any of its input variables, only if the program is
terminating. We define the concretization function $\gamma_F: \mathcal{P}(F) \to \mathcal{P}(\mathcal{P}(\Sigma \times \Sigma_\bot))$:

$$\gamma_F(S) \stackrel{\text{def}}{=} \{R \in \mathcal{P}(\Sigma \times \Sigma_\bot) \mid \alpha_F(R) \sqsubseteq_F S\} \qquad (17)$$

The abstraction function $\alpha_F: \mathcal{P}(\mathcal{P}(\Sigma \times \Sigma_\bot)) \to \mathcal{P}(F)$ maps each relation $R$
between states of a program to the corresponding set of dependency constraints:
$\alpha_F(R) \stackrel{\text{def}}{=} \{L \leadsto x \mid x \in X_L \wedge \forall i \in X_H: \mathrm{UNUSED}_{i,x}(R)\}$, where $\mathrm{UNUSED}_{i,x}$ is the
relational abstraction of $\mathrm{UNUSED}_i$ (cf. Eq. 3) in which we compare only the result
stored in the variable $x$ (i.e., we compare $\sigma_\omega(x)$ and $\sigma'_\omega(x)$, instead of $\sigma_\omega$
and $\sigma'_\omega$ as in Eq. 3).


Theorem 5. A terminating program does not use any of its input variables if
the image via $\gamma_{\leadsto} \circ \gamma_F$ of its non-interference abstraction $\Lambda_F$ is a subset of $\mathcal{N}$:

$$\gamma_{\leadsto}(\gamma_F(\Lambda_F)) \subseteq \mathcal{N} \Rightarrow P \models \mathcal{N}$$

Proof. Let us assume that $\gamma_{\leadsto}(\gamma_F(\Lambda_F)) \subseteq \mathcal{N}$. By definition of $\gamma_F$ (cf. Eq. 17),
since the program is terminating, we have that $\Lambda_{\leadsto} \subseteq \gamma_F(\Lambda_F)$ and, by
monotonicity of the concretization function $\gamma_{\leadsto}$ (cf. Eq. 11), we have that $\gamma_{\leadsto}(\Lambda_{\leadsto}) \subseteq
\gamma_{\leadsto}(\gamma_F(\Lambda_F))$. Thus, since $\gamma_{\leadsto}(\gamma_F(\Lambda_F)) \subseteq \mathcal{N}$, we have that $\gamma_{\leadsto}(\Lambda_{\leadsto}) \subseteq \mathcal{N}$. The
conclusion follows from Theorem 4.

Note that the termination of the program is necessary for the proof of
Theorem 5. Indeed, for a non-terminating program, we have that $\Lambda_{\leadsto} \not\subseteq \gamma_F(\Lambda_F)$
(since $\Lambda_{\leadsto}$ includes relational abstractions of infinite traces that are missing
from $\gamma_F(\Lambda_F)$) and thus we cannot conclude the proof.

This result shows that the non-interference analysis $\Lambda_F$ is an abstraction of
the dependency semantics $\Lambda_{\leadsto}$ presented earlier. However, we remark that the
same result applies to all other instances in this important class of analysis [5, 25,
etc.], which are therefore subsumed by our framework.


9 Strongly Live Variable Abstraction 


Strongly live variable analysis [20] is a variant of the classic live variable analysis
[32] performed by compilers to determine, for each program point, which
variables may be potentially used before they are assigned to. A variable is strongly
live if it is used in an assignment to another strongly live variable, or if it is used in
a statement other than an assignment. Otherwise, a variable is considered faint.

² The case of a program using an input variable and then always diverging is not
problematic because the analysis would be imprecise but still sound.

Strongly live variable analysis $\Lambda_X$ is a backward analysis in the complete
lattice $\langle \mathcal{P}(X), \subseteq, \cup, \cap, \emptyset, X \rangle$, where $X$ is the set of all program variables. The
transfer function $\Theta_X[\![s]\!]: \mathcal{P}(X) \to \mathcal{P}(X)$ for each statement $s$ is defined as:

$$
\begin{aligned}
\Theta_X[\![\texttt{skip}]\!](S) &\stackrel{\text{def}}{=} S \\
\Theta_X[\![x = e]\!](S) &\stackrel{\text{def}}{=}
\begin{cases}
(S \setminus \{x\}) \cup \mathrm{VARS}(e) & x \in S \\
S & \text{otherwise}
\end{cases} \\
\Theta_X[\![\texttt{if } b\texttt{: } s_1 \texttt{ else: } s_2]\!](S) &\stackrel{\text{def}}{=} \mathrm{VARS}(b) \cup \Theta_X[\![s_1]\!](S) \cup \Theta_X[\![s_2]\!](S) \\
\Theta_X[\![\texttt{while } b\texttt{: } s]\!](S) &\stackrel{\text{def}}{=} \mathrm{VARS}(b) \cup \Theta_X[\![s]\!](S) \\
\Theta_X[\![s_1\ s_2]\!](S) &\stackrel{\text{def}}{=} \Theta_X[\![s_1]\!] \circ \Theta_X[\![s_2]\!](S)
\end{aligned}
$$

where $\mathrm{VARS}(e)$ is the set of variables in the expression $e$. For input data usage, the
initial set of strongly live variables contains the output variables of the program.


Example 9. Let us consider again the program P shown in Example 7. The
strongly live variable analysis begins from the set {passing} containing the
output variable passing. At line 3, the set of strongly live variables is {math, bonus}
since bonus is used in an assignment to the strongly live variable passing, and
math is used in the condition of the if statement. Finally, at line 1, the set of
strongly live variables is {english, math, bonus} because english is used in the
condition of the if statement at line 2. Thus, strongly live variable analysis is
able to conclude that the input variable science is unused. However, it is not
precise enough to determine that the variable english is also unused.
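The backward transfer function $\Theta_X$ can be replayed on this program with a short Python sketch. The tuple encoding of statements is a hypothetical choice made for illustration, an expression is again represented by the set of variables it mentions, and a conditional without an else branch is encoded with an explicit skip.

```python
SKIP = ("skip",)

def live(s, S):
    """Backward transfer of strongly live variable analysis on the set S."""
    kind = s[0]
    if kind == "skip":
        return S
    if kind == "assign":
        _, x, evars = s
        # Only an assignment to a strongly live variable makes vars(e) live.
        return (S - {x}) | evars if x in S else S
    if kind == "if":
        _, bvars, s1, s2 = s
        return bvars | live(s1, S) | live(s2, S)
    if kind == "while":
        _, bvars, body = s
        return bvars | live(body, S)
    if kind == "seq":
        for t in reversed(s[1:]):        # backward: last statement first
            S = live(t, S)
        return S
    raise ValueError(kind)

INPUTS = {"english", "math", "science", "bonus"}
P = ("seq",
     ("assign", "passing", set()),
     ("if", {"english"}, ("assign", "english", set()), SKIP),
     ("if", {"math"}, ("assign", "passing", {"bonus"}), SKIP),
     ("if", {"math"}, ("assign", "passing", {"bonus"}), SKIP))

entry = live(P, {"passing"})
print(sorted(entry))             # ['bonus', 'english', 'math']
print(sorted(INPUTS - entry))    # ['science']
```

Only science is reported as faint: english stays live because the guard of every conditional is added unconditionally, which is exactly the imprecision the example points out.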


The imprecision of the analysis derives from the fact that it does not capture
implicit flows of information precisely (cf. Sect. 8) but only over-approximates
their presence. Thus, the analysis is unable to detect when a conditional
statement, for instance, modifies only variables that have no impact on the outcome
of a program; a situation likely to arise due to a programming error, as shown in
the previous example. However, by virtue of this imprecise treatment of implicit
flows, we can show that strongly live variable analysis is sound for input data
usage, even for non-terminating programs.

We define the concretization function $\gamma_X: \mathcal{P}(X) \to \mathcal{P}(\mathcal{P}(\Sigma \times \Sigma_\bot))$ as:

$$\gamma_X(S) \stackrel{\text{def}}{=} \{R \in \mathcal{P}(\Sigma \times \Sigma_\bot) \mid \forall i \in X \setminus S: \mathrm{UNUSED}_i(R)\} \qquad (18)$$

where we abuse notation and use $\mathrm{UNUSED}_i$ (cf. Eq. 3) to also denote its
dependency abstraction (cf. Eq. 10). We now show that strongly live variable analysis
is sound for proving that a program does not use the faint variables.

Theorem 6. A program does not use a subset $J$ of its input variables if the
image via $\gamma_{\leadsto} \circ \gamma_X$ of its strongly live variable abstraction $\Lambda_X$ is a subset of $\mathcal{N}_J$:

$$\gamma_{\leadsto}(\gamma_X(\Lambda_X)) \subseteq \mathcal{N}_J \Rightarrow P \models \mathcal{N}_J$$

Proof. Let us assume that $\gamma_{\leadsto}(\gamma_X(\Lambda_X)) \subseteq \mathcal{N}_J$. By definition of $\gamma_X$ (cf. Eq. 18),
we have that $\Lambda_{\leadsto} \subseteq \gamma_X(\Lambda_X)$ and, by monotonicity of $\gamma_{\leadsto}$ (cf. Eq. 11), we have
that $\gamma_{\leadsto}(\Lambda_{\leadsto}) \subseteq \gamma_{\leadsto}(\gamma_X(\Lambda_X))$. Thus, since $\gamma_{\leadsto}(\gamma_X(\Lambda_X)) \subseteq \mathcal{N}_J$, we have that
$\gamma_{\leadsto}(\Lambda_{\leadsto}) \subseteq \mathcal{N}_J$. The conclusion follows from Theorem 4.

This result shows that strongly live variable analysis is also subsumed by our
framework, as it is an abstraction of the dependency semantics $\Lambda_{\leadsto}$.


10 Syntactic Dependency Abstractions 


In the following, we derive a more precise data usage analysis based on syntactic 
dependencies between program variables. For simplicity, the analysis does not 
take program termination into account, but we discuss possible solutions at the 
end of the section. Due to space limitations, we only provide a terse description 
of the abstraction and refer to [36] for further details. 


Fig. 5. Hasse diagram for the complete lattice $\langle \mathrm{USAGE}, \sqsubseteq_{\mathrm{USAGE}}, \sqcup_{\mathrm{USAGE}}, \sqcap_{\mathrm{USAGE}}, N, U \rangle$.


In order to capture implicit dependencies from variables appearing in boolean 
conditions of conditional and while statements, we track when the value of a 
variable is used or modified in a statement based on the level of nesting of the 
statement in other statements. More formally, each program variable maps to a 
value in the complete lattice shown in Fig. 5: the values U (used) and N (not- 
used) respectively denote that a variable may be used and is not used at the 
current nesting level; the values B (below) and W (overwritten) denote that 
a variable may be used at a lower nesting level, and the value W additionally 
indicates that the variable is modified at the current nesting level. 

A variable is used (i.e., maps to U) if it is used in an assignment to another
variable that is used in the current or a lower nesting level (i.e., a variable that
maps to U or B). We define the operator $\mathrm{ASSIGN}[\![x = e]\!]$ to compute the effect
of an assignment on a map $m: X \to \mathrm{USAGE}$, where $X$ is the set of all variables:

$$
\mathrm{ASSIGN}[\![x = e]\!](m) \stackrel{\text{def}}{=} \lambda y.
\begin{cases}
W & y = x \wedge y \notin \mathrm{VARS}(e) \wedge m(x) \in \{U, B\} \\
U & y \in \mathrm{VARS}(e) \wedge m(x) \in \{U, B\} \\
m(y) & \text{otherwise}
\end{cases}
\qquad (19)
$$

The assigned variable is overwritten (i.e., maps to W), unless it is used in $e$.

Another reason for a variable to be used is if it appears in the boolean
condition $e$ of a statement that uses another variable or modifies another used
variable (i.e., there exists a variable $x$ that maps to U or W):

$$
\mathrm{FILTER}[\![e]\!](m) \stackrel{\text{def}}{=} \lambda y.
\begin{cases}
U & y \in \mathrm{VARS}(e) \wedge \exists x \in X: m(x) \in \{U, W\} \\
m(y) & \text{otherwise}
\end{cases}
\qquad (20)
$$

We maintain a stack of these maps that grows or shrinks based on the level
of nesting of the currently analyzed statement. More formally, a stack is a tuple
$\langle m_0, m_1, \dots, m_k \rangle$ of mutable length $k$, where each element $m_0, m_1, \dots, m_k$ is a
map from $X$ to USAGE. In the following, we use $Q$ to denote the set of all stacks,
and we abuse notation by writing $\mathrm{ASSIGN}[\![x = e]\!]$ and $\mathrm{FILTER}[\![e]\!]$ to also denote
the corresponding operators on stacks:

$$
\begin{aligned}
\mathrm{ASSIGN}[\![x = e]\!](\langle m_0, m_1, \dots, m_k \rangle) &\stackrel{\text{def}}{=} \langle \mathrm{ASSIGN}[\![x = e]\!](m_0), m_1, \dots, m_k \rangle \\
\mathrm{FILTER}[\![e]\!](\langle m_0, m_1, \dots, m_k \rangle) &\stackrel{\text{def}}{=} \langle \mathrm{FILTER}[\![e]\!](m_0), m_1, \dots, m_k \rangle
\end{aligned}
$$

The operator PUSH duplicates the map at the top of the stack and modifies
the copy using the operator INC, to account for an increased nesting level:

$$
\begin{aligned}
\mathrm{PUSH}(\langle m_0, m_1, \dots, m_k \rangle) &\stackrel{\text{def}}{=} \langle \mathrm{INC}(m_0), m_0, m_1, \dots, m_k \rangle \\
\mathrm{INC}(m) &\stackrel{\text{def}}{=} \lambda y.
\begin{cases}
B & m(y) \in \{U\} \\
N & m(y) \in \{W\} \\
m(y) & \text{otherwise}
\end{cases}
\end{aligned}
\qquad (21)
$$

A used variable (i.e., mapping to U) becomes used below (i.e., now maps to B),
and a modified variable (i.e., mapping to W) becomes unused (i.e., now maps
to N). The dual operator POP combines the two maps at the top of the stack:

$$
\begin{aligned}
\mathrm{POP}(\langle m_0, m_1, \dots, m_k \rangle) &\stackrel{\text{def}}{=} \langle \mathrm{DEC}(m_0, m_1), \dots, m_k \rangle \\
\mathrm{DEC}(m, k) &\stackrel{\text{def}}{=} \lambda y.
\begin{cases}
k(y) & m(y) \in \{B, N\} \\
m(y) & \text{otherwise}
\end{cases}
\end{aligned}
\qquad (22)
$$

where the DEC operator restores the value a variable $y$ mapped to before
increasing the nesting level (i.e., $k(y)$) if it has not changed since (i.e., if the variable
still maps to B or N), and otherwise retains the new value $y$ maps to.

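We can now define the data usage analysis $\Lambda_Q$, which is a backward analysis
on the lattice $\langle Q, \sqsubseteq_Q, \sqcup_Q \rangle$. The partial order $\sqsubseteq_Q$ and the least upper bound
$\sqcup_Q$ are the pointwise lifting, for each element of the stack, of the partial order
and least upper bound between maps from $X$ to USAGE (which in turn are the
pointwise lifting of the partial order $\sqsubseteq_{\mathrm{USAGE}}$ and least upper bound $\sqcup_{\mathrm{USAGE}}$ of the
USAGE lattice, cf. Fig. 5). We define the transfer function $\Theta_Q[\![s]\!]: Q \to Q$ for
each statement $s$ in our simple programming language as follows:

$$
\begin{aligned}
\Theta_Q[\![\texttt{skip}]\!](q) &\stackrel{\text{def}}{=} q \\
\Theta_Q[\![x = e]\!](q) &\stackrel{\text{def}}{=} \mathrm{ASSIGN}[\![x = e]\!](q) \\
\Theta_Q[\![\texttt{if } b\texttt{: } s_1 \texttt{ else: } s_2]\!](q) &\stackrel{\text{def}}{=} \mathrm{POP} \circ \mathrm{FILTER}[\![b]\!] \circ \Theta_Q[\![s_1]\!] \circ \mathrm{PUSH}(q) \\
&\quad\ \sqcup_Q\ \mathrm{POP} \circ \mathrm{FILTER}[\![b]\!] \circ \Theta_Q[\![s_2]\!] \circ \mathrm{PUSH}(q) \\
\Theta_Q[\![\texttt{while } b\texttt{: } s]\!](q) &\stackrel{\text{def}}{=} \mathrm{lfp}^{\sqsubseteq_Q}_{q}\, \Theta_Q[\![\texttt{if } b\texttt{: } s \texttt{ else: skip}]\!] \\
\Theta_Q[\![s_1\ s_2]\!](q) &\stackrel{\text{def}}{=} \Theta_Q[\![s_1]\!] \circ \Theta_Q[\![s_2]\!](q)
\end{aligned}
$$

    math, bonus ↦ U, passing ↦ W   ⊔_Q   passing ↦ U   =   math, bonus, passing ↦ U
    if not math:
        bonus ↦ U, passing ↦ W | passing ↦ U
        passing = bonus
        passing ↦ B | passing ↦ U
    passing ↦ U

Fig. 6. Data usage analysis of the last statement of the program shown in Example 7.
Stack elements are separated by | and, for brevity, variables mapping to N are omitted.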


The initial stack contains a single map, in which the output variables map to 
the value U, and all other variables map to N. We exemplify the analysis below. 


Example 10. Let us consider again the program P shown in Example 7. The
initial stack contains a single map m, in which the output variable passing
maps to U and all other variables map to N.

At line 4, before analyzing the body of the conditional statement, a modified
copy of m is pushed onto the stack: this copy maps passing to B, meaning that
passing is only used in a lower nesting level, and all other variables still map to
N (cf. Eq. 21). As a result of the assignment (cf. Eq. 19), passing is overwritten
(i.e., maps to W), and bonus is used (i.e., maps to U). Since the body of the
conditional statement modifies a used variable and uses another variable, the
analysis of its boolean condition makes math used as well (cf. Eq. 20). Finally,
the maps at the top of the stack are merged and the result maps math, bonus,
and passing to U, and all other variables to N (cf. Eq. 22). The analysis is
visualized in Fig. 6.

The stack remains unchanged at line 3 and line 2, since the statement at line
3 is identical to line 4 and the body of the conditional statement at line 2 does
not modify any used variable and does not use any other variable. Finally, at
line 1 the variable passing is modified (i.e., it now maps to W), while math and
bonus remain used (i.e., they map to U). Thus, the analysis is precise enough
to conclude that the input variables english and science are unused.
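The walk-through above can be reproduced with a compact Python sketch of the operators of Eqs. 19 to 22 and of the transfer function. It is a sketch under stated assumptions, not the Lyra implementation: statements use a hypothetical tuple encoding, an expression is represented by the set of variables it mentions, the USAGE lattice is assumed to be the diamond with N at the bottom, U at the top, and B and W incomparable, and the loop fixpoint is computed by plain iteration.

```python
SKIP = ("skip",)

def join_value(a, b):
    """Least upper bound in USAGE, assuming N below B and W, and U on top."""
    if a == b or b == "N":
        return a
    return b if a == "N" else "U"

def assign(x, evars, m):                      # Eq. 19
    if m[x] not in ("U", "B"):
        return dict(m)
    return {y: "W" if y == x and y not in evars else "U" if y in evars else v
            for y, v in m.items()}

def filter_(bvars, m):                        # Eq. 20
    if not any(v in ("U", "W") for v in m.values()):
        return dict(m)
    return {y: "U" if y in bvars else v for y, v in m.items()}

def inc(m):                                   # Eq. 21
    return {y: {"U": "B", "W": "N"}.get(v, v) for y, v in m.items()}

def dec(m, k):                                # Eq. 22
    return {y: k[y] if v in ("B", "N") else v for y, v in m.items()}

def push(q):
    return [inc(q[0])] + q

def pop(q):
    return [dec(q[0], q[1])] + q[2:]

def join(q1, q2):
    return [{y: join_value(a[y], b[y]) for y in a} for a, b in zip(q1, q2)]

def transfer(s, q):
    """Backward transfer on a stack q (a list of maps, top of the stack first)."""
    kind = s[0]
    if kind == "skip":
        return q
    if kind == "assign":
        return [assign(s[1], s[2], q[0])] + q[1:]
    if kind == "if":
        _, bvars, s1, s2 = s
        def branch(t):
            r = transfer(t, push(q))
            return pop([filter_(bvars, r[0])] + r[1:])
        return join(branch(s1), branch(s2))
    if kind == "while":
        _, bvars, body = s
        while True:                           # iterate to a fixpoint from q
            nxt = join(q, transfer(("if", bvars, body, SKIP), q))
            if nxt == q:
                return q
            q = nxt
    if kind == "seq":
        for t in reversed(s[1:]):             # backward: last statement first
            q = transfer(t, q)
        return q
    raise ValueError(kind)

INPUTS = ["english", "math", "science", "bonus"]
P = ("seq",
     ("assign", "passing", set()),
     ("if", {"english"}, ("assign", "english", set()), SKIP),
     ("if", {"math"}, ("assign", "passing", {"bonus"}), SKIP),
     ("if", {"math"}, ("assign", "passing", {"bonus"}), SKIP))

initial = [{"passing": "U", **{i: "N" for i in INPUTS}}]
result = transfer(P, initial)[0]
print(result["passing"], result["math"], result["bonus"])   # W U U
print(sorted(i for i in INPUTS if result[i] == "N"))        # ['english', 'science']
```

Unlike the strongly live variable sketch, the guard of line 2 is not marked as used, because FILTER only fires when the guarded body uses a variable or overwrites a used one.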


Note that, similarly to the non-interference analysis presented in Sect. 8, the
data usage analysis $\Lambda_Q$ does not consider non-termination. Indeed, for the
program shown in Example 8, the analysis does not capture that the input variable
english is used, even though the termination of the program depends on its
value. We define the concretization function $\gamma_Q: Q \to \mathcal{P}(\mathcal{P}(\Sigma \times \Sigma_\bot))$ as:

$$\gamma_Q(\langle m_0, \dots, m_k \rangle) \stackrel{\text{def}}{=} \{R \in \mathcal{P}(\Sigma \times \Sigma_\bot) \mid \forall i \in X: m_0(i) \in \{N\} \Rightarrow \mathrm{UNUSED}_i(R)\} \qquad (23)$$

where again we write $\mathrm{UNUSED}_i$ (cf. Eq. 3) to also denote its dependency
abstraction. We now show that $\Lambda_Q$ is sound for proving that a program does not use a
subset of its input variables, if the program is terminating.

Theorem 7. A terminating program does not use a subset $J$ of its input
variables if the image via $\gamma_{\leadsto} \circ \gamma_Q$ of its abstraction $\Lambda_Q$ is a subset of $\mathcal{N}_J$:

$$\gamma_{\leadsto}(\gamma_Q(\Lambda_Q)) \subseteq \mathcal{N}_J \Rightarrow P \models \mathcal{N}_J$$

Proof. Let us assume that $\gamma_{\leadsto}(\gamma_Q(\Lambda_Q)) \subseteq \mathcal{N}_J$. Since the program is terminating,
we have that $\Lambda_{\leadsto} \subseteq \gamma_Q(\Lambda_Q)$, by definition of the concretization function $\gamma_Q$ (cf.
Eq. 23). Then, by monotonicity of $\gamma_{\leadsto}$ (cf. Eq. 11), we have that $\gamma_{\leadsto}(\Lambda_{\leadsto}) \subseteq
\gamma_{\leadsto}(\gamma_Q(\Lambda_Q))$. Thus, since $\gamma_{\leadsto}(\gamma_Q(\Lambda_Q)) \subseteq \mathcal{N}_J$, we have that $\gamma_{\leadsto}(\Lambda_{\leadsto}) \subseteq \mathcal{N}_J$. The
conclusion follows from Theorem 4.


In order to take termination into account, one could map each variable 
appearing in the guard of a loop to the value U. Alternatively, one could run 
a termination analysis [3,33,34], along with the data usage analysis, and only 
map to U variables appearing in the loop guard of a possibly non-terminating 
loop. 


11 Piecewise Abstractions 


The static analyses presented so far can be used only to detect unused data 
stored in program variables. However, realistic data science applications read 
and manipulate data organized in data structures such as arrays, lists, and dic- 
tionaries. In the following, we demonstrate that having expressed the analyses 
as abstract domains allows us to easily lift the analyses to such a scenario. In 
particular, to detect unused chunks of the input data, we combine the more pre- 
cise data usage analysis presented in the previous section with the array content 
abstraction proposed by Cousot et al. [16]. Due to space limitations, we provide 
only an informal description of the resulting abstract domain and refer to [36] 
for further details and examples. The analyses presented in earlier sections can 
be similarly combined with the array abstraction for the same purpose. 

We extend our small programming language introduced in Sect. 7 with integer
variables, arithmetic and boolean comparison expressions, and arrays:

$$
\begin{aligned}
e &::= \cdots \mid a[e] \mid \texttt{len}(a) \mid e \oplus e \mid e \bowtie e && \text{(expressions)} \\
s &::= \cdots \mid a[e] = e && \text{(statements)}
\end{aligned}
$$

where $\oplus$ and $\bowtie$ respectively range over arithmetic and boolean comparison
operators, $a$ ranges over array variables, and len(a) denotes the length of $a$.

Piecewise Array Abstraction. The array abstraction [16] divides an array into
consecutive segments, each segment being a uniform abstraction of the array
content in that segment. The bounds of the segments are specified by sets of
side-effect free expressions restricted to a canonical normal form, all having the
same (concrete) value. The abstraction is parametric in the choice of the abstract
domains used to manipulate sets of expressions and to represent the array
content within each segment. For our analysis, we use the octagon abstract domain
[31] for the expressions, and the USAGE lattice presented in the previous section
(cf. Fig. 5) for the segments. Thus, an array a is abstracted, for instance, as
{0, i} N {j+1}? U {len(a)}, where the symbol ? indicates that the segment
{0, i} N {j+1} might be empty. The abstraction indicates that all array
elements (if any) from index i (which is equal to zero) to index j (the bound j + 1
is exclusive) are unused, and all elements from j + 1 to len(a) − 1 may be used.
Let $A$ be the set of all such array abstractions. The initial segmentation of an
array $a \in A$ is a single segment with unused content (i.e., {0} N {len(a)}?).

For our analysis, we augment the array abstraction with new backward
assignment and filter operators. The operators $\mathrm{ASSIGN}_A[\![a[i] = e]\!]$ and
$\mathrm{FILTER}_A[\![e]\!]$ split and fill segments to take into account assignments and accesses
to array elements that influence the program outcome. For instance, an
assignment to a[i] with an expression containing a used variable modifies the
segmentation {0} N {len(a)}? into {0} N {i}? U {i+1} N {len(a)}?, which indicates
that the array element at index i is used by the program. An access a[i] in a
boolean condition guarding a statement that uses or modifies another used
variable is handled analogously. Instead, the operator $\mathrm{ASSIGN}_A[\![x = e]\!]$ modifies the
segmentation of an array by replacing each occurrence of the assigned variable
x with the canonical normal form of the expression e. For instance, an
assignment i = i + 1 modifies the segmentation {0} N {i}? U {i+1} N {len(a)}? into
{0} N {i+1}? U {i+2} N {len(a)}?. If e cannot be precisely put into a
canonical normal form, the operator replaces the assigned variable with an
approximation of e as an integer interval [13] computed using the underlying numerical
domain, and possibly merges segments together as a result of the approximation.
For instance, a non-linear assignment i = i * j approximated as i = [0, 1] modifies
the segmentation {0} N {i}? U {i+1} N {len(a)}? into {0} U {2} N {len(a)}?,
which loses the information that the initial segment of the array is unused.
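The two precise cases above (marking one element as used, and substituting a linear assignment into the bounds) can be illustrated with a small Python sketch. It is deliberately partial and not the abstraction of [16]: the class and function names are hypothetical, bounds are restricted to the form symbol + offset, splitting is only implemented for the initial single-segment case, and neither the numerical domain nor the interval fallback is modelled.

```python
from dataclasses import dataclass

def show(expr):
    """Render a bound in the normal form `symbol + offset` (symbol may be None)."""
    symbol, offset = expr
    if symbol is None:
        return str(offset)
    return symbol if offset == 0 else f"{symbol}+{offset}"

@dataclass
class Segmentation:
    bounds: list        # n + 1 sets of (symbol, offset) pairs
    values: list        # n USAGE values, one per segment
    maybe_empty: list   # n flags: True where the text writes '?'

    def __str__(self):
        parts = []
        for k, bound in enumerate(self.bounds):
            text = "{" + ",".join(sorted(show(e) for e in bound)) + "}"
            if k > 0 and self.maybe_empty[k - 1]:
                text += "?"
            parts.append(text)
            if k < len(self.values):
                parts.append(self.values[k])
        return " ".join(parts)

def initial(array):
    """Single unused, possibly empty segment: {0} N {len(array)}?"""
    return Segmentation([{(None, 0)}, {(f"len({array})", 0)}], ["N"], [True])

def use_element(seg, index):
    """Mark array[index] as used (only the single-segment case is handled)."""
    if seg.values != ["N"]:
        raise NotImplementedError("general splitting needs the numerical domain")
    lo, hi = seg.bounds
    return Segmentation([lo, {(index, 0)}, {(index, 1)}, hi],
                        ["N", "U", "N"], [True, False, True])

def shift(seg, var, delta):
    """Backward effect of `var = var + delta`: replace var by var + delta in bounds."""
    bounds = [{(s, o + delta) if s == var else (s, o) for s, o in b}
              for b in seg.bounds]
    return Segmentation(bounds, list(seg.values), list(seg.maybe_empty))

s0 = initial("a")
s1 = use_element(s0, "i")
s2 = shift(s1, "i", 1)
print(s0)   # {0} N {len(a)}?
print(s1)   # {0} N {i}? U {i+1} N {len(a)}?
print(s2)   # {0} N {i+1}? U {i+2} N {len(a)}?
```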

When merging control flows, segmentations are compared or joined by means
of a unification algorithm [16], which finds the coarsest common refinement
of both segmentations. Then, the comparison $\sqsubseteq_A$ or the join $\sqcup_A$ is performed
pointwise for each segment using the corresponding operators of the underlying
abstract domain chosen to abstract the array content. For our analysis, we adapt
and refine the originally proposed unification algorithm to take into account the
knowledge of the numerical domain chosen to abstract the segment bounds. We
refer to [36] for further details. A widening $\nabla$ limits the number of segments to
enforce termination of the analysis.


Piecewise Data Usage Analysis. We can now map each scalar variable to an
element of the USAGE lattice and each array variable to an array segmentation
in $A$, and use the data usage analysis $\Lambda_Q$ presented in the previous section to
identify unused input data stored in variables and portions of arrays.

1 failed = 0
2 i = 1
3 while i < len(grades):
4     if grades[i] < 4: failed = failed + 1
5     i = i + 1
6 passing = 2 * failed < len(grades)

Fig. 7. Another program to check if a student has passed a number of exams based on
their grades stored in the array grades. The programmer has made a mistake at line
2 that causes the program to ignore the grade stored at index 0 in grades.

Fig. 8. Data usage analysis of the loop statement of the program shown in Example 11.
Stack elements are separated by | and, for brevity, only array variables are shown.


Example 11. Let us consider the program shown in Fig. 7 where the array
variable grades and the variable passing are the input and output variables,
respectively. The initial stack contains a single map in which passing maps to U, all
other scalar variables map to N, and grades maps to {0} N {len(grades)}?,
indicating that all elements of the array (if any) are unused.

At line 6, the assignment modifies the variable passing (i.e., passing now
maps to W) and uses the variable failed (i.e., failed now maps to U), while
every other variable remains unchanged.

The result of the analysis of the loop statement at line 3 is shown
in Fig. 8. The analysis of the loop begins by pushing (cf. Eq. 21) a map
onto the stack in which passing becomes unused (i.e., maps to N) and
failed is used only in a lower nesting level (i.e., maps to B), and every
other variable still remains unchanged. At the first iteration of the
analysis of the loop body, the assignment at line 4 uses failed and thus
the access grades[i] at line 4 creates a used segment in the segmentation
for grades, which becomes {0} N {i}? U {i+1} N {len(grades)}?. At the
second iteration, the PUSH operator turns the used segment {i} U {i+1}
into {i} B {i+1}, and the assignment to i modifies the segment into
{i+1} B {i+2} (while the segmentation in the second stack element becomes
{0} N {i+1}? U {i+2} N {len(grades)}?). Then, the access to the array at
line 4 creates again a used segment {i} U {i+1} (in the first segmentation)
and the analysis continues with the result of the POP operator (cf. Eq. 22):
{0} N {i}? U {i+1}? U {i+2}? N {len(grades)}?. After widening, the last
two segments are merged into a single segment, and the analysis of the loop
terminates with {0} N {i}? U {i+1}? U {len(grades)}?.

Finally, the analysis of the assignment at line 2 produces the segmentation
{0} N {1}? U {2}? U {len(grades)}?, which correctly indicates that the first
element of the array grades (if any) is unused by the program.
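The conclusion of the example can be cross-checked by brute force on small inputs. The sketch below is not the static analysis: it runs the program of Fig. 7 concretely and tests, for a fixed array length and a small grade domain (both arbitrary choices made here), whether changing a single element can ever change the result. The helper name `unused` is hypothetical.

```python
from itertools import product

def program(grades):
    """The program of Fig. 7, returning the output variable passing."""
    failed = 0
    i = 1                      # the mistake at line 2: index 0 is skipped
    while i < len(grades):
        if grades[i] < 4:
            failed = failed + 1
        i = i + 1
    return 2 * failed < len(grades)

def unused(index, length, domain):
    """True if changing grades[index] alone never changes the result."""
    for grades in product(domain, repeat=length):
        for value in domain:
            other = list(grades)
            other[index] = value
            if program(list(grades)) != program(other):
                return False
    return True

domain = range(1, 7)           # assumed grade scale, for illustration only
print([unused(k, 3, domain) for k in range(3)])   # [True, False, False]
```

Such an exhaustive check only covers the chosen length and domain, whereas the segmentation computed by the analysis holds for arrays of any length.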


Implementation. The analyses presented in this and in the previous section are
implemented in the prototype static analyzer Lyra and are available online³.

The implementation is in Python and, at the time of writing, accepts
programs written in a limited subset of Python without user-defined classes. A
type inference is run before the analysis of a program. The analysis is performed
backwards on the control flow graph of the program with a standard worklist
algorithm [32], using widening at loop heads to enforce termination.


12 Related Work 


The most directly relevant work has been discussed throughout the paper. The 
non-interference analysis proposed by Assaf et al. [6] (cf. Sect. 8) is similar to the 
logic of Amtoft and Banerjee [5] and the type system of Hunt and Sands [25]. 
The data usage analysis proposed in Sect. 10 is similar to dependency analyses 
used for program slicing [37] (e.g., [24]). Both analyses as well as strongly live 
variable analysis (cf. Sect.9) are based on the syntactic presence of a variable 
in the definition of another variable. To overcome this limitation, one should 
look further for semantic dependencies between values of program variables. In 
this direction, Giacobazzi, Mastroeni, and others [19,22,29] have proposed the 
notion of abstract dependency. However, note that an analysis based on abstract 
dependencies would over-approximate the subset of the input variables that are 
unused by a program. Indeed, the absence of an abstract dependency between 
variables (e.g., a dependency between the parity of the variables [19,29]) does 
not imply the absence of a (concrete) dependency between the variables (i.e., a 
dependency between the values of the variables). Thus, such an analysis could 
not be used to prove that a program does not use a subset of its input variables, 
but would be used to prove that a program uses a subset of its input variables. 

Semantics formulations using sets of sets of traces have already been pro- 
posed in the literature [6,28]. Mastroeni and Pasqua [28] lift the hierarchy of 
semantics developed by Cousot [12] to sets of sets of traces to obtain a hierarchy 
of semantics suitable for verifying general program properties (i.e., properties 
that are not subset-closed, cf. Sect. 7). However, none of the semantics that they 
proposed is suitable for input data usage: all semantics in the hierarchy are 
abstractions of a semantics that contains sets with both finite and infinite traces 


³ http://www.pm.inf.ethz.ch/research/lyra.html.




and thus, unlike our outcome semantics (cf. Sect. 5), cannot be used to reason 
about terminating and non-terminating outcomes of a program. Similarly, as 
observed in [28], the semantics proposed by Assaf et al. [6] can be used to verify 
only subset-closed properties. Thus, it cannot be used for input data usage. 

Finally, to the best of our knowledge, our work is the first to aim at detecting 
programming errors in data science code using static analysis. Closely related 
are [7,10] which, however, focus on spreadsheet applications and target errors 
in the data rather than the code that analyzes it. Recent work [2] proposes an 
approach to repair bias in data science code. We believe that our work can be 
applied in this context to prove absence of bias, e.g., by showing that a program 
does not use gender information to decide whether to hire a person. 


13 Conclusion and Future Work 


In this paper, we have proposed an abstract interpretation framework to auto- 
matically detect input data that remains unused by a program. Additionally, we 
have shown that existing static analyses based on dependencies are subsumed 
by our unifying framework and can be used, with varying degrees of precision, 
for proving that a program does not use some of its input data. Finally, we have 
proposed a data usage analysis for more realistic data science applications that 
store input data in compound data structures such as arrays or lists. 

As part of our future work, we plan to use our framework to guide the design 
of new, more precise static analyses for data usage. We also want to explore the 
complementary direction of proving that a program uses its input data by devel- 
oping an analysis based on abstract dependencies [19,22,29] between program 
variables, as discussed above. Additionally, we plan to investigate other appli- 
cations of our work such as provenance or lineage analysis [9] as well as proving 
absence of algorithmic bias [2]. Finally, we want to study other programming 
errors related to data usage such as accidental data duplication.

Abstract. There are two kinds of higher-order extensions of model
checking: HORS model checking and HFL model checking. Whilst the 
former has been applied to automated verification of higher-order func- 
tional programs, applications of the latter have not been well studied. In 
the present paper, we show that various verification problems for func- 
tional programs, including may/must-reachability, trace properties, and 
linear-time temporal properties (and their negations), can be naturally 
reduced to (extended) HFL model checking. The reductions yield a sound 
and complete logical characterization of those program properties. Com- 
pared with the previous approaches based on HORS model checking, our 
approach provides a more uniform, streamlined method for higher-order 
program verification. 


1 Introduction 


There are two kinds of higher-order extensions of model checking in the
literature: HORS model checking [16,32] and HFL model checking [42]. The
former is concerned with whether the tree generated by a given higher-order tree
grammar called a higher-order recursion scheme (HORS) satisfies the property
expressed by a given modal μ-calculus formula (or a tree automaton), and the
latter is concerned with whether a given finite state system satisfies the
property expressed by a given formula of higher-order modal fixpoint logic (HFL),
a higher-order extension of the modal μ-calculus. Whilst HORS model
checking has been applied to automated verification of higher-order functional
programs [17,18,22,26,33,41,43], there have been few studies on applications of
HFL model checking to program/system verification. Although HFL was
introduced more than 10 years ago, we are only aware of applications to
assume-guarantee reasoning [42] and process equivalence checking [28].

In the present paper, we show that various verification problems for higher- 
order functional programs can actually be reduced to (extended) HFL model 
checking in a rather natural manner. We briefly explain the idea of our reduction 
below.¹ We translate a program to an HFL formula that says “the program has 
a valid behavior” (where the validity of a behavior depends on each verification 


1 In this section, we use only a fragment of HFL that can be expressed in the modal 
$\mu$-calculus. Some familiarity with the modal $\mu$-calculus [25] would help. 
© The Author(s) 2018 
problem). Thus, a program is actually mapped to a property, and a program 
property is mapped to a system to be verified; this has been partially inspired by 
the recent work of Kobayashi et al. [19], where HORS model checking problems 
have been translated to HFL model checking problems by switching the roles of 
models and properties. 

For example, consider a simple program fragment read(x); close(x) that
reads and then closes a file (pointer) x. The transition system in Fig. 1 shows
a valid access protocol to read-only files. Then, the property that a read
operation is allowed in the current state can be expressed by a formula of the form
$\langle\mathtt{read}\rangle\varphi$, which says that the current state has a
read-transition, after which $\varphi$ is satisfied. Thus, the program
read(x); close(x) being valid is expressed as
$\langle\mathtt{read}\rangle\langle\mathtt{close}\rangle\mathtt{true}$,² which is
indeed satisfied by the initial state $q_0$ of the transition system in Fig. 1.
Here, we have just replaced the operations read and close of the program with
the corresponding modal operators $\langle\mathtt{read}\rangle$ and
$\langle\mathtt{close}\rangle$. We can also naturally deal with branches and
recursions. For example, consider the program
close(x) $\Box$ (read(x); close(x)), where $e_1 \,\Box\, e_2$ represents a
non-deterministic choice between $e_1$ and $e_2$. Then the property that the
program always accesses x in a valid manner can be expressed by
$(\langle\mathtt{close}\rangle\mathtt{true}) \land (\langle\mathtt{read}\rangle\langle\mathtt{close}\rangle\mathtt{true})$.
Note that we have just replaced the non-deterministic branch with the logical
conjunction, as we wish here to require that the program’s behavior is valid in
both branches. We can also deal with conditional branches if HFL is extended
with predicates; if b then close(x) else (read(x); close(x)) can be translated to
$(b \Rightarrow \langle\mathtt{close}\rangle\mathtt{true}) \land (\lnot b \Rightarrow \langle\mathtt{read}\rangle\langle\mathtt{close}\rangle\mathtt{true})$.
Let us also consider the recursive function f defined by:


$$f\,x = \mathtt{close}(x) \,\Box\, (\mathtt{read}(x); \mathtt{read}(x); f\,x).$$


Then, the program f x being valid can be represented by using a (greatest)
fixpoint formula:


$$\nu F.\,(\langle\mathtt{close}\rangle\mathtt{true}) \land (\langle\mathtt{read}\rangle\langle\mathtt{read}\rangle F).$$


If the state $q_0$ satisfies this formula (which is indeed the case), then we know that
all the file accesses made by f x are valid. So far, we have used only
modal $\mu$-calculus formulas. If we wish to express the validity of higher-order
programs, we need HFL formulas; such examples are given later.


[Figure: labeled transition system; the only edge label that survives in this text is "read"]


Fig. 1. File access protocol 


2 Here, for the sake of simplicity, we assume that we are interested in the usage of the 
single file pointer x, so that the name x can be ignored in HFL formulas; usage of 
multiple files can be tracked by using the technique of [17]. 
We generalize the above idea and formalize reductions from various classes
of verification problems for simply-typed higher-order functional programs with
recursion, integers and non-determinism (including verification of
may/must-reachability, trace properties, and linear-time temporal properties and
their negations) to (extended) HFL model checking, where HFL is extended with
integer predicates, and prove soundness and completeness of the reductions.
Extended HFL model checking problems obtained by the reductions are (neces- 
sarily) undecidable in general, but for finite-data programs (i.e., programs that 
consist of only functions and data from finite data domains such as Booleans), the 
reductions yield pure HFL model checking problems, which are decidable [42]. 

Our reductions provide sound and complete logical characterizations of the
wide range of program properties mentioned above. Nice properties of the
logical characterizations include: (i) (like verification conditions for Hoare
triples) once the logical characterization is obtained as an HFL formula, purely
logical reasoning can be used to prove or disprove it (without further referring
to the program semantics); for that purpose, one may use theorem provers with
various degrees of automation, ranging from interactive ones like Coq, through
semi-automated ones requiring some annotations, to fully automated ones (though
the latter two are yet to be implemented); (ii) (unlike the standard verification
condition generation for Hoare triples using invariant annotations) the logical
characterization can automatically be computed, without any annotations;³
(iii) standard logical reasoning can be applied based on the semantics of
formulas; for example, co-induction and induction can be used for proving
$\nu$- and $\mu$-formulas respectively; and (iv) thanks to the completeness, the
set of program properties characterizable by HFL formulas is closed under
negation; for example, from a formula characterizing may-reachability, one can
obtain a formula characterizing non-reachability by just taking the De Morgan dual.

Compared with previous approaches based on HORS model checking
[18,22,26,33,37], our approach based on (extended) HFL model checking provides
more uniform, streamlined methods for higher-order program verification. HORS
model checking provides sound and complete verification methods for finite-data
programs [17,18], but for infinite-data programs, other techniques such as
predicate abstraction [22] and program transformation [27,31] had to be combined
to obtain sound (but incomplete) reductions to HORS model checking.
Furthermore, the techniques were different for each program property, such as
reachability [22], termination [27], non-termination [26], fair termination [31],
and fair non-termination [43]. In contrast, our reductions are sound and
complete even for infinite-data programs. Although the obtained HFL model
checking problems are undecidable in general, the reductions allow us to treat various
program properties uniformly; all the verification tasks boil down to the issue
of how to prove $\mu$- and $\nu$-formulas (and as remarked above, we can use induction
and co-induction to deal with them). Technically, our reduction to HFL model 


3 This does not mean that invariant discovery is unnecessary; invariant discovery is 
just postponed to the later phase of discharging verification conditions, so that it 
can be uniformly performed among various verification problems. 
checking may actually be considered an extension of HORS model checking in
the following sense. HORS model checking algorithms [21,32] usually consist of
two phases, one for computing a kind of higher-order “procedure summaries”
in the form of variable profiles [32] or intersection types [21], and the other for
nested least/greatest fixpoint computations. Our reduction from program
verification to extended HFL model checking (the reduction given in Sect. 7, in
particular) can be regarded as an extension of the first phase to deal with
infinite data domains, where the problem for the second phase is expressed in the
form of extended HFL model checking: see [23] for more details.

The rest of this paper is structured as follows. Section 2 introduces HFL 
extended with integer predicates and defines the HFL model checking problem. 
Section 3 informally demonstrates some examples of reductions from program 
verification problems to HFL model checking. Section 4 introduces a functional 
language used to formally discuss the reductions in later sections. Sections 5, 6, 
and 7 consider may/must-reachability, trace properties, and temporal properties 
respectively, and present (sound and complete) reductions from verification of 
those properties to HFL model checking. Section 8 discusses related work, and 
Sect. 9 concludes the paper. Proofs are found in an extended version [23]. 


2 (Extended) HFL 


In this section, we introduce an extension of higher-order modal fixpoint logic
(HFL) [42] with integer predicates (which we call $\mathrm{HFL}_{\mathbf{Z}}$; we
often drop the subscript and write HFL, as in Sect. 1), and define the
$\mathrm{HFL}_{\mathbf{Z}}$ model checking problem. The set of integers can actually
be replaced by another infinite set $X$ of data (like the set of natural numbers or
the set of finite trees) to yield $\mathrm{HFL}_{X}$.


2.1 Syntax 


For a map $f$, we write $\mathit{dom}(f)$ and $\mathit{codom}(f)$ for the domain
and codomain of $f$ respectively. We write $\mathbf{Z}$ for the set of integers,
ranged over by the meta-variable $n$ below. We assume a set $\mathbf{Pred}$ of
primitive predicates on integers, ranged over by $p$. We write
$\mathit{arity}(p)$ for the arity of $p$. We assume that $\mathbf{Pred}$ contains
standard integer predicates such as $=$ and $<$, and also assume that, for each
predicate $p \in \mathbf{Pred}$, there also exists a predicate
$\lnot p \in \mathbf{Pred}$ such that, for any integers $n_1,\ldots,n_k$,
$p(n_1,\ldots,n_k)$ holds if and only if $\lnot p(n_1,\ldots,n_k)$ does not hold;
thus, $\lnot p(n_1,\ldots,n_k)$ should be parsed as $(\lnot p)(n_1,\ldots,n_k)$, but
can semantically be interpreted as $\lnot(p(n_1,\ldots,n_k))$.


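The syntax of $\mathrm{HFL}_{\mathbf{Z}}$ formulas is given by:


$$
\begin{aligned}
\varphi \text{ (formulas)} ::={}& n \mid \varphi_1 \;\mathit{op}\; \varphi_2 \mid \mathtt{true} \mid \mathtt{false} \mid p(\varphi_1,\ldots,\varphi_k) \mid \varphi_1 \lor \varphi_2 \mid \varphi_1 \land \varphi_2 \\
\mid{}& X \mid \langle a\rangle\varphi \mid [a]\varphi \mid \mu X^{\tau}.\varphi \mid \nu X^{\tau}.\varphi \mid \lambda X{:}\sigma.\varphi \mid \varphi_1\,\varphi_2 \\
\tau \text{ (types)} ::={}& \bullet \mid \sigma \to \tau \qquad\qquad \sigma \text{ (extended types)} ::= \tau \mid \mathtt{int}
\end{aligned}
$$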


Here, $\mathit{op}$ ranges over a set of binary operations on integers, such as $+$,
and $X$ ranges over a denumerable set of variables. We have extended the
original HFL [42] with integer expressions ($n$ and $\varphi_1 \;\mathit{op}\; \varphi_2$),
and atomic formulas $p(\varphi_1,\ldots,\varphi_k)$ on integers (here, the arguments
of integer operations or predicates will be restricted to integer expressions by
the type system introduced below). Following [19], we have omitted negations, as
any formula can be transformed to an equivalent negation-free formula [30].

We explain the meaning of each formula informally; the formal semantics
is given in Sect. 2.2. Like the modal $\mu$-calculus [10,25], each formula expresses
a property of a labeled transition system. The first line of the syntax of
formulas consists of the standard constructs of predicate logics. On the second
line, as in the standard modal $\mu$-calculus, $\langle a\rangle\varphi$ means that
there exists an $a$-labeled transition to a state that satisfies $\varphi$. The
formula $[a]\varphi$ means that after any $a$-labeled transition, $\varphi$ is
satisfied. The formulas $\mu X^{\tau}.\varphi$ and $\nu X^{\tau}.\varphi$ represent
the least and greatest fixpoints respectively (the least and greatest $X$ such
that $X = \varphi$); unlike the modal $\mu$-calculus, $X$ may range over not only
propositional variables but also higher-order predicate variables (of type
$\tau$). The $\lambda$-abstractions $\lambda X{:}\sigma.\varphi$ and applications
$\varphi_1\,\varphi_2$ are used to manipulate higher-order predicates. We often omit
type annotations in $\mu X^{\tau}.\varphi$, $\nu X^{\tau}.\varphi$ and
$\lambda X{:}\sigma.\varphi$, and just write $\mu X.\varphi$, $\nu X.\varphi$ and
$\lambda X.\varphi$.


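Example 1. Consider $\varphi_{ab}\,\varphi$ where
$\varphi_{ab} = \mu X^{\bullet\to\bullet}.\lambda Y{:}\bullet.\, Y \lor \langle a\rangle(X(\langle b\rangle Y))$.
We can expand the formula as follows:


$$
\begin{aligned}
\varphi_{ab}\,\varphi &= (\lambda Y{:}\bullet.\, Y \lor \langle a\rangle(\varphi_{ab}(\langle b\rangle Y)))\,\varphi = \varphi \lor \langle a\rangle(\varphi_{ab}(\langle b\rangle\varphi)) \\
&= \varphi \lor \langle a\rangle(\langle b\rangle\varphi \lor \langle a\rangle(\varphi_{ab}(\langle b\rangle\langle b\rangle\varphi))) = \cdots,
\end{aligned}
$$


and obtain $\varphi \lor (\langle a\rangle\langle b\rangle\varphi) \lor (\langle a\rangle\langle a\rangle\langle b\rangle\langle b\rangle\varphi) \lor \cdots$.
Thus, the formula means that there is a transition sequence of the form
$a^n b^n$ for some $n \ge 0$ that leads to a state satisfying $\varphi$.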
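
The expansion in Example 1 can be checked mechanically on a small finite LTS. The following is a minimal sketch, not part of the paper's development: it tabulates a function of type $\bullet \to \bullet$ as a Python dictionary from state sets to state sets and computes the least fixpoint by Kleene iteration. The six-state LTS, the label c, and the helper names `powerset`, `diamond` and `phi_ab` are all assumptions made for illustration.

```python
from itertools import combinations

def powerset(states):
    """All subsets of a finite state set, as frozensets."""
    items = sorted(states)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def diamond(trans, label, target):
    """<label>target: states with at least one label-transition into target."""
    return frozenset(s for (s, a, t) in trans if a == label and t in target)

def phi_ab(states, trans):
    """Least fixpoint of X = (lambda Y. Y or <a>(X(<b>Y))), tabulated on all subsets."""
    subsets = powerset(states)
    x = {y: frozenset() for y in subsets}        # bottom element: lambda Y. empty set
    while True:
        nxt = {y: y | diamond(trans, "a", x[diamond(trans, "b", y)])
               for y in subsets}
        if nxt == x:                             # Kleene iteration has stabilised
            return x
        x = nxt

# Hypothetical LTS: 0 -a-> 1 -a-> 2 -b-> 3 -b-> 4 -c-> 5
STATES = frozenset(range(6))
TRANS = {(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4), (4, "c", 5)}

c_true = diamond(TRANS, "c", STATES)             # semantics of <c>true
result = phi_ab(STATES, TRANS)[c_true]           # semantics of phi_ab (<c>true)
print(sorted(result))
```

On this LTS the result contains state 4 (the case $n = 0$) and state 0 (the path a a b b), which matches the reading "some $a^n b^n$ path leads to a state satisfying the argument". Tabulating over all subsets is exponential in the number of states, so this is only meant for toy examples.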


Following [19], we exclude meaningless formulas such as
$(\langle a\rangle\mathtt{true}) + 1$ by using a simple type system. The types
$\bullet$, $\mathtt{int}$, and $\sigma \to \tau$ describe propositions, integers, and
(monotonic) functions from $\sigma$ to $\tau$, respectively. Note that the
integer type $\mathtt{int}$ may occur only in an argument position; this restriction
is required to ensure that least and greatest fixpoints are well-defined. The
typing rules for formulas are given in Fig. 2. In the figure, $\Delta$ denotes a type
environment, which is a finite map from variables to (extended) types. Below we
consider only well-typed formulas.


2.2 Semantics and $\mathrm{HFL}_{\mathbf{Z}}$ Model Checking 


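We now define the formal semantics of $\mathrm{HFL}_{\mathbf{Z}}$ formulas. A labeled
transition system (LTS) is a quadruple $L = (U, A, \longrightarrow, s_{\mathit{init}})$,
where $U$ is a finite set of states, $A$ is a finite set of actions,
$\longrightarrow \;\subseteq U \times A \times U$ is a labeled transition relation,
and $s_{\mathit{init}} \in U$ is the initial state. We write
$s_1 \xrightarrow{a} s_2$ when $(s_1, a, s_2) \in \longrightarrow$.

For an LTS $L = (U, A, \longrightarrow, s_{\mathit{init}})$ and an extended type
$\sigma$, we define the partially ordered set $(D_{L,\sigma}, \sqsubseteq_{L,\sigma})$
inductively by:


$$
\begin{aligned}
D_{L,\bullet} &= 2^{U} & \sqsubseteq_{L,\bullet} &= \;\subseteq \\
D_{L,\mathtt{int}} &= \mathbf{Z} & \sqsubseteq_{L,\mathtt{int}} &= \{(n,n) \mid n \in \mathbf{Z}\} \\
D_{L,\sigma\to\tau} &= \{f \in D_{L,\sigma} \to D_{L,\tau} \mid \forall x, y.\,(x \sqsubseteq_{L,\sigma} y \Rightarrow f\,x \sqsubseteq_{L,\tau} f\,y)\} \\
\sqsubseteq_{L,\sigma\to\tau} &= \{(f,g) \mid \forall x \in D_{L,\sigma}.\, f(x) \sqsubseteq_{L,\tau} g(x)\}
\end{aligned}
$$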
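$$\frac{}{\Delta \vdash_{\mathsf{H}} n : \mathtt{int}}\ (\text{HT-Int}) \qquad \frac{\Delta \vdash_{\mathsf{H}} \varphi_i : \mathtt{int} \text{ for each } i \in \{1,2\}}{\Delta \vdash_{\mathsf{H}} \varphi_1 \;\mathit{op}\; \varphi_2 : \mathtt{int}}\ (\text{HT-Op})$$

$$\frac{}{\Delta \vdash_{\mathsf{H}} \mathtt{true} : \bullet}\ (\text{HT-True}) \qquad \frac{}{\Delta \vdash_{\mathsf{H}} \mathtt{false} : \bullet}\ (\text{HT-False})$$

$$\frac{\mathit{arity}(p) = k \qquad \Delta \vdash_{\mathsf{H}} \varphi_i : \mathtt{int} \text{ for each } i \in \{1,\ldots,k\}}{\Delta \vdash_{\mathsf{H}} p(\varphi_1,\ldots,\varphi_k) : \bullet}\ (\text{HT-Pred}) \qquad \frac{}{\Delta, X : \sigma \vdash_{\mathsf{H}} X : \sigma}\ (\text{HT-Var})$$

$$\frac{\Delta \vdash_{\mathsf{H}} \varphi_i : \bullet \text{ for each } i \in \{1,2\}}{\Delta \vdash_{\mathsf{H}} \varphi_1 \lor \varphi_2 : \bullet}\ (\text{HT-Or}) \qquad \frac{\Delta \vdash_{\mathsf{H}} \varphi_i : \bullet \text{ for each } i \in \{1,2\}}{\Delta \vdash_{\mathsf{H}} \varphi_1 \land \varphi_2 : \bullet}\ (\text{HT-And})$$

$$\frac{\Delta \vdash_{\mathsf{H}} \varphi : \bullet}{\Delta \vdash_{\mathsf{H}} \langle a\rangle\varphi : \bullet}\ (\text{HT-Some}) \qquad \frac{\Delta \vdash_{\mathsf{H}} \varphi : \bullet}{\Delta \vdash_{\mathsf{H}} [a]\varphi : \bullet}\ (\text{HT-All})$$

$$\frac{\Delta, X : \tau \vdash_{\mathsf{H}} \varphi : \tau}{\Delta \vdash_{\mathsf{H}} \mu X^{\tau}.\varphi : \tau}\ (\text{HT-Mu}) \qquad \frac{\Delta, X : \tau \vdash_{\mathsf{H}} \varphi : \tau}{\Delta \vdash_{\mathsf{H}} \nu X^{\tau}.\varphi : \tau}\ (\text{HT-Nu})$$

$$\frac{\Delta, X : \sigma \vdash_{\mathsf{H}} \varphi : \tau}{\Delta \vdash_{\mathsf{H}} \lambda X{:}\sigma.\varphi : \sigma \to \tau}\ (\text{HT-Abs}) \qquad \frac{\Delta \vdash_{\mathsf{H}} \varphi_1 : \sigma \to \tau \qquad \Delta \vdash_{\mathsf{H}} \varphi_2 : \sigma}{\Delta \vdash_{\mathsf{H}} \varphi_1\,\varphi_2 : \tau}\ (\text{HT-App})$$


Fig. 2. Typing rules for $\mathrm{HFL}_{\mathbf{Z}}$ formulas 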


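Note that $(D_{L,\tau}, \sqsubseteq_{L,\tau})$ forms a complete lattice (but
$(D_{L,\mathtt{int}}, \sqsubseteq_{L,\mathtt{int}})$ does not). We write
$\bot_{L,\tau}$ and $\top_{L,\tau}$ for the least and greatest elements of
$D_{L,\tau}$ (which are $\lambda\tilde{x}.\emptyset$ and $\lambda\tilde{x}.U$)
respectively. We sometimes omit the subscript $L$ below. Let
$\llbracket\Delta\rrbracket_L$ be the set of functions (called valuations) that map
$X$ to an element of $D_{L,\sigma}$ for each $X : \sigma \in \Delta$. For an HFL
formula $\varphi$ such that $\Delta \vdash_{\mathsf{H}} \varphi : \sigma$, we define
$\llbracket\Delta \vdash_{\mathsf{H}} \varphi : \sigma\rrbracket_L$ as a map from
$\llbracket\Delta\rrbracket_L$ to $D_{L,\sigma}$, by induction on the derivation⁴ of
$\Delta \vdash_{\mathsf{H}} \varphi : \sigma$, as follows.


$$
\begin{aligned}
\llbracket\Delta \vdash_{\mathsf{H}} n : \mathtt{int}\rrbracket_L(\rho) &= n \\
\llbracket\Delta \vdash_{\mathsf{H}} \mathtt{true} : \bullet\rrbracket_L(\rho) &= U \\
\llbracket\Delta \vdash_{\mathsf{H}} \mathtt{false} : \bullet\rrbracket_L(\rho) &= \emptyset \\
\llbracket\Delta \vdash_{\mathsf{H}} \varphi_1 \;\mathit{op}\; \varphi_2 : \mathtt{int}\rrbracket_L(\rho) &= (\llbracket\Delta \vdash_{\mathsf{H}} \varphi_1 : \mathtt{int}\rrbracket_L(\rho)) \;\llbracket\mathit{op}\rrbracket\; (\llbracket\Delta \vdash_{\mathsf{H}} \varphi_2 : \mathtt{int}\rrbracket_L(\rho)) \\
\llbracket\Delta \vdash_{\mathsf{H}} p(\varphi_1,\ldots,\varphi_k) : \bullet\rrbracket_L(\rho) &=
  \begin{cases}
    U & \text{if } (\llbracket\Delta \vdash_{\mathsf{H}} \varphi_1 : \mathtt{int}\rrbracket_L(\rho),\ldots,\llbracket\Delta \vdash_{\mathsf{H}} \varphi_k : \mathtt{int}\rrbracket_L(\rho)) \in \llbracket p\rrbracket \\
    \emptyset & \text{otherwise}
  \end{cases} \\
\llbracket\Delta, X : \sigma \vdash_{\mathsf{H}} X : \sigma\rrbracket_L(\rho) &= \rho(X) \\
\llbracket\Delta \vdash_{\mathsf{H}} \varphi_1 \lor \varphi_2 : \bullet\rrbracket_L(\rho) &= \llbracket\Delta \vdash_{\mathsf{H}} \varphi_1 : \bullet\rrbracket_L(\rho) \cup \llbracket\Delta \vdash_{\mathsf{H}} \varphi_2 : \bullet\rrbracket_L(\rho) \\
\llbracket\Delta \vdash_{\mathsf{H}} \varphi_1 \land \varphi_2 : \bullet\rrbracket_L(\rho) &= \llbracket\Delta \vdash_{\mathsf{H}} \varphi_1 : \bullet\rrbracket_L(\rho) \cap \llbracket\Delta \vdash_{\mathsf{H}} \varphi_2 : \bullet\rrbracket_L(\rho) \\
\llbracket\Delta \vdash_{\mathsf{H}} \langle a\rangle\varphi : \bullet\rrbracket_L(\rho) &= \{s \mid \exists s' \in \llbracket\Delta \vdash_{\mathsf{H}} \varphi : \bullet\rrbracket_L(\rho).\; s \xrightarrow{a} s'\} \\
\llbracket\Delta \vdash_{\mathsf{H}} [a]\varphi : \bullet\rrbracket_L(\rho) &= \{s \mid \forall s' \in U.\,(s \xrightarrow{a} s' \text{ implies } s' \in \llbracket\Delta \vdash_{\mathsf{H}} \varphi : \bullet\rrbracket_L(\rho))\} \\
\llbracket\Delta \vdash_{\mathsf{H}} \mu X^{\tau}.\varphi : \tau\rrbracket_L(\rho) &= \mathbf{lfp}_{L,\tau}(\llbracket\Delta \vdash_{\mathsf{H}} \lambda X{:}\tau.\varphi : \tau \to \tau\rrbracket_L(\rho)) \\
\llbracket\Delta \vdash_{\mathsf{H}} \nu X^{\tau}.\varphi : \tau\rrbracket_L(\rho) &= \mathbf{gfp}_{L,\tau}(\llbracket\Delta \vdash_{\mathsf{H}} \lambda X{:}\tau.\varphi : \tau \to \tau\rrbracket_L(\rho)) \\
\llbracket\Delta \vdash_{\mathsf{H}} \lambda X{:}\sigma.\varphi : \sigma \to \tau\rrbracket_L(\rho) &= \{(v, \llbracket\Delta, X : \sigma \vdash_{\mathsf{H}} \varphi : \tau\rrbracket_L(\rho[X \mapsto v])) \mid v \in D_{L,\sigma}\} \\
\llbracket\Delta \vdash_{\mathsf{H}} \varphi_1\,\varphi_2 : \tau\rrbracket_L(\rho) &= \llbracket\Delta \vdash_{\mathsf{H}} \varphi_1 : \sigma \to \tau\rrbracket_L(\rho)\,(\llbracket\Delta \vdash_{\mathsf{H}} \varphi_2 : \sigma\rrbracket_L(\rho))
\end{aligned}
$$


4 Note that the derivation of each judgment $\Delta \vdash_{\mathsf{H}} \varphi : \sigma$ is unique if there is any. 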
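Here, $\llbracket\mathit{op}\rrbracket$ denotes the binary function on integers
represented by $\mathit{op}$ and $\llbracket p\rrbracket$ denotes the $k$-ary relation
on integers represented by $p$. The least/greatest fixpoint operators
$\mathbf{lfp}_{L,\tau}$ and $\mathbf{gfp}_{L,\tau}$ are defined by
$\mathbf{lfp}_{L,\tau}(f) = \bigsqcap_{L,\tau}\{x \in D_{L,\tau} \mid f(x) \sqsubseteq_{L,\tau} x\}$
and
$\mathbf{gfp}_{L,\tau}(f) = \bigsqcup_{L,\tau}\{x \in D_{L,\tau} \mid x \sqsubseteq_{L,\tau} f(x)\}$.
Here, $\bigsqcup_{L,\tau}$ and $\bigsqcap_{L,\tau}$ respectively denote the least
upper bound and the greatest lower bound with respect to $\sqsubseteq_{L,\tau}$. We
often omit the subscript $L$ and write
$\llbracket\Delta \vdash_{\mathsf{H}} \varphi : \sigma\rrbracket$ for
$\llbracket\Delta \vdash_{\mathsf{H}} \varphi : \sigma\rrbracket_L$. For a closed
formula, i.e., a formula well-typed under the empty type environment $\emptyset$,
we often write $\llbracket\varphi\rrbracket_L$ or just $\llbracket\varphi\rrbracket$ for
$\llbracket\emptyset \vdash_{\mathsf{H}} \varphi : \sigma\rrbracket_L(\emptyset)$.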


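Example 2. For the LTS $L_{\mathit{file}}$ in Fig. 1, we have:


$$
\llbracket\nu X^{\bullet}.(\langle\mathtt{close}\rangle\mathtt{true} \land \langle\mathtt{read}\rangle X)\rrbracket =
\mathbf{gfp}_{L,\bullet}(\lambda x \in D_{L,\bullet}.\,\llbracket X : \bullet \vdash_{\mathsf{H}} \langle\mathtt{close}\rangle\mathtt{true} \land \langle\mathtt{read}\rangle X : \bullet\rrbracket(\{X \mapsto x\})) = \{q_0\}.
$$


In fact, $x = \{q_0\} \in D_{L,\bullet}$ satisfies the equation
$\llbracket X : \bullet \vdash_{\mathsf{H}} \langle\mathtt{close}\rangle\mathtt{true} \land \langle\mathtt{read}\rangle X : \bullet\rrbracket_L(\{X \mapsto x\}) = x$,
and $x = \{q_0\} \in D_{L,\bullet}$ is the greatest such element.
Consider the following LTS $L_1$:


[Figure: diagram of the LTS $L_1$, not recoverable from this text]

and $\varphi_{ab}(\langle c\rangle\mathtt{true})$ where $\varphi_{ab}$ is the one introduced
in Example 1. Then,


$$\llbracket\varphi_{ab}(\langle c\rangle\mathtt{true})\rrbracket_{L_1} = \{q_0, q_2\}.$$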
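
The greatest-fixpoint computation in the first half of Example 2 can be replayed by iterating downwards from the full state set. This is a minimal sketch under an explicit assumption about the shape of Fig. 1: the diagram is not reproduced in this text, so the two transitions below (a read loop on q0 and a close edge from q0 to q1) are inferred from the surrounding discussion rather than copied from the figure. The helper names `diamond` and `gfp` are likewise illustrative.

```python
def diamond(trans, label, target):
    """<label>target: states with at least one label-transition into target."""
    return frozenset(s for (s, a, t) in trans if a == label and t in target)

def gfp(f, top):
    """Greatest fixpoint of a monotone f on a finite powerset, iterating from the top."""
    x = top
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt

# Assumed shape of L_file: q0 -read-> q0, q0 -close-> q1
U = frozenset({"q0", "q1"})
L_FILE = {("q0", "read", "q0"), ("q0", "close", "q1")}

def example2_body(x):
    # <close>true and <read>X
    return diamond(L_FILE, "close", U) & diamond(L_FILE, "read", x)

def intro_body(x):
    # <close>true and <read><read>F, the formula for f x in Sect. 1
    return diamond(L_FILE, "close", U) & diamond(L_FILE, "read", diamond(L_FILE, "read", x))

sat_example2 = gfp(example2_body, U)
sat_intro = gfp(intro_body, U)
print(sorted(sat_example2), sorted(sat_intro))
```

Under the assumed transitions both fixpoints are the singleton containing q0, in agreement with the value stated in Example 2 and with the claim in Sect. 1 that q0 satisfies the formula for f x.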


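Definition 1 ($\mathrm{HFL}_{\mathbf{Z}}$ model checking). For a closed formula
$\varphi$ of type $\bullet$, we write $L, s \models \varphi$ if
$s \in \llbracket\varphi\rrbracket_L$, and write $L \models \varphi$ if
$s_{\mathit{init}} \in \llbracket\varphi\rrbracket_L$. $\mathrm{HFL}_{\mathbf{Z}}$ model
checking is the problem of, given $L$ and $\varphi$, deciding whether
$L \models \varphi$ holds.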


The $\mathrm{HFL}_{\mathbf{Z}}$ model checking problem is undecidable, due to the
presence of integers; in fact, the semantic domain $D_{L,\sigma}$ is not finite for
$\sigma$ that contains $\mathtt{int}$. The undecidability is obtained as a corollary
of the soundness and completeness of the reduction from the may-reachability
problem to HFL model checking discussed in Sect. 5. For the fragment of pure HFL
(i.e., $\mathrm{HFL}_{\mathbf{Z}}$ without integers, which we write
$\mathrm{HFL}_{\emptyset}$ below), the model checking problem is decidable [42].

The order of an $\mathrm{HFL}_{\mathbf{Z}}$ model checking problem
$L \stackrel{?}{\models} \varphi$ is the highest order of types of subformulas of
$\varphi$, where the order of a type is defined by:
$\mathit{order}(\bullet) = \mathit{order}(\mathtt{int}) = 0$ and
$\mathit{order}(\sigma \to \tau) = \max(\mathit{order}(\sigma) + 1, \mathit{order}(\tau))$.
The complexity of order-$k$ $\mathrm{HFL}_{\emptyset}$ model checking is $k$-EXPTIME
complete [1], but polynomial time in the size of HFL formulas under the assumption
that the other parameters (the size of LTS and the largest size of types used in
formulas) are fixed [19].
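
The order function on types is a direct structural recursion. As a small illustration, the sketch below encodes types as nested tuples; the encoding (the string "o" for the proposition type, "int" for integers, and a triple for function types) is an assumption chosen for this example only.

```python
# Hypothetical encoding of types: "o" is the proposition type, "int" the
# integer type, and ("->", sigma, tau) the function type from sigma to tau.
def order(ty):
    """order(o) = order(int) = 0; order(sigma -> tau) = max(order(sigma) + 1, order(tau))."""
    if ty in ("o", "int"):
        return 0
    _, sigma, tau = ty
    return max(order(sigma) + 1, order(tau))

prop_to_prop = ("->", "o", "o")                  # type of phi_ab in Example 1
int_to_prop = ("->", "int", "o")                 # a predicate on integers
higher = ("->", prop_to_prop, "o")               # takes a predicate transformer

print(order(prop_to_prop), order(int_to_prop), order(higher))
```

Modal $\mu$-calculus formulas use only the proposition type and therefore have order 0, while the formula of Example 1 already has order 1.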


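Remark 1. Though we do not have quantifiers on integers as primitives, we can
encode them using fixpoint operators. Given a formula
$\varphi : \mathtt{int} \to \bullet$, we can express $\exists x{:}\mathtt{int}.\varphi(x)$
and $\forall x{:}\mathtt{int}.\varphi(x)$ by
$(\mu X^{\mathtt{int}\to\bullet}.\lambda x{:}\mathtt{int}.\,\varphi(x) \lor X(x-1) \lor X(x+1))\,0$
and
$(\nu X^{\mathtt{int}\to\bullet}.\lambda x{:}\mathtt{int}.\,\varphi(x) \land X(x-1) \land X(x+1))\,0$
respectively.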
2.3 HES 


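As in [19], we often write an $\mathrm{HFL}_{\mathbf{Z}}$ formula as a sequence of
fixpoint equations, called a hierarchical equation system (HES).


Definition 2. An (extended) hierarchical equation system (HES) is a pair
$(\mathcal{E}, \varphi)$ where $\mathcal{E}$ is a sequence of fixpoint equations, of
the form:
$X_1^{\tau_1} =_{\alpha_1} \varphi_1; \cdots; X_n^{\tau_n} =_{\alpha_n} \varphi_n$,
where $\alpha_i \in \{\mu, \nu\}$. We assume that
$X_1 : \tau_1, \ldots, X_n : \tau_n \vdash_{\mathsf{H}} \varphi_i : \tau_i$ holds for
each $i \in \{1,\ldots,n\}$, and that $\varphi_1,\ldots,\varphi_n,\varphi$ do not
contain any fixpoint operators.


The HES $\Phi = (\mathcal{E}, \varphi)$ represents the $\mathrm{HFL}_{\mathbf{Z}}$
formula $\mathit{toHFL}(\mathcal{E}, \varphi)$ defined inductively by:
$\mathit{toHFL}(\epsilon, \varphi) = \varphi$ and
$\mathit{toHFL}(\mathcal{E}; X^{\tau} =_{\alpha} \varphi', \varphi) = \mathit{toHFL}([\alpha X^{\tau}.\varphi'/X]\mathcal{E}, [\alpha X^{\tau}.\varphi'/X]\varphi)$.
Conversely, every $\mathrm{HFL}_{\mathbf{Z}}$ formula can be easily converted to an
equivalent HES. In the rest of the paper, we often represent an
$\mathrm{HFL}_{\mathbf{Z}}$ formula in the form of HES, and just call it an
$\mathrm{HFL}_{\mathbf{Z}}$ formula. We write $\llbracket\Phi\rrbracket$ for
$\llbracket\mathit{toHFL}(\Phi)\rrbracket$. An HES
$(X_1^{\tau_1} =_{\alpha_1} \varphi_1; \cdots; X_n^{\tau_n} =_{\alpha_n} \varphi_n, \varphi)$
can be normalized to
$(X_0^{\tau} =_{\nu} \varphi; X_1^{\tau_1} =_{\alpha_1} \varphi_1; \cdots; X_n^{\tau_n} =_{\alpha_n} \varphi_n, X_0)$
where $\tau$ is the type of $\varphi$. Thus, we sometimes call just a sequence of
equations
$X_0^{\tau_0} =_{\nu} \varphi_0; X_1^{\tau_1} =_{\alpha_1} \varphi_1; \cdots; X_n^{\tau_n} =_{\alpha_n} \varphi_n$
an HES, with the understanding that “the main formula” is the first variable
$X_0$. Also, we often write $X^{\tau}\,x_1 \cdots x_k =_{\alpha} \varphi$ for the
equation $X^{\tau} =_{\alpha} \lambda x_1.\cdots\lambda x_k.\varphi$. We often omit
type annotations and just write $X =_{\alpha} \varphi$ for
$X^{\tau} =_{\alpha} \varphi$.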


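Example 3. The formula
$\nu X.\mu Y.(\langle b\rangle X \lor \langle a\rangle Y)$ (which means that the
current state has a transition sequence of the form $(a^{*}b)^{\omega}$) is
expressed as the following HES:


$$(X =_{\nu} Y;\; Y =_{\mu} \langle b\rangle X \lor \langle a\rangle Y,\; X).$$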
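
The inductive definition of toHFL can be read as a loop that eliminates the last equation first. The following is a minimal sketch on a toy term representation; the tuple encoding of formulas and the function names `subst` and `to_hfl` are assumptions made for illustration, and the substitution assumes that bound names never clash with the free variables of the substituted term.

```python
def subst(term, name, repl):
    """Replace free occurrences of variable `name` in `term` by `repl`.
    Assumes bound names are distinct from the free variables of `repl`."""
    tag = term[0]
    if tag == "var":
        return repl if term[1] == name else term
    if tag in ("true", "false"):
        return term
    if tag in ("mu", "nu", "lam"):               # binders: ("mu", X, body), etc.
        _, bound, body = term
        return term if bound == name else (tag, bound, subst(body, name, repl))
    if tag in ("dia", "box"):                    # modalities: ("dia", label, body)
        return (tag, term[1], subst(term[2], name, repl))
    # binary nodes: ("or", l, r), ("and", l, r), ("app", f, arg)
    return (tag, subst(term[1], name, repl), subst(term[2], name, repl))

def to_hfl(equations, main):
    """equations: list of (X, alpha, body) with alpha in {"mu", "nu"}."""
    eqs = list(equations)
    while eqs:
        x, alpha, body = eqs.pop()               # eliminate the last equation first
        fix = (alpha, x, body)
        eqs = [(y, beta, subst(b, x, fix)) for (y, beta, b) in eqs]
        main = subst(main, x, fix)
    return main

# The HES of Example 3: (X =nu Y; Y =mu <b>X or <a>Y, X)
X, Y = ("var", "X"), ("var", "Y")
hes = [("X", "nu", Y), ("Y", "mu", ("or", ("dia", "b", X), ("dia", "a", Y)))]
formula = to_hfl(hes, X)
print(formula)
```

Applied to the HES of Example 3, the loop first replaces Y by its $\mu$-fixpoint inside the equation for X and then wraps the result in the $\nu$-binder for X, which reproduces the nesting of the original formula.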


3 Warming Up 


To help readers get more familiar with HFLz and the idea of reductions, we give 
here some variations of the examples of verification of file-accessing programs 
in Sect. 1, which are instances of the “resource usage verification problem” [15]. 
General reductions will be discussed in Sects. 5, 6 and 7, after the target language 
is set up in Sect. 4. 

Consider the following OCaml-like program, which uses exceptions. 


let readex x = read x; (if * then () else raise Eof) in 
let rec f x = readex x; f x in 
let d = open_in "foo" in try f d with Eof -> close d 


Here, * represents a non-deterministic boolean value. The function readex reads 
the file pointer x, and then non-deterministically raises an end-of-file (Eof) excep- 
tion. The main expression (on the third line) first opens file “foo”, calls f to read 
the file repeatedly, and closes the file upon an end-of-file exception. Suppose, as 
in the example of Sect. 1, we wish to verify that the file “foo” is accessed following 
the protocol in Fig. 1. 

First, we can remove exceptions by representing an exception handler as a 
special continuation [6]: 
let readex x h k = read x; (if * then k() else h()) in 
let rec f x h k = readex x h (fun _ -> f x h k) in 
let d = open_in "foo" in f d (fun _ -> close d) (fun _ -> ()) 


Here, we have added to each function two parameters h and k, which represent 
an exception handler and a (normal) continuation respectively. 
Let $\Phi$ be
$(\mathcal{E}, F\;\mathtt{true}\;(\lambda r.\langle\mathtt{close}\rangle\mathtt{true})\;(\lambda r.\mathtt{true}))$
where $\mathcal{E}$ is:


$$
\begin{aligned}
\mathit{Readex}\;x\;h\;k &=_{\nu} \langle\mathtt{read}\rangle(k\;\mathtt{true} \land h\;\mathtt{true}); \\
F\;x\;h\;k &=_{\nu} \mathit{Readex}\;x\;h\;(\lambda r.F\;x\;h\;k).
\end{aligned}
$$


Here, we have just replaced read/close operations with the modal operators
$\langle\mathtt{read}\rangle$ and $\langle\mathtt{close}\rangle$, non-deterministic
choice with a logical conjunction, and the unit value () with true. Then,
$L_{\mathit{file}} \models \Phi$ if and only if the program performs only
valid accesses to the file (e.g., it does not access the file after a close operation),
where $L_{\mathit{file}}$ is the LTS shown in Fig. 1. The correctness of the
reduction can be informally understood by observing that there is a close
correspondence between reductions of the program and those of the HFL formula
above, and when the program reaches a read command read x, the corresponding
formula is of the form $\langle\mathtt{read}\rangle\cdots$, meaning that the read
operation is valid in the current state; a similar condition holds also for close
operations. We will present a general translation and prove its correctness in Sect. 6.
Let us consider another example, which uses integers: 


let rec f y x k = if y=0 then (close x; k()) 
else (read x; f (y-1) x k) in 
let d = open_in "foo" in f n d (fun _ -> ()) 


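Here, n is an integer constant. The function f reads x y times, and then calls the
continuation k. Let $L'_{\mathit{file}}$ be the LTS obtained by adding to
$L_{\mathit{file}}$ a new state $q_2$ and the transition
$q_1 \xrightarrow{\mathtt{end}} q_2$ (which intuitively means that a program is
allowed to terminate in the state $q_1$), and let $\Phi'$ be
$(\mathcal{E}', F\;n\;\mathtt{true}\;(\lambda r.\langle\mathtt{end}\rangle\mathtt{true}))$
where $\mathcal{E}'$ is:


$$F\;y\;x\;k =_{\mu} (y = 0 \Rightarrow \langle\mathtt{close}\rangle(k\;\mathtt{true})) \land (y \neq 0 \Rightarrow \langle\mathtt{read}\rangle(F\;(y-1)\;x\;k)).$$


Here, $p(\varphi_1,\ldots,\varphi_k) \Rightarrow \varphi$ is an abbreviation of
$\lnot p(\varphi_1,\ldots,\varphi_k) \lor \varphi$. Then,
$L'_{\mathit{file}} \models \Phi'$ if and only if (i) the program performs only
valid accesses to the file, (ii) it eventually terminates, and (iii) the file is
closed when the program terminates. Notice the use of $\mu$ instead of $\nu$
above; by using $\mu$, we can express liveness properties. The property
$L'_{\mathit{file}} \models \Phi'$ indeed holds for $n \ge 0$, but not for $n < 0$. In
fact, $F\;n\;x\;k$ is equivalent to false for $n < 0$, and to
$\langle\mathtt{read}\rangle^{n}\langle\mathtt{close}\rangle(k\;\mathtt{true})$ for
$n \ge 0$.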
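
The behaviour of the least fixpoint for positive and negative n can be observed by unfolding the equation a bounded number of times. The sketch below is only an approximation device, not the paper's semantics: the parameter `fuel` (a name introduced here) bounds the number of unfoldings, so a False answer is conclusive only when the bound exceeds the recursion depth that the true fixpoint needs. The transitions of the extended LTS are assumed, since Fig. 1 is not reproduced in this text, and the file argument x is dropped because the formula never inspects it.

```python
def diamond(trans, label, target):
    """<label>target: states with at least one label-transition into target."""
    return frozenset(s for (s, a, t) in trans if a == label and t in target)

# Assumed extended LTS: q0 -read-> q0, q0 -close-> q1, q1 -end-> q2
U = frozenset({"q0", "q1", "q2"})
L_FILE_EXT = {("q0", "read", "q0"), ("q0", "close", "q1"), ("q1", "end", "q2")}

def approx_F(y, k, fuel):
    """fuel-th approximant of F y x k =mu (y=0 => <close>(k true)) and (y!=0 => <read>(F (y-1) x k))."""
    if fuel == 0:
        return frozenset()                       # bottom approximant: false
    if y == 0:
        return diamond(L_FILE_EXT, "close", k(U))            # only the first conjunct is non-trivial
    return diamond(L_FILE_EXT, "read", approx_F(y - 1, k, fuel - 1))

def holds(n, fuel=100):
    """Does the initial state q0 satisfy F n true (lambda r. <end>true), up to `fuel` unfoldings?"""
    k = lambda r: diamond(L_FILE_EXT, "end", U)  # lambda r. <end>true
    return "q0" in approx_F(n, k, fuel)

print(holds(0), holds(3), holds(-1))
```

For a non-negative n the recursion reaches y = 0 after n unfoldings and the result contains q0; for a negative n every approximant is empty, which is how the least fixpoint encodes "the program does not terminate".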
4 Target Language 


This section sets up, as the target of program verification, a call-by-name⁵
higher-order functional language extended with events. The language is essentially the
same as the one used by Watanabe et al. [43] for discussing fair non-termination.


4.1 Syntax and Typing 


We assume a finite set $\mathbf{Ev}$ of names called events, ranged over by $a$, and a
denumerable set of variables, ranged over by $x, y, \ldots$. Events are used to express
temporal properties of programs. We write $\tilde{x}$ ($\tilde{t}$, resp.) for a
sequence of variables (terms, resp.), and write $|\tilde{x}|$ for the length of the
sequence.

A program is a pair $(D, t)$ consisting of a set $D$ of function definitions
$\{f_1\,\tilde{x}_1 = t_1, \ldots, f_n\,\tilde{x}_n = t_n\}$ and a term $t$. The set of
terms, ranged over by $t$, is defined by:


$$
\begin{aligned}
t ::={}& () \mid x \mid n \mid t_1 \;\mathit{op}\; t_2 \mid \mathbf{event}\ a; t \mid \mathbf{if}\ p(t'_1,\ldots,t'_k)\ \mathbf{then}\ t_1\ \mathbf{else}\ t_2 \\
\mid{}& t_1\,t_2 \mid t_1 \,\Box\, t_2.
\end{aligned}
$$


Here, $n$ and $p$ range over the sets of integers and integer predicates as in HFL
formulas. The expression $\mathbf{event}\ a; t$ raises an event $a$, and then
evaluates $t$. Events are used to encode program properties of interest. For
example, an assertion assert(b) can be expressed as
if b then () else (event fail; $\Omega$), where fail is an event that expresses an
assertion failure and $\Omega$ is a non-terminating term. If program termination
is of interest, one can insert “event end” at every termination point and check
whether an end event occurs. The expression $t_1 \,\Box\, t_2$ evaluates $t_1$ or
$t_2$ in a non-deterministic manner; it can be used to model, e.g., unknown inputs
from an environment. We use the meta-variable $P$ for programs. When $P = (D, t)$
with $D = \{f_1\,\tilde{x}_1 = t_1, \ldots, f_n\,\tilde{x}_n = t_n\}$, we write
$\mathbf{funs}(P)$ for $\{f_1,\ldots,f_n\}$ (i.e., the set of function names defined
in $P$). Using $\lambda$-abstractions, we sometimes write
$f = \lambda\tilde{x}.t$ for the function definition $f\,\tilde{x} = t$. We also
regard $D$ as a map from function names to terms, and write $\mathit{dom}(D)$ for
$\{f_1,\ldots,f_n\}$ and $D(f_i)$ for $\lambda\tilde{x}_i.t_i$.

Any program (D,t) can be normalized to (D U {main = t}, main) where 
main is a name for the “main” function. We sometimes write just D for a 
program (D, main), with the understanding that D contains a definition of 
main. 

We restrict the syntax of expressions using a type system. The set of simple
types, ranged over by $\kappa$, is defined by:


$$\kappa ::= \star \mid \eta \to \kappa \qquad\qquad \eta ::= \kappa \mid \mathtt{int}.$$


The types $\star$, $\mathtt{int}$, and $\eta \to \kappa$ describe the unit value,
integers, and functions from $\eta$ to $\kappa$ respectively. Note that
$\mathtt{int}$ is allowed to occur only in argument 

5 Call-by-value programs can be handled by applying the CPS transformation before 
applying the reductions to HFL model checking. 
positions. We defer typing rules to [23], as they are standard, except that we
require that the right-hand side of each function definition must have type
$\star$; this restriction, as well as the restriction that $\mathtt{int}$ occurs
only in argument positions, does not lose generality, as those conditions can be
ensured by applying the CPS transformation. We consider below only well-typed
programs.


4.2 Operational Semantics 


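We define the labeled transition relation $t \xrightarrow{\ell}_D t'$, where $\ell$
is either $\epsilon$ or an event name, as the least relation closed under the rules
in Fig. 3. We implicitly assume that the program $(D, t)$ is well-typed, and this
assumption is maintained throughout reductions by the standard type preservation
property. In the rules for if-expressions, $\llbracket t'_i\rrbracket$ represents
the integer value denoted by $t'_i$; note that the well-typedness of $(D, t)$
guarantees that $t'_i$ must be arithmetic expressions consisting of integers and
integer operations; thus, $\llbracket t'_i\rrbracket$ is well defined. We often
omit the subscript $D$ when it is clear from the context. We write
$t \xrightarrow{\ell_1\cdots\ell_k}{}^{*}_D\, t'$ if
$t \xrightarrow{\ell_1}_D \cdots \xrightarrow{\ell_k}_D t'$. Here, $\epsilon$ is
treated as an empty sequence; thus, for example, we write
$t \xrightarrow{ab}{}^{*}_D\, t'$ if
$t \xrightarrow{a}_D \xrightarrow{\epsilon}_D \xrightarrow{b}_D \xrightarrow{\epsilon}_D t'$.


$$\frac{f\,\tilde{x} = u \in D \qquad |\tilde{x}| = |\tilde{t}|}{f\,\tilde{t} \xrightarrow{\epsilon}_D [\tilde{t}/\tilde{x}]u} \qquad \frac{}{\mathbf{event}\ a; t \xrightarrow{a}_D t} \qquad \frac{i \in \{1,2\}}{t_1 \,\Box\, t_2 \xrightarrow{\epsilon}_D t_i}$$

$$\frac{(\llbracket t'_1\rrbracket,\ldots,\llbracket t'_k\rrbracket) \in \llbracket p\rrbracket}{\mathbf{if}\ p(t'_1,\ldots,t'_k)\ \mathbf{then}\ t_1\ \mathbf{else}\ t_2 \xrightarrow{\epsilon}_D t_1} \qquad \frac{(\llbracket t'_1\rrbracket,\ldots,\llbracket t'_k\rrbracket) \notin \llbracket p\rrbracket}{\mathbf{if}\ p(t'_1,\ldots,t'_k)\ \mathbf{then}\ t_1\ \mathbf{else}\ t_2 \xrightarrow{\epsilon}_D t_2}$$


Fig. 3. Labeled transition semantics 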


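For a program $P = (D, t_0)$, we define the set
$\mathbf{Traces}(P)\ (\subseteq \mathbf{Ev}^{*} \cup \mathbf{Ev}^{\omega})$ of traces by:


$$
\begin{aligned}
\mathbf{Traces}(D, t_0) ={}& \{\ell_0 \cdots \ell_{n-1} \in (\{\epsilon\} \cup \mathbf{Ev})^{*} \mid \forall i \in \{0,\ldots,n-1\}.\; t_i \xrightarrow{\ell_i}_D t_{i+1}\} \\
\cup{}& \{\ell_0 \ell_1 \cdots \in (\{\epsilon\} \cup \mathbf{Ev})^{\omega} \mid \forall i \in \omega.\; t_i \xrightarrow{\ell_i}_D t_{i+1}\}.
\end{aligned}
$$

Note that since the label $\epsilon$ is regarded as an empty sequence,
$\ell_0\ell_1\ell_2 = aa$ if $\ell_0 = \ell_2 = a$ and $\ell_1 = \epsilon$, and an
element of $(\{\epsilon\} \cup \mathbf{Ev})^{\omega}$ is regarded as that of
$\mathbf{Ev}^{*} \cup \mathbf{Ev}^{\omega}$. We write $\mathbf{FinTraces}(P)$ and
$\mathbf{InfTraces}(P)$ for $\mathbf{Traces}(P) \cap \mathbf{Ev}^{*}$ and
$\mathbf{Traces}(P) \cap \mathbf{Ev}^{\omega}$ respectively. The set of full traces
$\mathbf{FullTraces}(D, t_0)\ (\subseteq \mathbf{Ev}^{*} \cup \mathbf{Ev}^{\omega})$
is defined as:


$$
\begin{aligned}
& \{\ell_0 \cdots \ell_{n-1} \in (\{\epsilon\} \cup \mathbf{Ev})^{*} \mid t_n = () \land \forall i \in \{0,\ldots,n-1\}.\; t_i \xrightarrow{\ell_i}_D t_{i+1}\} \\
\cup{}& \{\ell_0 \ell_1 \cdots \in (\{\epsilon\} \cup \mathbf{Ev})^{\omega} \mid \forall i \in \omega.\; t_i \xrightarrow{\ell_i}_D t_{i+1}\}.
\end{aligned}
$$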
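Example 4. The last example in Sect. 1 is modeled as
$P_{\mathit{file}} = (D, f\,())$, where
$D = \{f\,x = (\mathbf{event}\ \mathtt{close}; ()) \,\Box\, (\mathbf{event}\ \mathtt{read}; \mathbf{event}\ \mathtt{read}; f\,x)\}$.
We have:


$$
\begin{aligned}
\mathbf{Traces}(P_{\mathit{file}}) &= \{\mathtt{read}^{n} \mid n \ge 0\} \cup \{\mathtt{read}^{2n}\mathtt{close} \mid n \ge 0\} \cup \{\mathtt{read}^{\omega}\} \\
\mathbf{FinTraces}(P_{\mathit{file}}) &= \{\mathtt{read}^{n} \mid n \ge 0\} \cup \{\mathtt{read}^{2n}\mathtt{close} \mid n \ge 0\} \\
\mathbf{InfTraces}(P_{\mathit{file}}) &= \{\mathtt{read}^{\omega}\} \\
\mathbf{FullTraces}(P_{\mathit{file}}) &= \{\mathtt{read}^{2n}\mathtt{close} \mid n \ge 0\} \cup \{\mathtt{read}^{\omega}\}.
\end{aligned}
$$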
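
The finite traces listed in Example 4 can be enumerated up to a step bound by running the transition rules of Fig. 3 on a small term encoding. This is a minimal sketch: the tuple encoding of terms, the step bound, and the names `step`, `finite_traces` and `allowed` are assumptions made for illustration, and the label None plays the role of $\epsilon$.

```python
UNIT = ("unit",)
CALL_F = ("call_f",)                             # stands for the call f x
# Body of f: (event close; ()) [] (event read; event read; f x)
BODY_F = ("choice",
          ("event", "close", UNIT),
          ("event", "read", ("event", "read", CALL_F)))

def step(term):
    """One-step labeled transitions of a term; the label None stands for epsilon."""
    tag = term[0]
    if tag == "call_f":
        return [(None, BODY_F)]                  # unfold the definition of f
    if tag == "choice":
        return [(None, term[1]), (None, term[2])]
    if tag == "event":
        return [(term[1], term[2])]
    return []                                    # () has no transitions

def finite_traces(term, max_steps):
    """Event sequences of all transition sequences of length at most max_steps."""
    traces = {()}
    if max_steps > 0:
        for label, nxt in step(term):
            prefix = () if label is None else (label,)
            traces |= {prefix + rest for rest in finite_traces(nxt, max_steps - 1)}
    return traces

def allowed(trace):
    """Membership in {read^n | n >= 0} union {read^(2n) close | n >= 0}."""
    if trace and trace[-1] == "close":
        reads = trace[:-1]
        return len(reads) % 2 == 0 and all(e == "read" for e in reads)
    return all(e == "read" for e in trace)

traces = finite_traces(CALL_F, 12)
print(sorted(traces, key=len)[:5])
```

Every trace found within the bound has one of the two shapes given for FinTraces in Example 4; a close after an odd number of reads never appears, because the second read cannot be skipped.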


5 May/Must-Reachability Verification 


Here we consider the following problems: 


— May-reachability: “Given a program P and an event a, may P raise a?” 
— Must-reachability: “Given a program P and an event a, must P raise a?” 


Since we are interested in a particular event $a$, we restrict here the event set
$\mathbf{Ev}$ to a singleton set of the form $\{a\}$. Then, the may-reachability is
formalized as $a \in \mathbf{Traces}(P)$, whereas the must-reachability is
formalized as “does every trace in $\mathbf{FullTraces}(P)$ contain $a$?” We encode
both problems into the validity of $\mathrm{HFL}_{\mathbf{Z}}$ formulas (without any
modal operators $\langle a\rangle$ or $[a]$), or the $\mathrm{HFL}_{\mathbf{Z}}$ model
checking of those formulas against a trivial model (which consists of a single state
without any transitions). Since our reductions are sound and complete, the
characterizations of their negations (non-reachability and may-non-reachability)
can also be obtained immediately. Although these are the simplest classes of
properties among those discussed in Sects. 5, 6 and 7, they are already large enough
to accommodate many program properties discussed in the literature, including
lack of assertion failures/uncaught exceptions [22] (which can be characterized as
non-reachability; recall the encoding of assertions in Sect. 4), termination [27,29]
(characterized as must-reachability), and non-termination [26] (characterized as
may-non-reachability).


5.1 May-Reachability 


As in the examples in Sect. 3, we translate a program to a formula that says
“the program may raise an event $a$” in a compositional manner. For example,
$\mathbf{event}\ a; t$ can be translated to true (since the event will surely be
raised immediately), and $t_1 \,\Box\, t_2$ can be translated to
$t_1^{\dagger} \lor t_2^{\dagger}$ where $t_i^{\dagger}$ is the result of the
translation of $t_i$ (since only one of $t_1$ and $t_2$ needs to raise an event).


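Definition 3. Let $P = (D, t)$ be a program. $\Phi_{P,\mathit{may}}$ is the HES
$(D^{\dagger_{\mathit{may}}}, t^{\dagger_{\mathit{may}}})$, where
$D^{\dagger_{\mathit{may}}}$ and $t^{\dagger_{\mathit{may}}}$ are defined by:


$$
\begin{aligned}
\{f_1\,\tilde{x}_1 = t_1, \ldots, f_n\,\tilde{x}_n = t_n\}^{\dagger_{\mathit{may}}} &= (f_1\,\tilde{x}_1 =_{\mu} t_1^{\dagger_{\mathit{may}}}; \cdots; f_n\,\tilde{x}_n =_{\mu} t_n^{\dagger_{\mathit{may}}}) \\
()^{\dagger_{\mathit{may}}} = \mathtt{false} \qquad x^{\dagger_{\mathit{may}}} = x \qquad n^{\dagger_{\mathit{may}}} &= n \qquad (t_1 \;\mathit{op}\; t_2)^{\dagger_{\mathit{may}}} = t_1^{\dagger_{\mathit{may}}} \;\mathit{op}\; t_2^{\dagger_{\mathit{may}}} \\
(\mathbf{if}\ p(t'_1,\ldots,t'_k)\ \mathbf{then}\ t_1\ \mathbf{else}\ t_2)^{\dagger_{\mathit{may}}} &= (p(t_1'^{\dagger_{\mathit{may}}},\ldots,t_k'^{\dagger_{\mathit{may}}}) \land t_1^{\dagger_{\mathit{may}}}) \lor (\lnot p(t_1'^{\dagger_{\mathit{may}}},\ldots,t_k'^{\dagger_{\mathit{may}}}) \land t_2^{\dagger_{\mathit{may}}}) \\
(\mathbf{event}\ a; t)^{\dagger_{\mathit{may}}} = \mathtt{true} \qquad (t_1\,t_2)^{\dagger_{\mathit{may}}} &= t_1^{\dagger_{\mathit{may}}}\,t_2^{\dagger_{\mathit{may}}} \qquad (t_1 \,\Box\, t_2)^{\dagger_{\mathit{may}}} = t_1^{\dagger_{\mathit{may}}} \lor t_2^{\dagger_{\mathit{may}}}.
\end{aligned}
$$

Note that, in the definition of $D^{\dagger_{\mathit{may}}}$, the order of function
definitions in $D$ does not matter (i.e., the resulting HES is unique up to the
semantic equality), since all the fixpoint variables are bound by $\mu$.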


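Example 5. Consider the program:
$P_{\mathit{loop}} = (\{\mathit{loop}\;x = \mathit{loop}\;x\}, \mathit{loop}(\mathbf{event}\ a; ()))$.


It is translated to the HES
$\Phi_{\mathit{loop}} = (\mathit{loop}\;x =_{\mu} \mathit{loop}\;x,\ \mathit{loop}(\mathtt{true}))$.
Since $\mathit{loop} \equiv \mu\,\mathit{loop}.\lambda x.\mathit{loop}\;x$ is
equivalent to $\lambda x.\mathtt{false}$, $\Phi_{\mathit{loop}}$ is equivalent to
false. In fact, $P_{\mathit{loop}}$ never raises an event $a$ (recall that our
language is call-by-name).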


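Example 6. Consider the program
$P_{\mathit{sum}} = (D_{\mathit{sum}}, \mathit{main})$ where $D_{\mathit{sum}}$ is:


$$
\begin{aligned}
\mathit{main} &= \mathit{sum}\;n\;(\lambda r.\mathbf{assert}(r \ge n)) \\
\mathit{sum}\;x\;k &= \mathbf{if}\ x = 0\ \mathbf{then}\ k\,0\ \mathbf{else}\ \mathit{sum}\;(x-1)\;(\lambda r.k(x+r))
\end{aligned}
$$


Here, $n$ is some integer constant, and assert(b) is the macro introduced in Sect. 4.
We have used $\lambda$-abstractions for the sake of readability. The function sum is
a CPS version of a function that computes the summation of integers from 1 to
$x$. The main function computes the sum $r = 1 + \cdots + n$, and asserts
$r \ge n$. It is translated to the HES
$\Phi_{P_{\mathit{sum}},\mathit{may}} = (\mathcal{E}_{\mathit{sum}}, \mathit{main})$
where $\mathcal{E}_{\mathit{sum}}$ is:


$$
\begin{aligned}
\mathit{main} &=_{\mu} \mathit{sum}\;n\;(\lambda r.(r \ge n \land \mathtt{false}) \lor (r < n \land \mathtt{true})); \\
\mathit{sum}\;x\;k &=_{\mu} (x = 0 \land k\,0) \lor (x \neq 0 \land \mathit{sum}\;(x-1)\;(\lambda r.k(x+r))).
\end{aligned}
$$


Here, $n$ is treated as a constant. Since the shape of the formula does not depend
on the value of $n$, the property “an assertion failure may occur for some $n$” can
be expressed by $\exists n.\Phi_{P_{\mathit{sum}},\mathit{may}}$.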
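
Because the trivial model has a single state, every proposition in this HES is just a Boolean, and the equations of Example 6 can be evaluated by bounded unfolding. The sketch below is an illustration under stated assumptions: `fuel` (a name introduced here) bounds the number of unfoldings of the least fixpoint, so a False result is conclusive only if the bound is larger than the recursion depth actually needed, and the assertion is taken to be $r \ge n$.

```python
def sum_may(x, k, fuel):
    """Approximant of: sum x k =mu (x = 0 and k 0) or (x != 0 and sum (x-1) (lambda r. k(x+r)))."""
    if fuel == 0:
        return False                             # bottom approximant: false
    if x == 0:
        return k(0)
    return sum_may(x - 1, lambda r: k(x + r), fuel - 1)

def may_fail(n, fuel=200):
    """Approximant of: main =mu sum n (lambda r. (r >= n and false) or (r < n and true))."""
    return sum_may(n, lambda r: (r >= n and False) or (r < n and True), fuel)

# Hypothetical variation for contrast: a stricter assertion r > n, whose
# translated continuation is true exactly when r <= n.
def may_fail_strict(n, fuel=200):
    return sum_may(n, lambda r: r <= n, fuel)

print([may_fail(n) for n in range(-2, 5)])
print([may_fail_strict(n) for n in range(0, 4)])
```

For non-negative n the continuation receives $1 + \cdots + n$, which is never below n, and for negative n the recursion never reaches the base case, so the formula evaluates to false in both situations. The stricter variation is included only to show a case where the translated formula becomes true (for instance n = 1, where the sum equals n).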


The following theorem states that $\Phi_{P,\mathit{may}}$ is a complete
characterization of the may-reachability of $P$.


Theorem 1. Let $P$ be a program. Then, $a \in \mathbf{Traces}(P)$ if and only if
$L_0 \models \Phi_{P,\mathit{may}}$ for
$L_0 = (\{s_{\star}\}, \emptyset, \emptyset, s_{\star})$.


A proof of the theorem above is found in [23]. We only provide an outline. We
first show the theorem for recursion-free programs and then lift it to arbitrary
programs by using the continuity of functions represented in the fixpoint-free
fragment of $\mathrm{HFL}_{\mathbf{Z}}$ formulas. To show the theorem for
recursion-free programs, we define the reduction relation
$t \longrightarrow_D t'$ by:


$$\frac{f\,\tilde{x} = u \in D \qquad |\tilde{x}| = |\tilde{t}|}{E[f\,\tilde{t}] \longrightarrow_D E[[\tilde{t}/\tilde{x}]u]}$$

$$\frac{(\llbracket t'_1\rrbracket,\ldots,\llbracket t'_k\rrbracket) \in \llbracket p\rrbracket}{E[\mathbf{if}\ p(t'_1,\ldots,t'_k)\ \mathbf{then}\ t_1\ \mathbf{else}\ t_2] \longrightarrow_D E[t_1]} \qquad \frac{(\llbracket t'_1\rrbracket,\ldots,\llbracket t'_k\rrbracket) \notin \llbracket p\rrbracket}{E[\mathbf{if}\ p(t'_1,\ldots,t'_k)\ \mathbf{then}\ t_1\ \mathbf{else}\ t_2] \longrightarrow_D E[t_2]}$$

Here, $E$ ranges over the set of evaluation contexts given by
$E ::= [\,] \mid E \,\Box\, t \mid t \,\Box\, E \mid \mathbf{event}\ a; E$. The
reduction relation differs from the labeled transition relation given in Sect. 4,
in that $\Box$ and $\mathbf{event}\ a; \cdots$ are not eliminated. By the
definition of the translation, the theorem holds for programs in normal form (with
respect to the reduction relation), and the semantics of translated HFL formulas
is preserved by the reduction relation; thus the theorem holds for recursion-free
programs, as they are strongly normalizing.


5.2 Must-Reachability 


The characterization of must-reachability can be obtained by an easy modifica- 
tion of the characterization of may-reachability: we just need to replace branches 
with logical conjunction. 


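Definition 4. Let $P = (D, t)$ be a program. $\Phi_{P,\mathit{must}}$ is the HES $(D^{\dagger\mathit{must}}, t^{\dagger\mathit{must}})$,
where $D^{\dagger\mathit{must}}$ and $t^{\dagger\mathit{must}}$ are defined by:

\[
\begin{aligned}
\{f_1\,\tilde{x}_1 = t_1, \ldots, f_n\,\tilde{x}_n = t_n\}^{\dagger\mathit{must}} &= \bigl(f_1\,\tilde{x}_1 =_\mu t_1^{\dagger\mathit{must}};\ \cdots;\ f_n\,\tilde{x}_n =_\mu t_n^{\dagger\mathit{must}}\bigr)\\
(\,)^{\dagger\mathit{must}} &= \mathtt{false}\\
x^{\dagger\mathit{must}} &= x\\
n^{\dagger\mathit{must}} &= n\\
(t_1\ \mathrm{op}\ t_2)^{\dagger\mathit{must}} &= t_1^{\dagger\mathit{must}}\ \mathrm{op}\ t_2^{\dagger\mathit{must}}\\
(\mathtt{if}\ p(t'_1,\ldots,t'_k)\ \mathtt{then}\ t_1\ \mathtt{else}\ t_2)^{\dagger\mathit{must}} &= \bigl(p(t_1'^{\dagger\mathit{must}},\ldots,t_k'^{\dagger\mathit{must}}) \Rightarrow t_1^{\dagger\mathit{must}}\bigr) \wedge \bigl(\neg p(t_1'^{\dagger\mathit{must}},\ldots,t_k'^{\dagger\mathit{must}}) \Rightarrow t_2^{\dagger\mathit{must}}\bigr)\\
(\mathtt{event}\ a;\, t)^{\dagger\mathit{must}} &= \mathtt{true}\\
(t_1\, t_2)^{\dagger\mathit{must}} &= t_1^{\dagger\mathit{must}}\, t_2^{\dagger\mathit{must}}\\
(t_1 \,\Box\, t_2)^{\dagger\mathit{must}} &= t_1^{\dagger\mathit{must}} \wedge t_2^{\dagger\mathit{must}}
\end{aligned}
\]

Here, $p(\varphi_1, \ldots, \varphi_k) \Rightarrow \psi$ is a shorthand for $\neg p(\varphi_1, \ldots, \varphi_k) \vee \psi$.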


Example 7. Consider $P_{\mathit{loop}} = (D, \mathtt{loop}\ m\ n)$ where $D$ is:


loop x y = if x ≤ 0 ∨ y ≤ 0 then (event end; ())
           else (loop (x − 1) (y ∗ y)) □ (loop x (y − 1))


Here, the event end is used to signal the termination of the program. The function
loop non-deterministically updates the values of x and y until either x or y
becomes non-positive. The must-termination of the program is characterized by
$\Phi_{P_{\mathit{loop}},\mathit{must}} = (\mathcal{E}, \mathtt{loop}\ m\ n)$ where $\mathcal{E}$ is:

\[
\begin{aligned}
\mathtt{loop}\ x\ y =_\mu\ & (x \leq 0 \vee y \leq 0 \Rightarrow \mathtt{true})\\
& \wedge\ (\neg(x \leq 0 \vee y \leq 0) \Rightarrow (\mathtt{loop}\ (x - 1)\ (y * y)) \wedge (\mathtt{loop}\ x\ (y - 1))).
\end{aligned}
\]
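The formula of Example 7 can likewise be unfolded for concrete m and n. The following is a minimal sketch, assuming a direct recursive reading of the least fixpoint; the function name `loop_must` is hypothetical. The unfolding is finite because each recursive call decreases the pair (x, y) lexicographically, and memoization merely avoids recomputing shared subproblems.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def loop_must(x, y):
    # loop x y =_mu (x <= 0 \/ y <= 0 => true)
    #            /\ (not (x <= 0 \/ y <= 0) => loop (x-1) (y*y) /\ loop x (y-1))
    if x <= 0 or y <= 0:
        return True  # (event end; ()) is translated to true
    # Non-deterministic choice becomes conjunction: both branches must reach "end".
    return loop_must(x - 1, y * y) and loop_must(x, y - 1)


if __name__ == "__main__":
    print(loop_must(2, 3))
```

Compared with the may-reachability translation, the only structural change is that the choice between the two recursive calls is combined with `and` rather than `or`.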


We write $\mathbf{Must}_a(P)$ if every $\pi \in \mathbf{FullTraces}(P)$ contains $a$. The following
theorem, which can be proved in a manner similar to Theorem 1, guarantees that
$\Phi_{P,\mathit{must}}$ is indeed a sound and complete characterization of the must-reachability.


Theorem 2. Let $P$ be a program. Then, $\mathbf{Must}_a(P)$ if and only if $\mathtt{L}_0 \models \Phi_{P,\mathit{must}}$
for $\mathtt{L}_0 = (\{s_\star\}, \emptyset, \emptyset, s_\star)$.
6 Trace Properties


Here we consider the verification problem: “Given a (non-$\omega$) regular language
$L$ and a program $P$, does every finite event sequence of $P$ belong to $L$ (i.e.,
$\mathbf{FinTraces}(P) \stackrel{?}{\subseteq} L$)?” and reduce it to an $\mathrm{HFL}_{\mathbb{Z}}$ model checking problem. The
verification of file-accessing programs considered in Sect. 3 may be considered an
instance of the problem.

Here we assume that the language $L$ is closed under the prefix operation;
this does not lose generality because $\mathbf{FinTraces}(P)$ is also closed under the
prefix operation. We write $A_L = (Q, \Sigma, \delta, q_0, F)$ for the minimal, deterministic
automaton accepting $L$ with no dead states (hence the transition function $\delta$ may be partial).
Since $L$ is prefix-closed and the automaton is minimal, $w \in L$ if and only if
$\hat{\delta}(q_0, w)$ is defined (where $\hat{\delta}$ is defined by: $\hat{\delta}(q, \epsilon) = q$ and $\hat{\delta}(q, aw) = \hat{\delta}(\delta(q, a), w)$).
We use the corresponding LTS $\mathtt{L}_L = (Q, \Sigma, \{(q, a, q') \mid \delta(q, a) = q'\}, q_0)$ as the
model of the reduced $\mathrm{HFL}_{\mathbb{Z}}$ model checking problem.
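To make the role of the partial transition function concrete, here is a small sketch of the membership test "δ̂(q0, w) is defined". The automaton is an assumption chosen for illustration: it encodes the prefix closure of read*·close·end, with hypothetical state names `q0`, `q1`, `q2`; the helper names `delta_hat` and `in_language` are likewise hypothetical.

```python
# Partial transition function of a deterministic automaton without dead states.
# Assumed language: the prefix closure of read* . close . end
DELTA = {
    ("q0", "read"): "q0",
    ("q0", "close"): "q1",
    ("q1", "end"): "q2",
}


def delta_hat(delta, q, word):
    """Extended transition function; returns None when it is undefined."""
    for a in word:
        if (q, a) not in delta:
            return None
        q = delta[(q, a)]
    return q


def in_language(delta, q0, word):
    # For a prefix-closed language, membership is definedness of delta_hat.
    return delta_hat(delta, q0, word) is not None


if __name__ == "__main__":
    print(in_language(DELTA, "q0", ["read", "read", "close", "end"]))
    print(in_language(DELTA, "q0", ["close", "read"]))
```

Because the language is prefix-closed, no set of final states is consulted: a word is rejected exactly when the run gets stuck.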

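Given the LTS $\mathtt{L}_L$ above, whether an event sequence $a_1 \cdots a_k$ belongs to $L$
can be expressed as $\mathtt{L}_L \stackrel{?}{\models} \langle a_1\rangle \cdots \langle a_k\rangle \mathtt{true}$. Whether all the event sequences
in $\{a_{j,1} \cdots a_{j,k_j} \mid j \in \{1, \ldots, n\}\}$ belong to $L$ can be expressed as $\mathtt{L}_L \stackrel{?}{\models}
\bigwedge_{j \in \{1,\ldots,n\}} \langle a_{j,1}\rangle \cdots \langle a_{j,k_j}\rangle \mathtt{true}$. We can lift these translations for event sequences
to the translation from a program (which can be considered a description of a
set of event sequences) to an $\mathrm{HFL}_{\mathbb{Z}}$ formula, as follows.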

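Definition 5. Let $P = (D, t)$ be a program. $\Phi_{P,\mathit{path}}$ is the HES $(D^{\dagger\mathit{path}}, t^{\dagger\mathit{path}})$,
where $D^{\dagger\mathit{path}}$ and $t^{\dagger\mathit{path}}$ are defined by:

\[
\begin{aligned}
\{f_1\,\tilde{x}_1 = t_1, \ldots, f_n\,\tilde{x}_n = t_n\}^{\dagger\mathit{path}} &= \bigl(f_1\,\tilde{x}_1 =_\nu t_1^{\dagger\mathit{path}};\ \cdots;\ f_n\,\tilde{x}_n =_\nu t_n^{\dagger\mathit{path}}\bigr)\\
(\,)^{\dagger\mathit{path}} &= \mathtt{true}\\
x^{\dagger\mathit{path}} &= x\\
n^{\dagger\mathit{path}} &= n\\
(t_1\ \mathrm{op}\ t_2)^{\dagger\mathit{path}} &= t_1^{\dagger\mathit{path}}\ \mathrm{op}\ t_2^{\dagger\mathit{path}}\\
(\mathtt{if}\ p(t'_1,\ldots,t'_k)\ \mathtt{then}\ t_1\ \mathtt{else}\ t_2)^{\dagger\mathit{path}} &= \bigl(p(t_1'^{\dagger\mathit{path}},\ldots,t_k'^{\dagger\mathit{path}}) \Rightarrow t_1^{\dagger\mathit{path}}\bigr) \wedge \bigl(\neg p(t_1'^{\dagger\mathit{path}},\ldots,t_k'^{\dagger\mathit{path}}) \Rightarrow t_2^{\dagger\mathit{path}}\bigr)\\
(\mathtt{event}\ a;\, t)^{\dagger\mathit{path}} &= \langle a\rangle t^{\dagger\mathit{path}}\\
(t_1\, t_2)^{\dagger\mathit{path}} &= t_1^{\dagger\mathit{path}}\, t_2^{\dagger\mathit{path}}\\
(t_1 \,\Box\, t_2)^{\dagger\mathit{path}} &= t_1^{\dagger\mathit{path}} \wedge t_2^{\dagger\mathit{path}}
\end{aligned}
\]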


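Example 8. The last program discussed in Sect. 3 is modeled as $P_2 =
(D_2, f\ m\ g)$, where $m$ is an integer constant and $D_2$ consists of:


f y k = if y = 0 then (event close; k ()) else (event read; f (y − 1) k)
g r = event end; ()


Here, we have modeled accesses to the file, and termination as events. Then,
$\Phi_{P_2,\mathit{path}} = (\mathcal{E}_{P_2,\mathit{path}}, f\ m\ g)$ where $\mathcal{E}_{P_2,\mathit{path}}$ is:⁶

\[
\begin{aligned}
f\ n\ k &=_\nu (n = 0 \Rightarrow \langle\mathtt{close}\rangle(k\ \mathtt{true})) \wedge (n \neq 0 \Rightarrow \langle\mathtt{read}\rangle(f\ (n - 1)\ k))\\
g\ r &=_\nu \langle\mathtt{end}\rangle\mathtt{true}.
\end{aligned}
\]

Let $L$ be the prefix-closure of $\mathtt{read}^* \cdot \mathtt{close} \cdot \mathtt{end}$. Then $\mathtt{L}_L$ is $\mathtt{L}'_{\mathit{file}}$ in Sect. 3, and
$\mathbf{FinTraces}(P_2) \subseteq L$ can be verified by checking $\mathtt{L}_L \models \Phi_{P_2,\mathit{path}}$.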
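For a concrete non-negative m, the formula of Example 8 can be evaluated on the LTS by reading each formula as a predicate on states. This is only a sketch under assumptions: the state names and transition table below are a hypothetical encoding of the LTS for the prefix closure of read*·close·end, and the direct recursion stands in for the greatest fixpoint, which is harmless here because the unfolding on n terminates.

```python
# Assumed LTS for the prefix closure of read* . close . end
DELTA = {
    ("q0", "read"): "q0",
    ("q0", "close"): "q1",
    ("q1", "end"): "q2",
}


def TRUE(q):
    return True


def diamond(a, phi):
    # <a>phi holds at q iff q has an a-successor satisfying phi
    return lambda q: (q, a) in DELTA and phi(DELTA[(q, a)])


def g(r):
    # g r =_nu <end> true
    return diamond("end", TRUE)


def f(n, k):
    # f n k =_nu (n = 0 => <close>(k true)) /\ (n != 0 => <read>(f (n - 1) k))
    if n == 0:
        return diamond("close", k(TRUE))
    return diamond("read", f(n - 1, k))


if __name__ == "__main__":
    print(f(3, g)("q0"))
```

Passing a continuation that performs a read after the file is closed, for example `lambda r: diamond("read", TRUE)`, makes the formula false at the initial state, which corresponds to a trace outside L.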


6 Unlike in Sect. 3, the variables are bound by v since we are not concerned with the 
termination property here. 
Theorem 3. Let $P$ be a program and $L$ be a regular, prefix-closed language.
Then, $\mathbf{FinTraces}(P) \subseteq L$ if and only if $\mathtt{L}_L \models \Phi_{P,\mathit{path}}$.


As in Sect. 5, we first prove the theorem for programs in normal form, and
then lift it to recursion-free programs by using the preservation of the semantics
of $\mathrm{HFL}_{\mathbb{Z}}$ formulas by reductions, and further to arbitrary programs by using the
(co-)continuity of the functions represented by fixpoint-free $\mathrm{HFL}_{\mathbb{Z}}$ formulas. See
[23] for a concrete proof.


7 Linear-Time Temporal Properties 


This section considers the following problem: “Given a program $P$ and an
$\omega$-regular word language $L$, does $\mathbf{InfTraces}(P) \cap L = \emptyset$ hold?”. From the viewpoint
of program verification, $L$ represents the set of “bad” behaviors. This can be
considered an extension of the problems considered in the previous sections.

The reduction to HFL model checking is more involved than those in the 
previous sections. To see the difficulty, consider the program Po: 


({f = if c then (event a; f) else (event b; f)}, f), 


where $c$ is some boolean expression. Let $L$ be the complement of $(a^*b)^\omega$, i.e.,
the set of infinite sequences that contain only finitely many $b$'s. Following Sect. 6
(and noting that $\mathbf{InfTraces}(P) \cap L = \emptyset$ is equivalent to $\mathbf{InfTraces}(P) \subseteq (a^*b)^\omega$
in this case), one may be tempted to prepare an LTS like the one in Fig. 4 (which
corresponds to the transition function of a (parity) word automaton accepting
$(a^*b)^\omega$), and translate the program to an HES $\Phi_{P_0}$ of the form:

\[
(f =_\alpha (c \Rightarrow \langle a\rangle f) \wedge (\neg c \Rightarrow \langle b\rangle f),\ f),
\]

where $\alpha$ is $\mu$ or $\nu$. However, such a translation would not work. If $c = \mathtt{true}$,
then $\mathbf{InfTraces}(P_0) = a^\omega$, hence $\mathbf{InfTraces}(P_0) \cap L \neq \emptyset$; thus, $\alpha$ should be $\mu$
for $\Phi_{P_0}$ to be unsatisfied. If $c = \mathtt{false}$, however, $\mathbf{InfTraces}(P_0) = b^\omega$, hence
$\mathbf{InfTraces}(P_0) \cap L = \emptyset$; thus, $\alpha$ must be $\nu$ for $\Phi_{P_0}$ to be satisfied.


Fig. 4. LTS for $(a^*b)^\omega$


The example above suggests that we actually need to distinguish between the 
two occurrences of f in the body of f’s definition. Note that in the then- and 
else-clauses respectively, f is called after different events a and b. This difference 
is important, since we are interested in whether $b$ occurs infinitely often. We
thus duplicate $f$, and replace the program with the following program $P_{\mathit{dup}}$:


({f_b = if c then (event a; f_a) else (event b; f_b),
  f_a = if c then (event a; f_a) else (event b; f_b)}, f_b).


For checking $\mathbf{InfTraces}(P_0) \cap L = \emptyset$, it is now sufficient to check that $f_b$ is
recursively called infinitely often. We can thus obtain the following HES:

\[
\bigl((f_b =_\nu (c \Rightarrow \langle a\rangle f_a) \wedge (\neg c \Rightarrow \langle b\rangle f_b);\ \ f_a =_\mu (c \Rightarrow \langle a\rangle f_a) \wedge (\neg c \Rightarrow \langle b\rangle f_b)),\ f_b\bigr).
\]

Note that $f_b$ and $f_a$ are bound by $\nu$ and $\mu$ respectively, reflecting the fact that
$b$ should occur infinitely often, but $a$ need not. If $c = \mathtt{true}$, the formula is
equivalent to $\nu f_b.\langle a\rangle \mu f_a.\langle a\rangle f_a$, which is false. If $c = \mathtt{false}$, then the formula is
equivalent to $\nu f_b.\langle b\rangle f_b$, which is satisfied by the LTS in Fig. 4.
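The two cases can be checked mechanically by Kleene iteration over sets of states. The sketch below is an illustration only: Fig. 4 is not reproduced in this text, so the two-state LTS used here (every a-transition leads to `qa`, every b-transition leads to `qb`) is an assumption, and the helper names `lfp`, `gfp`, `body` and `solve_fb` are hypothetical.

```python
STATES = frozenset({"qa", "qb"})
# Assumed transitions of the LTS in Fig. 4.
DELTA = {("qa", "a"): "qa", ("qb", "a"): "qa",
         ("qa", "b"): "qb", ("qb", "b"): "qb"}


def diamond(a, s):
    # States having an a-successor inside the set s.
    return frozenset(q for q in STATES if DELTA.get((q, a)) in s)


def implies(c, s):
    # c is a closed boolean, so (c => phi) is phi if c holds and "true" otherwise.
    return s if c else STATES


def lfp(fn):
    cur = frozenset()
    while fn(cur) != cur:
        cur = fn(cur)
    return cur


def gfp(fn):
    cur = STATES
    while fn(cur) != cur:
        cur = fn(cur)
    return cur


def body(c, fa, fb):
    # (c => <a> f_a) /\ (not c => <b> f_b)
    return implies(c, diamond("a", fa)) & implies(not c, diamond("b", fb))


def solve_fb(c):
    # f_b =_nu body, with f_a =_mu body nested inside (f_b is the outer variable).
    return gfp(lambda fb: body(c, lfp(lambda fa: body(c, fa, fb)), fb))


if __name__ == "__main__":
    print(sorted(solve_fb(True)), sorted(solve_fb(False)))
```

Under these assumptions the set of satisfying states is empty for c = true and contains every state for c = false, in line with the two equivalences stated above.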

The general translation is more involved due to the presence of higher-order
functions, but, as in the example above, the overall translation consists of two
steps. We first replicate functions according to what events may occur between
two recursive calls, and reduce the problem $\mathbf{InfTraces}(P) \cap L \stackrel{?}{=} \emptyset$ to a problem
of analyzing which functions are recursively called infinitely often, which we call
a call-sequence analysis. We can then reduce the call-sequence analysis to HFL
model checking in a rather straightforward manner (though the proof of the
correctness is non-trivial). The resulting HFL formula actually does not contain
modal operators.⁷ So, as in Sect. 5, the resulting problem is the validity checking
of HFL formulas without modal operators.

In the rest of this section, we first introduce the call-sequence analysis
problem and its reduction to HFL model checking in Sect. 7.1. We then show how to
reduce the temporal verification problem $\mathbf{InfTraces}(P) \cap L \stackrel{?}{=} \emptyset$ to an instance
of the call-sequence analysis problem in Sect. 7.2.


7.1 Call-Sequence Analysis 


We define the call-sequence analysis and reduce it to an HFL model-checking
problem. As mentioned above, in the call-sequence analysis, we are interested in
analyzing which functions are recursively called infinitely often. Here, we say that
$g$ is recursively called from $f$, if $f\,\tilde{s} \longrightarrow_D [\tilde{s}/\tilde{x}]t_f \longrightarrow_D^* g\,\tilde{t}$, where $f\,\tilde{x} = t_f \in D$
and $g$ “originates from” $t_f$ (a more formal definition will be given in Definition 6
below). For example, consider the following program $P_{\mathit{app}}$, which is a twisted
version of $P_{\mathit{dup}}$ above.


({app h x = h x,
  f_b x = if x > 0 then (event a; app f_a (x − 1)) else (event b; app f_b 5),
  f_a x = if x > 0 then (event a; app f_a (x − 1)) else (event b; app f_b 5)}, f_b 5).


7 In the example above, we can actually remove $\langle a\rangle$ and $\langle b\rangle$, as information about
events has been taken into account when $f$ was duplicated.




Then $f_a$ is “recursively called” from $f_b$ in $f_b\,5 \longrightarrow_D^* \mathtt{app}\ f_a\ 4 \longrightarrow_D^* f_a\,4$ (and so is
app). We are interested in infinite chains of recursive calls $f_0 f_1 f_2 \cdots$, and which
functions may occur infinitely often in each chain. For instance, the program
above has the unique infinite chain $(f_b f_a^5)^\omega$, in which both $f_a$ and $f_b$ occur
infinitely often. (Besides the infinite chain, the program has finite chains like
$f_b\,\mathtt{app}$; note that the chain cannot be extended further, as the body of app does
not have any occurrence of recursive functions: app, $f_a$ and $f_b$.)

We define the notion of “recursive calls” and call-sequences formally below. 


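Definition 6 (Recursive call relation, call sequences). Let $P = (D, f_1\,\tilde{s})$
be a program, with $D = \{f_i\,\tilde{x}_i = u_i\}_{1 \leq i \leq n}$. We define $D^\sharp := D \cup \{f_i^\sharp\,\tilde{x}_i = u_i\}_{1 \leq i \leq n}$
where $f_1^\sharp, \ldots, f_n^\sharp$ are fresh symbols. (Thus, $D^\sharp$ has two copies of each function
symbol, one of which is marked by $\sharp$.) For the terms $\tilde{t}_i$ and $\tilde{t}_j$ that do not contain
marked symbols, we write $f_i\,\tilde{t}_i \rightsquigarrow_D f_j\,\tilde{t}_j$ if (i) $[\tilde{t}_i/\tilde{x}_i][f_1^\sharp/f_1, \ldots, f_n^\sharp/f_n]u_i \longrightarrow_{D^\sharp}^* f_j^\sharp\,\tilde{t}'_j$,
and (ii) $\tilde{t}_j$ is obtained by erasing all the marks in $\tilde{t}'_j$. We write $\mathbf{Callseq}(P)$ for
the set of (possibly infinite) sequences of function symbols:

\[
\{f_1\, g_1\, g_2 \cdots \mid f_1\,\tilde{s} \rightsquigarrow_D g_1\,\tilde{t}_1 \rightsquigarrow_D g_2\,\tilde{t}_2 \rightsquigarrow_D \cdots\}.
\]

We write $\mathbf{InfCallseq}(P)$ for the subset of $\mathbf{Callseq}(P)$ consisting of infinite
sequences, i.e., $\mathbf{Callseq}(P) \cap \{f_1, \ldots, f_n\}^\omega$.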


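For example, for $P_{\mathit{app}}$ above, $\mathbf{Callseq}(P_{\mathit{app}})$ is the prefix closure of $\{(f_b f_a^5)^\omega\} \cup
\{s \cdot \mathtt{app} \mid s \text{ is a non-empty finite prefix of } (f_b f_a^5)^\omega\}$, and $\mathbf{InfCallseq}(P_{\mathit{app}})$ is the
singleton set $\{(f_b f_a^5)^\omega\}$.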


Definition 7 (Call-sequence analysis). A priority assignment for a program
$P$ is a function $\Omega : \mathbf{funs}(P) \to \mathbb{N}$ from the set of function symbols of $P$
to the set $\mathbb{N}$ of natural numbers. We write $\models_{\mathit{csa}} (P, \Omega)$ if every infinite
call-sequence $g_0 g_1 g_2 \cdots \in \mathbf{InfCallseq}(P)$ satisfies the parity condition w.r.t. $\Omega$, i.e.,
the largest number occurring infinitely often in $\Omega(g_0)\Omega(g_1)\Omega(g_2)\cdots$ is even.
Call-sequence analysis is the problem of, given a program $P$ with a priority
assignment $\Omega$, deciding whether $\models_{\mathit{csa}} (P, \Omega)$ holds.


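For example, for $P_{\mathit{app}}$ and the priority assignment $\Omega_{\mathit{app}} = \{\mathtt{app} \mapsto 3, f_a \mapsto
1, f_b \mapsto 2\}$, $\models_{\mathit{csa}} (P_{\mathit{app}}, \Omega_{\mathit{app}})$ holds.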
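The claim for P_app can be replayed by hand-simulating its single infinite call chain and checking the parity condition on the repeating part. The sketch below is not a general call-sequence analysis: `next_call` hard-codes the recursive-call steps of P_app among f_a and f_b (the intermediate call to app is treated as transparent, and the finite chains ending in app are ignored since they do not belong to InfCallseq), and all function names are hypothetical.

```python
def next_call(f, x):
    # Both f_a and f_b have the same body in P_app:
    #   if x > 0 then (event a; app f_a (x - 1)) else (event b; app f_b 5)
    if x > 0:
        return ("fa", x - 1)
    return ("fb", 5)


def call_lasso(start=("fb", 5)):
    """Follow the chain until a configuration repeats; return (prefix, cycle)."""
    seen, seq = {}, []
    cur = start
    while cur not in seen:
        seen[cur] = len(seq)
        seq.append(cur)
        cur = next_call(*cur)
    i = seen[cur]
    names = [f for f, _ in seq]
    return names[:i], names[i:]


def parity_ok(cycle, omega):
    # On an ultimately periodic sequence, the priorities occurring infinitely
    # often are exactly those on the cycle.
    return max(omega[f] for f in cycle) % 2 == 0


if __name__ == "__main__":
    prefix, cycle = call_lasso()
    print(prefix, cycle, parity_ok(cycle, {"app": 3, "fa": 1, "fb": 2}))
```

The cycle found is f_b followed by five occurrences of f_a, whose largest priority under Ω_app is 2, so the parity condition holds; the priority 3 of app is irrelevant because app does not occur in the infinite chain.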

The call-sequence analysis can naturally be reduced to HFL model checking
against the trivial LTS $\mathtt{L}_0 = (\{s_\star\}, \emptyset, \emptyset, s_\star)$ (or validity checking).


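Definition 8. Let $P = (D, t)$ be a program and $\Omega$ be a priority assignment for
$P$. The HES $\Phi_{(P,\Omega),\mathit{csa}}$ is $(D^{\dagger\mathit{csa}}, t^{\dagger\mathit{csa}})$, where $D^{\dagger\mathit{csa}}$ and $t^{\dagger\mathit{csa}}$ are defined by:

\[
\begin{aligned}
\{f_1\,\tilde{x}_1 = t_1, \ldots, f_n\,\tilde{x}_n = t_n\}^{\dagger\mathit{csa}} &= \bigl(f_1\,\tilde{x}_1 =_{\alpha_1} t_1^{\dagger\mathit{csa}};\ \cdots;\ f_n\,\tilde{x}_n =_{\alpha_n} t_n^{\dagger\mathit{csa}}\bigr)\\
(\,)^{\dagger\mathit{csa}} &= \mathtt{true}\\
x^{\dagger\mathit{csa}} &= x\\
n^{\dagger\mathit{csa}} &= n\\
(t_1\ \mathrm{op}\ t_2)^{\dagger\mathit{csa}} &= t_1^{\dagger\mathit{csa}}\ \mathrm{op}\ t_2^{\dagger\mathit{csa}}\\
(\mathtt{if}\ p(t'_1,\ldots,t'_k)\ \mathtt{then}\ t_1\ \mathtt{else}\ t_2)^{\dagger\mathit{csa}} &= \bigl(p(t_1'^{\dagger\mathit{csa}},\ldots,t_k'^{\dagger\mathit{csa}}) \Rightarrow t_1^{\dagger\mathit{csa}}\bigr) \wedge \bigl(\neg p(t_1'^{\dagger\mathit{csa}},\ldots,t_k'^{\dagger\mathit{csa}}) \Rightarrow t_2^{\dagger\mathit{csa}}\bigr)\\
(\mathtt{event}\ a;\, t)^{\dagger\mathit{csa}} &= t^{\dagger\mathit{csa}}\\
(t_1\, t_2)^{\dagger\mathit{csa}} &= t_1^{\dagger\mathit{csa}}\, t_2^{\dagger\mathit{csa}}\\
(t_1 \,\Box\, t_2)^{\dagger\mathit{csa}} &= t_1^{\dagger\mathit{csa}} \wedge t_2^{\dagger\mathit{csa}}
\end{aligned}
\]

Here, we assume that $\Omega(f_i) \geq \Omega(f_{i+1})$ for each $i \in \{1, \ldots, n - 1\}$, and $\alpha_i = \nu$
if $\Omega(f_i)$ is even and $\mu$ otherwise.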


The following theorem states the soundness and completeness of the reduc- 
tion. See [23] for a proof. 


Theorem 4. Let $P$ be a program and $\Omega$ be a priority assignment for $P$. Then
$\models_{\mathit{csa}} (P, \Omega)$ if and only if $\mathtt{L}_0 \models \Phi_{(P,\Omega),\mathit{csa}}$.


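Example 9. For $P_{\mathit{app}}$ and $\Omega_{\mathit{app}}$ above, $\Phi_{(P_{\mathit{app}},\Omega_{\mathit{app}}),\mathit{csa}} = (\mathcal{E}, f_b\,5)$, where $\mathcal{E}$ is:

\[
\begin{aligned}
\mathtt{app}\ h\ x &=_\mu h\ x;\\
f_b\ x &=_\nu (x > 0 \Rightarrow \mathtt{app}\ f_a\ (x - 1)) \wedge (x \leq 0 \Rightarrow \mathtt{app}\ f_b\ 5);\\
f_a\ x &=_\mu (x > 0 \Rightarrow \mathtt{app}\ f_a\ (x - 1)) \wedge (x \leq 0 \Rightarrow \mathtt{app}\ f_b\ 5).
\end{aligned}
\]

Note that $\mathtt{L}_0 \models \Phi_{(P_{\mathit{app}},\Omega_{\mathit{app}}),\mathit{csa}}$ holds.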


7.2 From Temporal Verification to Call-Sequence Analysis 
This subsection shows a reduction from the temporal verification problem
$\mathbf{InfTraces}(P) \cap L \stackrel{?}{=} \emptyset$ to a call-sequence analysis problem $\stackrel{?}{\models}_{\mathit{csa}} (P', \Omega)$.

For the sake of simplicity, we assume without loss of generality that every
program $P = (D, t)$ in this section is non-terminating and every infinite
reduction sequence produces infinite events, so that $\mathbf{FullTraces}(P) = \mathbf{InfTraces}(P)$
holds. We also assume that the $\omega$-regular language $L$ for the temporal verification
problem is specified by using a non-deterministic, parity word automaton [10].
We recall the definition of non-deterministic, parity word automata below.


Definition 9 (Parity automaton). A non-deterministic parity word automaton
is a quintuple $\mathcal{A} = (Q, \Sigma, \delta, q_I, \Omega)$ where (i) $Q$ is a finite set of states; (ii)
$\Sigma$ is a finite alphabet; (iii) $\delta$, called a transition function, is a total map from
$Q \times \Sigma$ to $2^Q$; (iv) $q_I \in Q$ is the initial state; and (v) $\Omega \in Q \to \mathbb{N}$ is the priority
function. A run of $\mathcal{A}$ on an $\omega$-word $a_0 a_1 \cdots \in \Sigma^\omega$ is an infinite sequence of states
$\rho = \rho(0)\rho(1)\cdots \in Q^\omega$ such that (i) $\rho(0) = q_I$, and (ii) $\rho(i + 1) \in \delta(\rho(i), a_i)$ for
each $i \in \omega$. An $\omega$-word $w \in \Sigma^\omega$ is accepted by $\mathcal{A}$ if there exists a run $\rho$ of $\mathcal{A}$ on
$w$ such that $\max\{\Omega(q) \mid q \in \mathbf{Inf}(\rho)\}$ is even, where $\mathbf{Inf}(\rho)$ is the set of states
that occur infinitely often in $\rho$. We write $\mathcal{L}(\mathcal{A})$ for the set of $\omega$-words accepted
by $\mathcal{A}$.


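For technical convenience, we assume below that $\delta(q, a) \neq \emptyset$ for every $q \in Q$ and
$a \in \Sigma$; this does not lose generality since if $\delta(q, a) = \emptyset$, we can introduce a new
“dead” state $q_{\mathit{dead}}$ (with priority 1) and change $\delta(q, a)$ to $\{q_{\mathit{dead}}\}$. Given a parity
automaton $\mathcal{A}$, we refer to each component of $\mathcal{A}$ by $Q_{\mathcal{A}}$, $\Sigma_{\mathcal{A}}$, $\delta_{\mathcal{A}}$, $q_{I,\mathcal{A}}$ and $\Omega_{\mathcal{A}}$.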


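Example 10. Consider the automaton $\mathcal{A}_{ab} = (\{q_a, q_b\}, \{a, b\}, \delta, q_a, \Omega)$, where $\delta$ is
as given in Fig. 4, $\Omega(q_a) = 0$, and $\Omega(q_b) = 1$. Then, $\mathcal{L}(\mathcal{A}_{ab}) = \overline{(a^*b)^\omega} = (a^*b)^*a^\omega$.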
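Acceptance by a non-deterministic parity automaton can be tested on ultimately periodic words u·v^ω with a simple graph search. The sketch below is one possible way to do this, not a procedure taken from the paper: it searches the product of automaton states and word positions for a reachable node of even priority p lying on a cycle whose nodes all have priority at most p. The function name `accepts_lasso` is hypothetical, and the transition table for A_ab is an assumption, since Fig. 4 is not reproduced here (a leads to `qa`, b leads to `qb`).

```python
from collections import deque


def accepts_lasso(delta, q_init, omega, prefix, cycle):
    """Does some run on prefix . cycle^omega satisfy the (max, even) parity condition?"""
    word = list(prefix) + list(cycle)
    assert cycle, "the repeated part must be non-empty"

    def nxt(i):
        # Positions advance through the prefix, then wrap around inside the cycle.
        return i + 1 if i + 1 < len(word) else len(prefix)

    def succ(node, bound=None):
        q, i = node
        for q2 in delta.get((q, word[i]), ()):
            if bound is None or omega[q2] <= bound:
                yield (q2, nxt(i))

    def reach(starts, bound=None):
        seen, todo = set(starts), deque(starts)
        while todo:
            n = todo.popleft()
            for m in succ(n, bound):
                if m not in seen:
                    seen.add(m)
                    todo.append(m)
        return seen

    for node in reach([(q_init, 0)]):
        p = omega[node[0]]
        # node must lie on a cycle that never exceeds its own (even) priority
        if p % 2 == 0 and node in reach(list(succ(node, p)), p):
            return True
    return False


# Assumed transition function for the automaton of Example 10.
DELTA_AB = {("qa", "a"): {"qa"}, ("qb", "a"): {"qa"},
            ("qa", "b"): {"qb"}, ("qb", "b"): {"qb"}}
OMEGA_AB = {"qa": 0, "qb": 1}

if __name__ == "__main__":
    print(accepts_lasso(DELTA_AB, "qa", OMEGA_AB, "ab", "a"))
    print(accepts_lasso(DELTA_AB, "qa", OMEGA_AB, "", "ab"))
```

Under the assumed transitions, words with finitely many b's such as ab·a^ω are accepted, while (ab)^ω is rejected, consistent with the language stated in Example 10.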




The goal of this subsection is, given a program $P$ and a parity word automaton
$\mathcal{A}$, to construct another program $P'$ and a priority assignment $\Omega$ for $P'$, such
that $\mathbf{InfTraces}(P) \cap \mathcal{L}(\mathcal{A}) = \emptyset$ if and only if $\models_{\mathit{csa}} (P', \Omega)$.

Note that a necessary and sufficient condition for $\mathbf{InfTraces}(P) \cap \mathcal{L}(\mathcal{A}) = \emptyset$
is that no trace in $\mathbf{InfTraces}(P)$ has a run whose priority sequence satisfies
the parity condition; in other words, for every sequence in $\mathbf{InfTraces}(P)$, and
for every run for the sequence, the largest priority that occurs infinitely often in the
associated priority sequence is odd. As explained at the beginning of this section, we reduce
this condition to a call-sequence analysis problem by appropriately duplicating
functions in a given program. For example, recall the program $P_0$:


({f = if c then (event a; f) else (event b; f)}, f).

It is translated to $P'_0$:


({f_b = if c then (event a; f_a) else (event b; f_b),
  f_a = if c then (event a; f_a) else (event b; f_b)}, f_b),


where $c$ is some (closed) boolean expression. Since the largest priorities
encountered before calling $f_a$ and $f_b$ (since the last recursive call) respectively are 0
and 1, we assign those priorities plus 1 (to flip odd/even-ness) to $f_a$ and $f_b$
respectively. Then, the problem of $\mathbf{InfTraces}(P_0) \cap \mathcal{L}(\mathcal{A}) = \emptyset$ is reduced to
$\models_{\mathit{csa}} (P'_0, \{f_a \mapsto 1, f_b \mapsto 2\})$. Note here that the priorities of $f_a$ and $f_b$ represent
summaries of the priorities (plus one) that occur in the run of the automaton
until $f_a$ and $f_b$ are respectively called since the last recursive call; thus, the
largest priority of states that occur infinitely often in the run for an infinite trace
is equal to the largest priority that occurs infinitely often in the sequence of
summaries $(\Omega(f_1) - 1)(\Omega(f_2) - 1)(\Omega(f_3) - 1)\cdots$ computed from a corresponding
call sequence $f_1 f_2 f_3 \cdots$.

Due to the presence of higher-order functions, the general reduction is more
complicated than the example above. First, we need to replicate not only function
symbols, but also arguments. For example, consider the following variation $P_1$
of $P_0$ above:


({g k = if c then (event a; k) else (event b; k), f = g f}, f).


Here, we have just made the calls to $f$ indirect, by preparing the function $g$.
Obviously, the two calls to $k$ in the body of $g$ must be distinguished from each
other, since different priorities are encountered before the calls. Thus, we
duplicate the argument $k$, and obtain the following program $P'_1$:


({g k_a k_b = if c then (event a; k_a) else (event b; k_b), f_a = g f_a f_b, f_b = g f_a f_b}, f_a).


Then, for the priority assignment $\Omega = \{f_a \mapsto 1, f_b \mapsto 2, g \mapsto 1\}$, $\mathbf{InfTraces}(P_1) \cap
\mathcal{L}(\mathcal{A}_{ab}) = \emptyset$ if and only if $\models_{\mathit{csa}} (P'_1, \Omega)$. Secondly, we need to take into account
not only the priorities of states visited by $\mathcal{A}$, but also the states themselves.
For example, if we have a function definition f h = h(event a; f h), the largest
priority encountered before $f$ is recursively called in the body of $f$ depends on
the priorities encountered inside $h$, and also the state of $\mathcal{A}$ when $h$ uses the
argument event a; f h (because the state after the $a$ event depends on the previous
state in general). We, therefore, use intersection types (à la Kobayashi and Ong's
intersection types for HORS model checking [21]) to represent summary
information on how each function traverses states of the automaton, and replicate
each function and its arguments for each type. We thus formalize the translation
as an intersection-type-based program transformation; related transformation
techniques are found in [8, 11, 12, 20, 38].


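Definition 10. Let $\mathcal{A} = (Q, \Sigma, \delta, q_I, \Omega)$ be a non-deterministic parity word
automaton. Let $q$ and $m$ range over $Q$ and the set $\mathit{codom}(\Omega)$ of priorities
respectively. The set $\mathbf{Types}_{\mathcal{A}}$ of intersection types, ranged over by $\theta$, is defined by:

\[
\theta ::= q \mid \rho \to \theta \qquad\qquad \rho ::= \mathtt{int} \mid \textstyle\bigwedge_{1 \leq i \leq k}(\theta_i, m_i)
\]

We assume a certain total order $<$ on $\mathbf{Types}_{\mathcal{A}} \times \mathbb{N}$, and require that in
$\bigwedge_{1 \leq i \leq k}(\theta_i, m_i)$, $(\theta_i, m_i) < (\theta_j, m_j)$ holds for each $i < j$.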


We often write $(\theta_1, m_1) \wedge \cdots \wedge (\theta_k, m_k)$ for $\bigwedge_{1 \leq i \leq k}(\theta_i, m_i)$, and $\top$ when $k = 0$.
Intuitively, the type $q$ describes expressions of simple type $\star$, which may be
evaluated when the automaton $\mathcal{A}$ is in the state $q$ (here, we have in mind an execution
of the product of a program and the automaton, where the latter takes events
produced by the program and changes its states). The type $(\bigwedge_{1 \leq i \leq k}(\theta_i, m_i)) \to \theta$
describes functions that take an argument, use it according to types $\theta_1, \ldots, \theta_k$,
and return a value of type $\theta$. Furthermore, the part $m_i$ describes that the
argument may be used as a value of type $\theta_i$ only when the largest priority visited since
the function is called is $m_i$. For example, given the automaton in Example 10, the
function λx.(event a; x) may have types $(q_a, 0) \to q_a$ and $(q_a, 0) \to q_b$, because
the body may be executed from state $q_a$ or $q_b$ (thus, the return type may be any
of them), but $x$ is used only when the automaton is in state $q_a$ and the largest
priority visited is 0. In contrast, λx.(event b; x) has types $(q_b, 1) \to q_a$ and
$(q_b, 1) \to q_b$.

Using the intersection types above, we shall define a type-based transformation
relation of the form $\Gamma \vdash_{\mathcal{A}} t : \theta \Rightarrow t'$, where $t$ and $t'$ are the source and target
terms of the transformation, and $\Gamma$, called an intersection type environment, is a
finite set of type bindings of the form $x : \mathtt{int}$ or $x : (\theta, m, m')$. We allow multiple
type bindings for a variable $x$ except for $x : \mathtt{int}$ (i.e. if $x : \mathtt{int} \in \Gamma$, then this must
be the unique type binding for $x$ in $\Gamma$). The binding $x : (\theta, m, m')$ means that $x$
should be used as a value of type $\theta$ when the largest priority visited is $m$; $m'$ is
auxiliary information used to record the largest priority encountered so far.

The transformation relation $\Gamma \vdash_{\mathcal{A}} t : \theta \Rightarrow t'$ is inductively defined by
the rules in Fig. 5. (For technical convenience, we have extended terms with
λ-abstractions; they may occur only at top-level function definitions.) In the
figure, $[k]$ denotes the set $\{i \in \mathbb{N} \mid 1 \leq i \leq k\}$. The operation $\Gamma \uparrow m$ used in the
figure is defined by:

\[
\Gamma \uparrow m = \{x : \mathtt{int} \mid x : \mathtt{int} \in \Gamma\} \cup \{x : (\theta, m_1, \max(m_2, m)) \mid x : (\theta, m_1, m_2) \in \Gamma\}
\]
The operation is applied when the priority $m$ is encountered, in which case the
largest priority encountered is updated accordingly. The key rules are IT-Var,
IT-Event, IT-App, and IT-Abs. In IT-Var, the variable $x$ is replicated for
each type; in the target of the translation, $x_{\theta,m}$ and $x_{\theta',m'}$ are treated as different
variables if $(\theta, m) \neq (\theta', m')$. The rule IT-Event reflects the state change caused
by the event $a$ to the type and the type environment. Since the state change may
be non-deterministic, we transform $t$ for each of the next states $q_1, \ldots, q_k$, and
combine the resulting terms with non-deterministic choice. The rules IT-App
and IT-Abs replicate function arguments for each type. In addition, in
IT-App, the operation $\Gamma \uparrow m_i$ reflects the fact that $t_2$ is used as a value of type $\theta_i$
after the priority $m_i$ is encountered. The other rules just transform terms in a
compositional manner. If target terms are ignored, the rules as a whole are close to
those of Kobayashi and Ong's type system for HORS model checking [21].


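\[
\frac{}{\Gamma \vdash_{\mathcal{A}} (\,) : q \Rightarrow (\,)}\ \text{(IT-Unit)}
\qquad
\frac{}{\Gamma, x : \mathtt{int} \vdash_{\mathcal{A}} x : \mathtt{int} \Rightarrow x_{\mathtt{int}}}\ \text{(IT-VarInt)}
\]
\[
\frac{}{\Gamma, x : (\theta, m, m) \vdash_{\mathcal{A}} x : \theta \Rightarrow x_{\theta,m}}\ \text{(IT-Var)}
\qquad
\frac{}{\Gamma \vdash_{\mathcal{A}} n : \mathtt{int} \Rightarrow n}\ \text{(IT-Int)}
\]
\[
\frac{\Gamma \vdash_{\mathcal{A}} t_1 : \mathtt{int} \Rightarrow t'_1 \qquad \Gamma \vdash_{\mathcal{A}} t_2 : \mathtt{int} \Rightarrow t'_2}{\Gamma \vdash_{\mathcal{A}} t_1\ \mathrm{op}\ t_2 : \mathtt{int} \Rightarrow t'_1\ \mathrm{op}\ t'_2}\ \text{(IT-Op)}
\qquad
\frac{\Gamma \vdash_{\mathcal{A}} t_1 : q \Rightarrow t'_1 \qquad \Gamma \vdash_{\mathcal{A}} t_2 : q \Rightarrow t'_2}{\Gamma \vdash_{\mathcal{A}} t_1 \,\Box\, t_2 : q \Rightarrow t'_1 \,\Box\, t'_2}\ \text{(IT-NonDet)}
\]
\[
\frac{\Gamma \vdash_{\mathcal{A}} t_i : \mathtt{int} \Rightarrow t'_i\ (\text{for each } i \in [k]) \qquad \Gamma \vdash_{\mathcal{A}} t_{k+1} : q \Rightarrow t'_{k+1} \qquad \Gamma \vdash_{\mathcal{A}} t_{k+2} : q \Rightarrow t'_{k+2}}{\Gamma \vdash_{\mathcal{A}} \mathtt{if}\ p(t_1, \ldots, t_k)\ \mathtt{then}\ t_{k+1}\ \mathtt{else}\ t_{k+2} : q \Rightarrow \mathtt{if}\ p(t'_1, \ldots, t'_k)\ \mathtt{then}\ t'_{k+1}\ \mathtt{else}\ t'_{k+2}}\ \text{(IT-If)}
\]
\[
\frac{\delta_{\mathcal{A}}(q, a) = \{q_1, \ldots, q_k\} \qquad \Gamma \uparrow \Omega_{\mathcal{A}}(q_i) \vdash_{\mathcal{A}} t : q_i \Rightarrow t'_i\ (\text{for each } i \in [k])}{\Gamma \vdash_{\mathcal{A}} (\mathtt{event}\ a; t) : q \Rightarrow (\mathtt{event}\ a;\ t'_1 \,\Box \cdots \Box\, t'_k)}\ \text{(IT-Event)}
\]
\[
\frac{\Gamma \vdash_{\mathcal{A}} t_1 : \mathtt{int} \to \theta \Rightarrow t'_1 \qquad \Gamma \vdash_{\mathcal{A}} t_2 : \mathtt{int} \Rightarrow t'_2}{\Gamma \vdash_{\mathcal{A}} t_1\, t_2 : \theta \Rightarrow t'_1\, t'_2}\ \text{(IT-AppInt)}
\]
\[
\frac{\Gamma \vdash_{\mathcal{A}} t_1 : \bigwedge_{1 \leq i \leq k}(\theta_i, m_i) \to \theta \Rightarrow t'_1 \qquad \Gamma \uparrow m_i \vdash_{\mathcal{A}} t_2 : \theta_i \Rightarrow t'_{2,i}\ (\text{for each } i \in [k])}{\Gamma \vdash_{\mathcal{A}} t_1\, t_2 : \theta \Rightarrow t'_1\, t'_{2,1} \cdots t'_{2,k}}\ \text{(IT-App)}
\]
\[
\frac{\Gamma, x : \mathtt{int} \vdash_{\mathcal{A}} t : \theta \Rightarrow t' \qquad x \notin \mathit{dom}(\Gamma)}{\Gamma \vdash_{\mathcal{A}} \lambda x.t : \mathtt{int} \to \theta \Rightarrow \lambda x_{\mathtt{int}}.t'}\ \text{(IT-AbsInt)}
\]
\[
\frac{\Gamma \cup \{x : (\theta_i, m_i, 0) \mid i \in [k]\} \vdash_{\mathcal{A}} t : \theta \Rightarrow t' \qquad x \notin \mathit{dom}(\Gamma)}{\Gamma \vdash_{\mathcal{A}} \lambda x.t : \bigwedge_{1 \leq i \leq k}(\theta_i, m_i) \to \theta \Rightarrow \lambda x_{\theta_1,m_1} \cdots x_{\theta_k,m_k}.t'}\ \text{(IT-Abs)}
\]

Fig. 5. Type-based transformation rules for terms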


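We now define the transformation for programs. A top-level type environment
$\Xi$ is a finite set of type bindings of the form $x : (\theta, m)$. Like intersection type
environments, $\Xi$ may have more than one binding for each variable. We write
$\Xi \vdash_{\mathcal{A}} t : \theta \Rightarrow t'$ to mean $\{x : (\theta, m, 0) \mid x : (\theta, m) \in \Xi\} \vdash_{\mathcal{A}} t : \theta \Rightarrow t'$. For a set $D$ of function
definitions, we write $\Xi \vdash_{\mathcal{A}} D \Rightarrow D'$ if $\mathit{dom}(D') = \{f_{\theta,m} \mid f : (\theta, m) \in \Xi\}$ and
$\Xi \vdash_{\mathcal{A}} D(f) : \theta \Rightarrow D'(f_{\theta,m})$ for every $f : (\theta, m) \in \Xi$. For a program $P = (D, t)$, we
write $\Xi \vdash_{\mathcal{A}} P \Rightarrow (P', \Omega')$ if $P' = (D', t')$, $\Xi \vdash_{\mathcal{A}} D \Rightarrow D'$ and $\Xi \vdash_{\mathcal{A}} t : q_I \Rightarrow t'$,
with $\Omega'(f_{\theta,m}) = m + 1$ for each $f_{\theta,m} \in \mathit{dom}(D')$. We just write $\vdash_{\mathcal{A}} P \Rightarrow (P', \Omega')$
if $\Xi \vdash_{\mathcal{A}} P \Rightarrow (P', \Omega')$ holds for some $\Xi$.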


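Example 11. Consider the automaton $\mathcal{A}_{ab}$ in Example 10, and the program $P_2 =
(D_2, f\,5)$ where $D_2$ consists of the following function definitions:


g k = (event a; k) □ (event b; k),
f x = if x > 0 then g (f (x − 1)) else (event b; f 5).


Let $\Xi$ be: $\{g : ((q_a, 0) \wedge (q_b, 1) \to q_a, 0),\ g : ((q_a, 0) \wedge (q_b, 1) \to q_b, 0),\ f : (\mathtt{int} \to
q_a, 0),\ f : (\mathtt{int} \to q_b, 1)\}$. Then, $\Xi \vdash_{\mathcal{A}} P_2 \Rightarrow ((D'_2, f_{\mathtt{int} \to q_a,0}\,5), \Omega)$ where:

\[
\begin{aligned}
D'_2 = \{\ & g_{(q_a,0) \wedge (q_b,1) \to q_a,0}\ k_{q_a,0}\ k_{q_b,1} = t_g, \quad g_{(q_a,0) \wedge (q_b,1) \to q_b,0}\ k_{q_a,0}\ k_{q_b,1} = t_g,\\
& f_{\mathtt{int} \to q_a,0}\ x_{\mathtt{int}} = t_{f,q_a}, \quad f_{\mathtt{int} \to q_b,1}\ x_{\mathtt{int}} = t_{f,q_b}\ \}\\
t_g =\ & (\mathtt{event}\ a; k_{q_a,0}) \,\Box\, (\mathtt{event}\ b; k_{q_b,1}),\\
t_{f,q} =\ & \mathtt{if}\ x_{\mathtt{int}} > 0\ \mathtt{then}\ g_{(q_a,0) \wedge (q_b,1) \to q,0}\ (f_{\mathtt{int} \to q_a,0}\,(x_{\mathtt{int}} - 1))\ (f_{\mathtt{int} \to q_b,1}\,(x_{\mathtt{int}} - 1))\\
& \mathtt{else}\ (\mathtt{event}\ b; f_{\mathtt{int} \to q_b,1}\,5), \quad (\text{for each } q \in \{q_a, q_b\})\\
\Omega = \{\ & g_{(q_a,0) \wedge (q_b,1) \to q_a,0} \mapsto 1,\ g_{(q_a,0) \wedge (q_b,1) \to q_b,0} \mapsto 1,\ f_{\mathtt{int} \to q_a,0} \mapsto 1,\ f_{\mathtt{int} \to q_b,1} \mapsto 2\ \}.
\end{aligned}
\]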


Notice that $f$, $g$, and the arguments of $g$ have been duplicated. Furthermore,
whenever $f_{\theta,m}$ is called, the largest priority that has been encountered
since the last recursive call is $m$. For example, in the then-clause of $f_{\mathtt{int} \to q_a,0}$,
$f_{\mathtt{int} \to q_b,1}(x - 1)$ may be called through $g_{(q_a,0) \wedge (q_b,1) \to q_a,0}$. Since $g_{(q_a,0) \wedge (q_b,1) \to q_a,0}$
uses the second argument only after an event $b$, the largest priority encountered
is 1. This property is important for the correctness of our reduction.


The following theorems claim that our reduction is sound and complete,
and that there is an effective algorithm for the reduction: see [23] for
proofs.


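Theorem 5. Let $P$ be a program and $\mathcal{A}$ be a parity automaton. Suppose that
$\Xi \vdash_{\mathcal{A}} P \Rightarrow (P', \Omega)$. Then $\mathbf{InfTraces}(P) \cap \mathcal{L}(\mathcal{A}) = \emptyset$ if and only if $\models_{\mathit{csa}} (P', \Omega)$.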


Theorem 6. For every $P$ and $\mathcal{A}$, one can effectively construct $\Xi$, $P'$ and $\Omega$
such that $\Xi \vdash_{\mathcal{A}} P \Rightarrow (P', \Omega)$.


The proof of Theorem 6 above also implies that the reduction from temporal
property verification to call-sequence analysis can be performed in polynomial
time. Combined with the reduction from call-sequence analysis to HFL model
checking, we have thus obtained a polynomial-time reduction from the temporal
verification problem $\mathbf{InfTraces}(P) \cap \mathcal{L}(\mathcal{A}) \stackrel{?}{=} \emptyset$ to HFL model checking.


8 Related Work 


As mentioned in Sect. 1, our reduction from program verification problems to 
HFL model checking problems has been partially inspired by the translation of 
Kobayashi et al. [19] from HORS model checking to HFL model checking. As in
their translation (and unlike in previous applications of HFL model checking [28, 
42]), our translation switches the roles of properties and models (or programs) 
to be verified. Although a combination of their translation with Kobayashi’s 
reduction from program verification to HORS model checking [17,18] yields an 
(indirect) translation from finite-data programs to pure HFL model checking 
problems, the combination does not work for infinite-data programs. In contrast, 
our translation is sound and complete even for infinite-data programs. Among the 
translations in Sects. 5,6 and 7, the translation in Sect. 7.2 shares some similarity 
to their translation, in that functions and their arguments are replicated for 
each priority. The actual translations are however quite different; ours is type- 
directed and optimized for a given automaton, whereas their translation is not. 
This difference comes from the difference of the goals: the goal of [19] was to 
clarify the relationship between HORS and HFL, hence their translation was 
designed to be independent of an automaton. The proof of the correctness of 
our translation in Sect. 7 is much more involved due to the need for dealing with 
integers. Whilst the proof of [19] could reuse the type-based characterization of 
HORS model checking [21], we had to generalize arguments in both [19,21] to 
work on infinite-data programs. 

Lange et al. [28] have shown that various process equivalence checking prob- 
lems (such as bisimulation and trace equivalence) can be reduced to (pure) HFL 
model checking problems. The idea of their reduction is quite different from ours. 
They reduce processes to LTSs, whereas we reduce programs to HFL formulas. 

Major approaches to automated or semi-automated higher-order program
verification have been HORS model checking [17,18,22,27,31,33,43], (refinement)
type systems [14,24,34-36,39,41,44], Horn clause solving [2,7], and their
combinations. As already discussed in Sect. 1, compared with the HORS model
checking approach, our new approach provides more uniform, streamlined methods.
Whilst the HORS model checking approach is for fully automated verification,
our approach enables various degrees of automation: after verification
problems are automatically translated to $\mathrm{HFL}_{\mathbb{Z}}$ formulas, one can prove them
(i) interactively using a proof assistant like Coq (see [23]), (ii) semi-automatically,
by letting users provide hints for induction/co-induction and discharging the rest
of proof obligations by (some extension of) an SMT solver, or (iii) fully
automatically by recasting the techniques used in the HORS-based approach; for
example, to deal with the $\nu$-only fragment of $\mathrm{HFL}_{\mathbb{Z}}$, we can reuse the
technique of predicate abstraction [22]. For a more technical comparison between
the HORS-based approach and our HFL-based approach, see [23].

As for type-based approaches [14, 24, 34-36, 39, 41,44], most of the refinement 
type systems are (i) restricted to safety properties, and/or (ii) incomplete. A 
notable exception is the recent work of Unno et al. [40], which provides a rela- 
tively complete type system for the classes of properties discussed in Sect. 5. Our 
approach deals with a wider class of properties (cf. Sects. 6 and 7). Their
“relative completeness” property relies on Gödel coding of functions, which cannot
be exploited in practice.
The reductions from program verification to Horn clause solving have recently
been advocated [2-4] or used [34,39] (via refinement type inference problems)
by a number of researchers. Since Horn clauses can be expressed in a fragment 
of HFL without modal operators, fixpoint alternations (between v and u), and 
higher-order predicates, our reductions to HFL model checking may be viewed 
as extensions of those approaches. Higher-order predicates and fixpoints over 
them allowed us to provide sound and complete characterizations of properties 
of higher-order programs for a wider class of properties. Bjørner et al. [4]
proposed an alternative approach to obtaining a complete characterization of safety
properties, which defunctionalizes higher-order programs by using algebraic data 
types and then reduces the problems to (first-order) Horn clauses. A disadvan- 
tage of that approach is that control flow information of higher-order programs 
is also encoded into algebraic data types; hence even for finite-data higher-order 
programs, the Horn clauses obtained by the reduction belong to an undecidable 
fragment. In contrast, our reductions yield pure HFL model checking problems 
for finite-data programs. Burn et al. [7] have recently advocated the use of higher- 
order (constrained) Horn clauses for verification of safety properties (i.e., which 
correspond to the negation of may-reachability properties discussed in Sect. 5.1 of 
the present paper) of higher-order programs. They interpret recursion using the 
least fixpoint semantics, so their higher-order Horn clauses roughly correspond
to a fragment of $\mathrm{HFL}_{\mathbb{Z}}$ without modal operators and fixpoint alternations.
They have not shown a general, concrete reduction from safety property verifi- 
cation to higher-order Horn clause solving. 

The characterization of the reachability problems in Sect. 5 in terms of
formulas without modal operators is reminiscent of predicate transformers [9, 13] used
for computing the weakest preconditions of imperative programs. In particular, 
[5] and [13] respectively used least fixpoints to express weakest preconditions for 
while-loops and recursions. 


9 Conclusion 


We have shown that various verification problems for higher-order functional 
programs can be naturally reduced to (extended) HFL model checking prob- 
lems. In all the reductions, a program is mapped to an HFL formula expressing 
the property that the behavior of the program is correct. For developing verifica- 
tion tools for higher-order functional programs, our reductions allow us to focus 
on the development of (automated or semi-automated) HFLz model checking 
tools (or, even more simply, theorem provers for HFLz without modal operators,
as the reductions of Sects. 5 and 7 yield HFL formulas without modal operators).
To this end, we have developed a prototype model checker for pure HFL
(without integers), which will be reported in a separate paper. Work is under
way to develop HFLz model checkers by recasting the techniques [22, 26, 27, 43]
developed for the HORS-based approach, which, together with the reductions 
presented in this paper, would yield fully automated verification tools. We have 
also started building a Coq library for interactively proving HFLz formulas,
as briefly discussed in [23]. As a final remark, although one may fear that our
reductions map program verification problems to “harder” problems due
to the expressive power of HFLz, this is actually not the case, at least for the
classes of problems in Sects. 5 and 6, which use only the alternation-free
fragment of HFLz. The model checking problems for μ-only or ν-only HFLz are
semi-decidable and co-semi-decidable respectively, like the source verification
problems of may/must-reachability and their negations of closed programs.

Abstract. Smart contracts are computer programs that are executed
by a network of mutually distrusting agents, without the need of an 
external trusted authority. Smart contracts handle and transfer assets of 
considerable value (in the form of crypto-currency like Bitcoin). Hence, 
it is crucial that their implementation is bug-free. We identify the util- 
ity (or expected payoff) of interacting with such smart contracts as the 
basic and canonical quantitative property for such contracts. We present 
a framework for such quantitative analysis of smart contracts. Such a
formal framework poses new research challenges in programming
languages, as it requires modeling of game-theoretic aspects to analyze
incentives for deviation from honest behavior and modeling utilities
which are not specified as standard temporal properties such as safety 
and termination. While game-theoretic incentives have been analyzed in 
the security community, their analysis has been restricted to the very spe- 
cial case of stateless games. However, to analyze smart contracts, stateful 
analysis is required as it must account for the different program states 
of the protocol. Our main contributions are as follows: we present (i) a 
simplified programming language for smart contracts; (ii) an automatic 
translation of the programs to state-based games; (iii) an abstraction- 
refinement approach to solve such games; and (iv) experimental results 
on real-world-inspired smart contracts. 


1 Introduction 


In this work we present a quantitative stateful game-theoretic framework for
formal analysis of smart contracts.


Smart Contracts. Hundreds of crypto-currencies are in use today, and
investments in them are increasing steadily [24]. These currencies are not
controlled by any central authority such as a government or a bank. Instead,
they are governed by the blockchain protocol, which dictates the rules and
determines the outcomes, e.g., the validity of money transactions and account
balances. Blockchain was initially used for peer-to-peer Bitcoin payments [43],
but recently it is also used for running programs (called smart contracts). A
smart contract is a program that runs on the blockchain, which enforces its
correct execution (i.e., that it is running as originally programmed). This is
done by encoding semantics in crypto-currency transactions. For example, Bitcoin
transaction scripts allow users to specify conditions, or contracts, which the
transactions must satisfy prior to acceptance. Transaction scripts can encode
many useful functions, such as validating that a payer owns a coin she is
spending or enforcing rules for multi-party transactions. The Ethereum
crypto-currency [16] allows arbitrary stateful Turing-complete conditions over
the transactions, which gives rise to smart contracts that can implement a wide
range of applications, such as financial instruments (e.g., financial
derivatives or wills) or autonomous governance applications (e.g., voting
systems). The protocols are globally specified and their implementation is
decentralized. Therefore, there is no central authority and they are immutable.
Hence, the economic consequences of bugs in a smart contract cannot be reverted.


Types of Bugs. There are two types of bugs with monetary consequences: 


1. Coding errors. Similar to standard programs, bugs could arise from coding
mistakes. In one reported case [33], mistakenly replacing the += operation with
=+ enabled loss of tokens that were backed by $800,000 of investment.

2. Dishonest interaction incentives. Smart contracts do not fully dictate the
behavior of participants. They only specify the outcome (e.g., penalty or
rewards) of the behaviors. Hence, a second source of bugs is the high-level
interaction aspects that could give a participant an unfair advantage and an
incentive for dishonest behavior. For example, a naive design of a
rock-paper-scissors game [29] allows playing sequentially, rather than
concurrently, and gives an advantage to the second player, who can see the
opponent’s move.


DAO Attack: Interaction of Two Types of Bugs. Quite interestingly, a coding
bug can incentivize dishonest behavior, as in the famous DAO attack [48]. The
Decentralized Autonomous Organization (DAO) [38] is an Ethereum smart
contract [51]. The contract consists of an investor-directed venture capital
fund. On June 17, 2016 an attacker exploited a bug in the contract to extract
$80 million [48]. Intuitively, the root cause was that the contract allowed
users to first get hold of their funds, and only then updated their balance
records, while a semantic detail allowed the attacker to withdraw multiple
times before the update.
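
The withdraw-before-update pattern described above can be illustrated with a toy model. The following Python sketch is not the DAO code: the class `ToyFund`, its methods and the callback `on_receive` are hypothetical names chosen for illustration, and the amounts are made up. It only shows why handing out funds before updating the balance record lets a receiver who calls back into `withdraw` collect more than her recorded balance.

```python
class ToyFund:
    """Toy fund that hands out money before updating its records."""

    def __init__(self):
        self.balances = {}  # recorded balance of every user
        self.pool = 0       # money actually held by the fund

    def deposit(self, user, amount):
        self.balances[user] = self.balances.get(user, 0) + amount
        self.pool += amount

    def withdraw(self, user, on_receive):
        amount = self.balances.get(user, 0)
        if amount == 0 or self.pool < amount:
            return
        self.pool -= amount
        on_receive(amount)        # the receiver runs its own code here ...
        self.balances[user] = 0   # ... and only afterwards is the record updated


def run_attack(reentries):
    """Return (money collected by the attacker, money left in the fund)."""
    fund = ToyFund()
    fund.deposit("honest", 90)
    fund.deposit("attacker", 10)
    received = []

    def malicious_receiver(amount):
        received.append(amount)
        if len(received) <= reentries:
            fund.withdraw("attacker", malicious_receiver)  # re-enter

    fund.withdraw("attacker", malicious_receiver)
    return sum(received), fund.pool
```

In this toy setting, `run_attack(0)` models an honest withdrawal of the attacker's own 10 units, while `run_attack(3)` withdraws 40 units against a recorded balance of 10.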


Necessity of Formal Framework. Since bugs in smart contracts have direct eco- 
nomic consequences and are irreversible, they have the same status as safety- 
critical errors for programs and reactive systems and must be detected before 
deployment. Moreover, smart contracts are deployed rapidly. There are over a 
million smart contracts in Ethereum, holding over 15 billion dollars at the time 
of writing [31]. It is impossible for security researchers to analyze all of them, 
and lack of automated tools for programmers makes them error prone. Hence, a 
formal analysis framework for smart contract bugs is of great importance. 


Utility Analysis. In verification of programs, specifying objectives is non-trivial
and a key goal is to consider specification-less verification, where basic
properties are considered canonical. For example, termination is a basic property in
program analysis; and data-race freedom or serializability are basic properties
in concurrency. Given these properties, models are verified with respect to them
without considering any other specification. For smart contracts, describing the
correct specification that prevents dishonest behavior is more challenging due to
the presence of game-like interactions. We propose to consider the expected user
utility (or payoff) that is guaranteed even in the presence of adversarial behavior
of other agents as a canonical property. Considering malicious adversaries is
standard in game theory. For example, the expected utility of a fair lottery is 0.
An analysis reporting a different utility signifies a bug.
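
As a worked instance of the lottery example, consider a hypothetical lottery in which each of `n_players` parties pays `ticket_price` and one party receives the whole pot. The function name, the parameters and the numbers below are assumptions made for illustration, not taken from a real contract. The sketch computes the expected utility of a single player with exact rational arithmetic.

```python
from fractions import Fraction


def expected_utility(n_players, ticket_price, win_probability):
    """Expected payoff of one player: she wins the pot with the given
    probability and pays the ticket price in every case."""
    pot = n_players * ticket_price
    return win_probability * pot - ticket_price


# Fair lottery: every one of the 4 players wins with probability 1/4.
fair = expected_utility(4, 10, Fraction(1, 4))

# Biased lottery: this player wins with probability 1/5 only.
biased = expected_utility(4, 10, Fraction(1, 5))
```

Here `fair` evaluates to 0, whereas `biased` evaluates to -2, which is the kind of deviation from the canonical value that a utility analysis would flag.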


New Research Challenges. Coding bugs are detected by classic verification, pro- 
gram analysis, and model checking tools [23,39]. However, a formal framework 
for incentivization bugs presents a new research challenge for the programming 
language community. Their analysis must overcome two obstacles: (a) the frame- 
work will have to handle game-theoretic aspects to model interactions and incen- 
tives for dishonest behavior; and (b) it will have to handle properties that cannot 
be deduced from standard temporal properties such as safety or termination, but 
require analysis of monetary gains (i.e., quantitative properties). 

While game-theoretic incentives are widely analyzed by the security
community (e.g., see [13]), their analysis is typically restricted to the very
special case of one-shot games that do not consider different states of the
program, and thus the consequences of decisions on the next state of the
program are ignored. In addition, their analysis is typically ad hoc and stems
from brainstorming and special techniques. This could work when very few
protocols existed (e.g., when Bitcoin first emerged) and deep thought was put
into making them elegant and analyzable. However, the fast deployment of smart
contracts makes it crucial to automate the process and make it accessible to
programmers.


Our Contribution. In this work we present a formal framework for quantitative 
analysis of utilities in smart contracts. Our contributions are as follows: 


1. We present a simplified (loop-free) programming language that allows game- 
theoretic interactions. We show that many classical smart contracts can 
be easily described in our language, and conversely, a smart contract pro- 
grammed in our language can be easily translated to Solidity [30], which is 
the most popular Ethereum smart contract language. 

2. The underlying mathematical model for our language is stateful concurrent 
games. We automatically translate programs in our language to such games. 

3. The key challenge to analyze such game models automatically is to tackle the 
state-space explosion. While several abstraction techniques have been consid- 
ered for programs [14,35,45], they do not work for game-theoretic models with 
quantitative objectives. We present an approach based on interval-abstraction 
for reducing the states, establish soundness of our abstraction, and present a 
refinement process. This is our core technical contribution. 

4. We present experimental results on several classic real-world smart contracts.
We show that our approach can handle contracts that otherwise give rise
to games with up to 10^23 states. While special cases of concurrent games
(namely, turn-based games) have been studied in verification and reactive
synthesis, there are no practical methods to solve general concurrent
quantitative games. To the best of our knowledge, there are no tools to solve
quantitative concurrent games other than academic examples with few states,
and we present the first practical method to solve quantitative concurrent
games that scales to real-world smart contract analysis.


In summary, our contributions range from (i) modeling of smart contracts as 
state-based games, to (ii) an abstraction-refinement approach to solve such 
games, to (iii) experimental results on real-world smart contracts. 


2 Background on Ethereum Smart Contracts 


2.1 Programmable Smart Contracts 


Ethereum [16] is a decentralized virtual machine, which runs programs called 
contracts. Contracts are written in a Turing-complete bytecode language, called 
Ethereum Virtual Machine (EVM) bytecode [53]. A contract is invoked by call- 
ing one of its functions, where each function is defined by a sequence of instruc- 
tions. The contract maintains a persistent internal state and can receive (trans- 
fer) currency from (to) users and other contracts. Users send transactions to 
the Ethereum network to invoke functions. Each transaction may contain input 
parameters for the contract and an associated monetary amount, possibly 0, 
which is transferred from the user to the contract. 

Upon receiving a transaction, the contract collects the money sent to it,
executes a function according to input parameters, and updates its internal state.
All transactions are recorded on a decentralized ledger, called the blockchain. A
sequence of transactions that begins from the creation of the network uniquely
determines the state of each contract and the balances of users and contracts. The
blockchain does not rely on a trusted central authority. Rather, each transaction
is processed by a large network of mutually untrusted peers called miners. Users
constantly broadcast transactions to the network. Miners add transactions to
the blockchain via a proof-of-work consensus protocol [43].


Subtleties. In this work, for simplicity, we ignore some details in the underlying
protocol of Ethereum smart contracts. We briefly describe these details below:


- Transaction fees. In exchange for including her transactions in the blockchain,
a user pays transaction fees to the miners, proportionally to the execution
time of her transaction. This fact could slightly affect the monetary analysis
of the user gain, but could also introduce bugs in a program, as there is a
bound on execution time that cannot be exceeded. Hence, it is possible that
some functions could never be called, or even worse, a user could actively
give input parameters that would prevent other users from invoking a certain
function.

- Recursive invocation of contracts. A contract function could invoke a function
in another contract, which in turn can have a call to the original contract.
The underlying Ethereum semantics of recursive invocation was the root cause
of the notorious DAO hack [27].

- Behavior of the miners. Previous works have suggested that smart contracts
could be implemented to encourage miners to deviate from their honest
behavior [50]. This could in theory introduce bugs into a contract, e.g., a contract
might give an unfair advantage to a user who is a big miner.


2.2 Tokens and User Utility 


A user’s utility is determined by the Ether she spends and receives, but could 
also be affected by the state of the contract. Most notably, smart contracts 
are used to issue tokens, which can be viewed as a stake in a company or an
organization, in return for an Ether (or token) investment (see an example in
Fig. 1). These tokens are transferable among users and are traded in exchanges in
return for Ether, Bitcoin and fiat money. At the time of writing, smart contracts
instantiate tokens worth billions of dollars [32]. Hence, gaining or losing tokens 
has clear utility for the user. At a larger scope, user utility could also be affected 
by more abstract storage changes. Some users would be willing to pay to have 
a contract declare them as Kings of Ether [4], while others could gain from 
registering their domain name in a smart contract storage [40]. In the examples 
provided in this work we mainly focus on utility that arises from Ether, tokens 
and the like. However, our approach is general and can model any form of utility 
by introducing auxiliary utility variables and definitions. 


contract Token {
  mapping (address=>uint) balances;
  function buy() payable {
    balances[msg.sender] += msg.value;
  }
  function transfer( address to, uint amount ) {
    if (balances[msg.sender]>=amount) {
      balances[msg.sender] -= amount;
      balances[to] += amount;
    }
  }
}

Fig. 1. Token contract example. 


3 Programming Language for Smart Contracts 


In this section we present our programming language for smart contracts
that supports concurrent interactions between parties. A party denotes an
agent that decides to interact with the contract. A contract is a tuple
$C = (N, I, M, R, X_0, F, T)$ where $X := N \cup I \cup M$ is a set of variables,
$R$ describes the range of values that can be stored in each variable, $X_0$ gives
the initial values stored in variables, $F$ is a list of functions and $T$ describes,
for each function, the time segment in which it can be invoked. We now formalize
these concepts.


Variables. There are three distinct and disjoint types of variables in X: 


744 K. Chatterjee et al. 


- $N$ contains “numeric” variables that can store a single integer.

- $I$ contains “identification” (“id”) variables capable of pointing to a party in
the contract by her address or storing NULL. The notion of ids is quite flexible
in our approach: the only dependence on ids is that they should be distinct
and an id should not act on behalf of another id. We simply use different
integers to denote distinct ids and assume that a “faking of identity” does
not happen. In Ethereum this is achieved by digital signatures.

- $M$ is the set of “mapping” variables. Each $m \in M$ maps parties to integers.


Bounds and Initial Values. The tuple $R = (\underline{R}, \overline{R})$, where
$\underline{R}, \overline{R} : N \cup M \to \mathbb{Z}$, represents lower and upper
bounds for integer values that can be stored in a variable. For example, if
$n \in N$, then $n$ can only store integers between $\underline{R}(n)$ and
$\overline{R}(n)$. Similarly, if $m \in M$ is a mapping and $i \in I$ stores an
address to a party in the contract, then $m[i]$ can save integers between
$\underline{R}(m)$ and $\overline{R}(m)$. The function
$X_0 : X \to \mathbb{Z} \cup \{\text{NULL}\}$ assigns an initial value to every
variable. The assigned value is an integer in case of numeric and mapping
variables, i.e., a mapping variable maps everything to its initial value by
default. Id variables can either be initialized by NULL or an id used by one of
the parties.


Functions and Timing. The sequence $F = \langle f_1, f_2, \ldots, f_n \rangle$ is a
list of functions and $T = (\underline{T}, \overline{T})$, where
$\underline{T}, \overline{T} : F \to \mathbb{N}$. The function $f_i$ can only be
invoked in time-frame $T(f_i) = [\underline{T}(f_i), \overline{T}(f_i)]$. The
contract uses a global clock, for example the current block number in the
blockchain, to keep track of time.

Note that we consider a single contract, and interaction between multiple 
contracts is a subject of future work. 
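
To make the tuple $C = (N, I, M, R, X_0, F, T)$ concrete, the following Python sketch stores its components in a small data class and fills it with the declarations of the rock-paper-scissors contract of Fig. 2. The class `Contract`, its field names and the helper `callable_at` are hypothetical and only serve as an illustration; function bodies are omitted, and the time window of `play` is taken from the textual summary of the example (time 11 to 15).

```python
from dataclasses import dataclass

NULL = None  # models the NULL id


@dataclass
class Contract:
    numeric: set      # N
    ids: set          # I
    maps: set         # M
    lower: dict       # lower bound of every variable in N and M
    upper: dict       # upper bound of every variable in N and M
    initial: dict     # X_0
    functions: list   # F, as a list of function names
    window: dict      # T: function name -> (first, last) time it may be invoked

    def variables(self):
        """The set X of all variables, i.e. the union of N, I and M."""
        return self.numeric | self.ids | self.maps

    def callable_at(self, name, t):
        """True if function `name` may be invoked at time t."""
        first, last = self.window[name]
        return first <= t <= last


numeric = {"played", "AliceWon", "BobWon", "bid", "AlicesMove", "BobsMove"}
rps = Contract(
    numeric=numeric,
    ids={"Alice", "Bob"},
    maps={"Bids"},
    lower={v: 0 for v in numeric | {"Bids"}},
    upper={"played": 1, "AliceWon": 1, "BobWon": 1, "bid": 100,
           "AlicesMove": 3, "BobsMove": 3, "Bids": 100},
    initial={**{v: 0 for v in numeric | {"Bids"}},
             "Alice": "issuer", "Bob": NULL},
    functions=["registerBob", "play", "getReward"],
    window={"registerBob": (1, 10), "play": (11, 15), "getReward": (16, 20)},
)
```

For example, `rps.callable_at("play", 11)` holds while `rps.callable_at("play", 16)` does not.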


3.1 Syntax 


We provide a simple overview of our contract programming language. Our lan- 
guage is syntactically similar to Solidity [30], which is a widely used language 
for writing Ethereum contracts. A translation mechanism for different aspects is 
discussed in [19]. An example contract, modeling a game of rock-paper-scissors, 
is given in Fig. 2. Here, a party, called issuer, has issued the contract and taken
the role of Alice. Any other party can join the contract by registering as Bob 
and then playing rock-paper-scissors. To demonstrate our language, we use a 
bidding mechanism. 


Declaration of Variables. The program begins by declaring variables¹, their type,
name, range and initial value. For example, Bids is a map variable that assigns 
a value between 0 and 100 to every id. This value is initially 0. Line numbers 
(labels) are defined in Sect. 3.2 below and are not part of the syntax. 


Declaration of Functions. After the variables, the functions are defined one by
one. Each function begins with the keyword function followed by its name and
the time interval in which it can be called by parties. Then comes a list of input
parameters. Each parameter is of the form variable : party, which means
that the designated party can choose a value for that variable. The chosen value
is required to be in the range specified for that variable. The keyword caller
denotes the party that has invoked this function and payable signifies that the
party should not only decide a value, but must also pay the amount she decides.
For example, registerBob can be called at any time between 1 and 10 by any of
the parties. At each such invocation the party that has called this function must
pay some amount which will be saved in the variable bid. After the decisions
and payments are done, the contract proceeds with executing the function.


¹ For simplicity, we demonstrate our method with global variables only. However, the
method is applicable to general variables as long as their ranges are well-defined at
each point of the program.


(0)  contract RPS {
       map Bids[0, 100] = 0;
       id Alice = issuer;
       id Bob = null;
       numeric played[0,1] = 0;
       numeric AliceWon[0,1] = 0;
       numeric BobWon[0,1] = 0;
       numeric bid[0, 100] = 0;
       numeric AlicesMove[0,3] = 0;
       numeric BobsMove[0,3] = 0;
       //0 denotes no choice,
       //1 rock, 2 paper,
       //3 scissors

(1)    function registerBob[1,10]
         (payable bid : caller) {
(2)      if(Bob==null) {
(3)        Bob = caller;
(4)        Bids[Bob]=bid;
         }
         else {
(5)        payout(caller, bid);
         }
(6)    }

(7)    function play[11,15]
         (AlicesMove:Alice = 0,
          BobsMove:Bob = 0,
          payable Bids[Alice]: Alice){
(8)      if(played==1)
(9)        return;
         else
(10)       played = 1;
(11)     if(BobsMove==0 and AlicesMove!=0)
(12)       AliceWon = 1;
(13)     else if(AlicesMove==0 and BobsMove!=0)
(14)       BobWon = 1;
(15)     else if(AlicesMove==0 and BobsMove==0) {
(16)       AliceWon = 0;
(17)       BobWon = 0;
         }
(18)     else if(AlicesMove==BobsMove+1 or
                 AlicesMove==BobsMove-2)
(19)       AliceWon = 1;
         else
(20)       BobWon = 1;
(21)   }

(22)   function getReward[16,20]() {
(23)     if(caller==Alice and AliceWon==1
            or caller==Bob and BobWon==1) {
(24)       payout(caller,
                  Bids[Alice] + Bids[Bob]);
(25)       Bids[Alice] = 0;
(26)       Bids[Bob] = 0;
         }
(27)   }
     }


Fig. 2. A rock-paper-scissors contract.


Types of Functions. There are essentially two types of functions, depending on 
their parameters. One-party functions, such as registerBob and getReward 
require parameters from caller only, while multi-party functions, such as play 
ask several, potentially different, parties for input. In this case all parties provide
their input decisions and payments concurrently and without being aware of the
choices made by other parties. A default value is also specified for every decision
in case a relevant party does not take part.


Summary. Putting everything together, in the contract specified in Fig. 2, any
party can claim the role of Bob between time 1 and time 10 by paying a bid
to the contract, if the role is not already occupied. Then at time 11 one of the
parties calls play and both parties have until time 15 to decide which choice
(rock, paper, scissors or none) they want to make. Then the winner can call
getReward and collect her prize.
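
The decision logic of play can be mirrored in a few lines of Python. This is a sketch under assumptions: the function name `play` and its return convention are ours, moves are encoded as in Fig. 2 (0 no choice, 1 rock, 2 paper, 3 scissors), and the assignments in the winning branches are our reading of the figure, chosen so that each non-zero move beats the move encoded by the number just below it, cyclically.

```python
def play(alices_move, bobs_move):
    """Return (AliceWon, BobWon) for one round, following the branch
    structure of function play in Fig. 2."""
    alice_won, bob_won = 0, 0
    if bobs_move == 0 and alices_move != 0:
        alice_won = 1          # Bob made no choice
    elif alices_move == 0 and bobs_move != 0:
        bob_won = 1            # Alice made no choice
    elif alices_move == 0 and bobs_move == 0:
        alice_won, bob_won = 0, 0
    elif alices_move == bobs_move + 1 or alices_move == bobs_move - 2:
        alice_won = 1          # paper>rock, scissors>paper, rock>scissors
    else:
        bob_won = 1
    return alice_won, bob_won
```

Under this reading, `play(2, 1)` (paper against rock) and `play(1, 3)` (rock against scissors) are wins for Alice, and a party that makes no choice loses against any actual move. Note that in this sketch equal non-zero moves fall into the final branch.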


3.2 Semantics 


In this section we present the details of the semantics. In our programming 
language there are several key aspects which are non-standard in programming 
languages, such as the notion of time progress, concurrency, and interactions of 
several parties. Hence we present a detailed description of the semantics. We 
start with the requirements. 


Requirements. In order for a contract to be considered valid, other than following 
the syntax rules, a few more requirements must be met, which are as follows: 


- We assume that no division by zero or similar undefined behavior happens.

- To have well-defined message passing, we also assume that no multi-party
function has an associated time interval intersecting that of another function.

- Finally, for each non-id variable $v$, it must hold that
$\underline{R}(v) \le X_0(v) \le \overline{R}(v)$ and similarly, for every
function $f_i$, we must have $\underline{T}(f_i) \le \overline{T}(f_i)$.
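
The last two requirements are easy to check mechanically. The sketch below is one possible checker, with hypothetical helper names (`bounds_ok`, `windows_ok`) and plain dictionaries as the assumed representation: `lower`, `upper` and `initial` map non-id variables to integers, and `window` maps each function name to its closed time interval.

```python
def bounds_ok(lower, upper, initial):
    """Every non-id variable starts inside its range."""
    return all(lower[v] <= initial[v] <= upper[v] for v in lower)


def windows_ok(window, multi_party):
    """Intervals are non-empty and no multi-party function has an
    interval intersecting that of another function."""
    if any(first > last for first, last in window.values()):
        return False
    for m in multi_party:
        m_first, m_last = window[m]
        for f, (first, last) in window.items():
            # two closed intervals intersect iff each starts before the other ends
            if f != m and first <= m_last and m_first <= last:
                return False
    return True


rps_window = {"registerBob": (1, 10), "play": (11, 15), "getReward": (16, 20)}
```

With the intervals of Fig. 2, `windows_ok(rps_window, {"play"})` holds; moving the start of `play` to time 10 would make it intersect `registerBob` and violate the requirement. One-party functions may overlap each other freely.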


Overview of Time Progress. Initially, the time is 0. Let $F_t$ be the set of functions
executable at time $t$, i.e., $F_t = \{ f_i \in F \mid t \in T(f_i) \}$. Then $F_t$ is
either empty, or contains one or more one-party functions, or consists of a single
multi-party function. We consider the following cases:


- $F_t$ empty. If $F_t$ is empty, then nothing can happen until the clock ticks.

- Execution of one-party functions. If $F_t$ contains one or more one-party
functions, then each of the parties can call any subset of these functions at time
$t$. If there are several calls at the same time, the contract might run them
in any order. While a function call is being executed, all parties are able to
see the full state of the contract, and can issue new calls. When there are
no more requests for function calls, the clock ticks and the time is increased
to $t + 1$. When a call is being executed and is at the beginning part of the
function, its caller can send messages or payments to the contract. Values of
these messages and payments will then be saved in designated variables and
the execution continues. If the caller fails to make a payment or specify a
value for a decision variable, or if her specified values/payments are not in the
range of their corresponding variables, i.e., they are too small or too big, the
call gets canceled and the contract reverts any changes to variables due to
the call and continues as if this call had never happened.

- Execution of multi-party functions. If $F_t$ contains a single multi-party function
$f_i$ and $t < \overline{T}(f_i)$, then any party can send messages and payments to the
contract to specify values for variables that are designated to be paid or
decided by her. These choices are hidden and cannot be observed by other
participants. She can also change her decisions as many times as she sees fit.
The clock ticks when there are no more valid requests for setting a value for
a variable or making a payment. This continues until we reach time
$\overline{T}(f_i)$. At this time parties can no longer change their choices and the
choices become visible to everyone. The contract proceeds with execution of the
function. If a party fails to make a payment/decision or if NULL is asked to make
a payment or a decision, default behavior will be enforced. The default value for
payments is 0 and default behavior for other variables is defined as part of the
syntax. For example, in function play of Fig. 2, if a party does not choose, a
default value of 0 is enforced and given the rest of this function, this will lead
to a definite loss.
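
The case distinction above can be sketched as a small classifier. The names `executable_at` and `phase`, and the phase labels "idle", "one-party", "collect" and "execute", are our own and not part of the language; `window` maps function names to closed time intervals and `multi_party` is the set of multi-party functions. The sketch assumes a valid contract, i.e., a multi-party function never shares a time step with another function.

```python
def executable_at(t, window):
    """Names of the functions whose time interval contains t."""
    return sorted(f for f, (first, last) in window.items() if first <= t <= last)


def phase(t, window, multi_party):
    """Classify time step t according to the three cases of time progress."""
    active = executable_at(t, window)
    if not active:
        return "idle"            # nothing can happen until the clock ticks
    if len(active) == 1 and active[0] in multi_party:
        first, last = window[active[0]]
        # before the upper end of the interval, parties submit hidden choices;
        # at the upper end, choices are revealed and the body is executed
        return "collect" if t < last else "execute"
    return "one-party"           # parties may call any of the active functions


rps_window = {"registerBob": (1, 10), "play": (11, 15), "getReward": (16, 20)}
```

For the contract of Fig. 2 this yields "one-party" during registration and reward collection, "collect" at times 11 to 14, and "execute" at time 15.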


Given the notion of time progress we proceed to formalize the notion of 
“runs” of the contract. This requires the notion of labels, control-flow graphs, 
valuations, and states, which we describe below. 


Labels. Starting from 0, we give the contract, beginning and end points of every 
function, and every command a label. The labels are given in order of appearance. 
As an example, see the labels in parentheses in Fig. 2. 


Entry and Exit Labels. We denote the first (beginning point) label in a function
$f_i$ by $\square_i$ and its last (end point) label by $\blacksquare_i$.


Control Flow Graphs (CFGs). We define the control flow graph $CFG_i$ of the
function $f_i$ in the standard manner, i.e., $CFG_i = (V, E)$, where there is a vertex
corresponding to every labeled entity inside $f_i$. Each edge $e \in E$ has a condition
$cond(e)$ which is a boolean expression that must be true when traversing that
edge. For more details see [19].


Valuations. A valuation is a function $val$, assigning a value to every variable.
Values for numeric variables must be integers in their range, values for identity
variables can be party ids or NULL and a value assigned to a map variable $m$ must
be a function $val(m)$ such that for each identity $i$, we have
$\underline{R}(m) \le val(m)(i) \le \overline{R}(m)$. Given a valuation, we extend it to
expressions containing mathematical operations in the straightforward manner.


States. A state of the contract is a tuple $s = (t, b, l, val, c)$, where $t$ is a time
stamp, $b \in \mathbb{N} \cup \{0\}$ is the current balance of the contract, i.e., the
total amount of payment to the contract minus the total amount of payouts, $l$ is a
label (that is being executed), $val$ assigns values to variables and
$c \in \mathbb{P} \cup \{\bot\}$ is the caller of the current function. $c = \bot$
corresponds to the case where the caller is undefined, e.g., when no function is
being executed. We use $S$ to denote the set of all states that can appear in a run
of the contract as defined below.


Runs. A run $\rho$ of the contract is a finite sequence
$\{\rho_j = (t_j, b_j, l_j, val_j, c_j)\}_{j=0}^{r}$ of states, starting from
$(0, 0, 0, X_0, \bot)$, that follows all rules of the contract and ends in a state
with time-stamp $t_r > \max_{f_i} \overline{T}(f_i)$. These rules must be followed
when switching to a new state in a run:


- The clock can only tick when there are no valid pending requests for running
a one-party function or deciding or paying in multi-party functions.

- Transitions that happen when the contract is executing a function must follow
its control flow graph and update the valuation correctly.

- No variable can contain an out-of-bounds value. If an overflow or underflow
happens, the closest possible value will be saved. This rule also ensures that
the contract will not create new money, given that paying more than the
current balance of the contract results in an underflow.

- Each party can call any set of the functions at any time.
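
The out-of-bounds rule amounts to saturating (clamping) arithmetic. The following sketch uses hypothetical helper names `saturate` and `payout`. How much money actually leaves the contract when a payout exceeds the balance is an assumption of this sketch: we take it to be the available balance, which is consistent with the balance saturating at 0 and with no new money being created.

```python
def saturate(value, lower, upper):
    """Closest value to `value` inside the closed range [lower, upper]."""
    return max(lower, min(upper, value))


def payout(balance, amount):
    """Pay `amount` out of `balance`.

    Returns (new balance, amount actually paid). Assumption of this sketch:
    when the request exceeds the balance, the balance saturates at 0 and
    only the available money is paid."""
    new_balance = saturate(balance - amount, 0, balance)
    return new_balance, balance - new_balance
```

For a variable with range [0, 100], `saturate(120, 0, 100)` stores 100 and `saturate(-5, 0, 100)` stores 0; a payout of 50 from a balance of 30 leaves a balance of 0 and pays 30.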


Remark 1. Note that in our semantics each function body completes its execu- 
tion in a single tick of the clock. However, ticks might contain more than one 
function call and execution. 


Run Prefixes. We use $H$ to mean the set of all prefixes of runs and denote the
last state in $\eta \in H$ by $end(\eta)$. A run prefix $\eta'$ is an extension of $\eta$
if it can be obtained by adding one state to the end of $\eta$.


Probability Distributions. Given a finite set $\mathcal{X}$, a probability
distribution on $\mathcal{X}$ is a function $\delta : \mathcal{X} \to [0, 1]$ such that
$\sum_{x \in \mathcal{X}} \delta(x) = 1$. Given such a distribution, its support,
$Supp(\delta)$, is the set of all $x \in \mathcal{X}$ such that $\delta(x) > 0$. We
denote the set of all probability distributions on $\mathcal{X}$ by
$\Delta(\mathcal{X})$.
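
These two definitions translate directly into code. In the sketch below a distribution over a finite set is represented, as an assumption of ours, by a dictionary from elements to exact rational probabilities; the helper names are hypothetical.

```python
from fractions import Fraction


def is_distribution(delta):
    """True if all values lie in [0, 1] and they sum to exactly 1."""
    return all(0 <= p <= 1 for p in delta.values()) and sum(delta.values()) == 1


def support(delta):
    """Set of all elements with strictly positive probability."""
    return {x for x, p in delta.items() if p > 0}


delta = {"rock": Fraction(1, 2), "paper": Fraction(1, 2), "scissors": Fraction(0)}
```

Here `delta` is a distribution on the three moves whose support contains only rock and paper.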

Typically for programs it suffices to define runs for the semantics. However, 
given that there are several parties in contracts, their semantics depends on the 
possible choices of the parties. Hence we need to define policies for parties, and 
such policies will define probability distribution over runs, which constitute the 
semantics for contracts. To define policies we first define moves. 


Moves. We use $\mathcal{M}$ for the set of all moves. The moves that can be taken by
parties in a contract can be summarized as follows:


- Calling a function $f_i$; we denote this by $call(f_i)$.

- Making a payment whose amount, $y$, is saved in $x$; we denote this by $pay(x, y)$.

- Deciding the value of $x$ to be $y$; we denote this by $decide(x, y)$.

- Doing none of the above; we denote this by $\boxtimes$.


Permitted Moves. We define $P_i : S \to 2^{\mathcal{M}}$, so that $P_i(s)$ is the set of
permitted moves for the party with identity $i$ if the contract is in state
$s = (t, b, l, val, p_j)$. It is formally defined as follows:


- If $f_k$ is a function that can be called at state $s$, then $call(f_k) \in P_i(s)$.

- If $l = \square_q$ is the first label of a function $f_q$ and $x$ is a variable that
can be decided by $i$ at the beginning of the function $f_q$, then
$decide(x, y) \in P_i(s)$ for all permissible values of $y$. Similarly, if $x$ can be
paid by $i$, then $pay(x, y) \in P_i(s)$.

- $\boxtimes \in P_i(s)$.


Policies and Randomized Policies. A policy $\pi_i$ for party $i$ is a function
$\pi_i : H \to \mathcal{M}$, such that for every $\eta \in H$,
$\pi_i(\eta) \in P_i(end(\eta))$. Intuitively, a policy is a way of deciding what
move to use next, given the current run prefix. A policy profile
$\pi = (\pi_i)$ is a sequence assigning one policy to each party $i$. The policy
profile $\pi$ defines a unique run $\rho^{\pi}$ of the contract which is obtained
when parties choose their moves according to $\pi$. A randomized policy $\xi_i$ for
party $i$ is a function $\xi_i : H \to \Delta(\mathcal{M})$, such that
$Supp(\xi_i(\eta)) \subseteq P_i(end(\eta))$ for every $\eta \in H$. A randomized
policy assigns a probability distribution over all possible moves for party $i$
given the current run prefix of the contract; the party can then follow it by
choosing a move randomly according to the distribution. We use $\Xi$ to denote the
set of all randomized policy profiles, $\Xi_i$ for randomized policies of $i$ and
$\Xi_{-i}$ to denote the set of randomized policy profiles for all parties except
$i$. A randomized policy profile $\xi$ is a sequence $(\xi_i)$ assigning one
randomized policy to each party. Each such randomized policy profile induces a
unique probability measure on the set of runs, which is denoted as
$Prob^{\xi}[\cdot]$. We denote the expectation measure associated to
$Prob^{\xi}[\cdot]$ by $\mathbb{E}^{\xi}[\cdot]$.


3.3 Objective Function and Values of Contracts 


As mentioned in the introduction we identify expected payoff as the canonical 
property for contracts. The previous section defines expectation measure given 
randomized policies as the basic semantics. Given the expected payoff, we define 
values of contracts as the worst-case guaranteed payoff for a given party. We 
formalize the notion of objective function (the payoff function). 


Objective Function. An objective o for a party p is in one of the following forms: 


— $(p^+ - p^-)$, where $p^+$ is the total money received by party $p$ from the contract
(by “payout” statements) and $p^-$ is the total money paid by $p$ to the contract
(as “payable” parameters).

— An expression containing mathematical and logical operations (addition,
multiplication, subtraction, integer division, and, or, not) and variables chosen
from the set $N \cup \{m[i] \mid m \in M, i \in I\}$. Here $N$ is the set of numeric variables,
$m[i]$’s are the values that can be saved inside maps.²

— A sum of the previous two cases. 


Informally, p is trying to choose her moves so as to maximize o. 


Run Outcomes. Given a run $\rho$ of the program and an objective $o$ for party $p$,
the outcome $\kappa(\rho, o, p)$ is the value of $o$ computed using the valuation at $end(\rho)$
for all variables and accounting for payments in $\rho$ to compute $p^+$ and $p^-$.


Contract Values. Since we consider worst-case guaranteed payoff, we consider 
that there is an objective o for a single party p which she tries to maximize 
and all other parties are adversaries who aim to minimize o. Formally, given a 
contract C and an objective o for party p, we define the value of contract as: 


$$V(C, o, p) := \sup_{\xi_p \in \Xi_p} \; \inf_{\xi_{-p} \in \Xi_{-p}} \mathbb{E}^{(\xi_p, \xi_{-p})}\left[\kappa(\rho, o, p)\right].$$


² We are also assuming, as in many programming languages, that TRUE = 1 and
FALSE = 0.

This corresponds to $p$ trying to maximize the expected value of $o$ and all other
parties maliciously colluding to minimize it. In other words, it provides the
worst-case guarantee for party $p$, irrespective of the behavior of the other parties,
which in the worst case are adversarial to party $p$.


3.4 Examples 


One contribution of our work is to present the simplified programming language,
and to show that this simple language can express several classical smart
contracts. To demonstrate the applicability, we present several examples of classical
smart contracts in this section. In each example, we present a contract and a
“buggy” implementation of the same contract that has a different value. In Sect. 6
we show that our automated approach to analyzing the contracts can compute
contract values with enough precision to differentiate between the correct and the
buggy implementation. All of our examples are motivated by well-known bugs
that have happened in real life in Ethereum.


Rock-Paper-Scissors. Let our contract be the one specified in Fig. 2 and
assume that we want to analyze it from the point of view of the issuer $p$. Also,
let the objective function be $(p^+ - p^- + 10 \cdot AliceWon)$. Intuitively, this means
that winning the rock-paper-scissors game is considered to have an additional 
value of 10, other than the spending and earnings. The idea behind this is similar 
to the case with chess tournaments, in which players not only win a prize, but 
can also use their wins to achieve better “ratings”, so winning has extra utility. 

A common bug in writing rock-paper-scissors is allowing the parties to move 
sequentially, rather than concurrently [29]. If parties can move sequentially and 
the issuer moves after Bob, then she can ensure a utility of 10, i.e. her worst-case 
expected reward is 10. However, in the correct implementation as in Fig. 2, the 
best strategy for both players is to bid 0 and then Alice can win the game with 
probability 1/3 by choosing each of the three options with equal probability. 
Hence, her worst-case expected reward is 10/3. 
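The two numbers above can be checked with a small sketch. It assumes both bids are 0, so that only the term 10 · AliceWon matters, and it only evaluates the guarantee of the uniform policy, which the text states is optimal. The names `MOVES`, `BEATS` and `alice_reward` are illustrative and not part of the contract language.

```python
from fractions import Fraction

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def alice_reward(alice_move, bob_move):
    # Assumption: bids are 0, so the objective reduces to 10 * AliceWon.
    return 10 if BEATS[alice_move] == bob_move else 0

# Concurrent moves: Alice mixes uniformly, Bob picks the worst case for her.
uniform = {move: Fraction(1, 3) for move in MOVES}
concurrent_guarantee = min(
    sum(prob * alice_reward(a, b) for a, prob in uniform.items())
    for b in MOVES
)

# Buggy sequential version: Alice sees Bob's move and then best-responds.
sequential_guarantee = min(
    max(alice_reward(a, b) for a in MOVES) for b in MOVES
)

print(concurrent_guarantee, sequential_guarantee)
```

Under these assumptions the uniform policy guarantees 10/3 in the concurrent contract, while the sequential variant guarantees the full 10.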


Auction. Consider an open auction, in which during a fixed time interval every- 
one is allowed to bid for the good being sold and everyone can see others’ bids. 
When the bidding period ends a winner emerges and every other participant can 
get their money back. Let the variable HighestBid store the value of the highest 
bid made at the auction. Then for a party p, one can define the objective as: 


$p^+ - p^- + (Winner == p) \times HighestBid$.


This is of course assuming that the good being sold is worth precisely as much as 
the highest bid. A correctly written auction should return a value of 0 to every 
participant, because those who lose the auction must get their money back and 
the party that wins pays precisely the highest bid. The contract in Fig. 3 (left) 
is an implementation of such an auction. However, it has a slight problem. The 
function bid allows the winner to reduce her bid. This bug is fixed in the contract 
on the right. 




contract BuggyAuction {                 contract Auction {
  map Bids[0,1000] = 0;                   map Bids[0,1000] = 0;
  numeric HighestBid[0,1000] = 0;         numeric HighestBid[0,1000] = 0;
  id Winner = null;                       id Winner = null;
  numeric bid[0,1000] = 0;                numeric bid[0,1000] = 0;

  function bid[1,10]                      function bid[1,10]
  (payable bid : caller) {                (payable bid : caller) {
                                            if (bid<Bids[caller])
                                              return;
    payout(caller, Bids[caller]);           payout(caller, Bids[caller]);
    Bids[caller]=bid;                       Bids[caller]=bid;
    if (bid>HighestBid)                     if (bid>HighestBid)
    {                                       {
      HighestBid = bid;                       HighestBid = bid;
      Winner = caller;                        Winner = caller;
    }                                       }
  }                                       }

  function withdraw[11,20]()              function withdraw[11,20]()
  {                                       {
    if (caller!=Winner)                     if (caller!=Winner)
    {                                       {
      payout(caller, Bids[caller]);           payout(caller, Bids[caller]);
      Bids[caller]=0;                         Bids[caller]=0;
    }                                       }
  }                                       }
}                                       }


Fig. 3. A buggy auction contract (left) and its fixed version (right). 
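As an illustration of why the buggy version has a positive value, the following sketch replays the buggy `bid` function in Python and evaluates the auction objective $p^+ - p^- + (Winner == p) \times HighestBid$. It is a simplified model, not the paper's semantics: variable bounds and time windows are ignored, and the class and method names are hypothetical.

```python
class BuggyAuction:
    """Simplified model of the buggy contract (bounds and clock ignored)."""

    def __init__(self):
        self.bids = {}
        self.highest_bid = 0
        self.winner = None
        self.received = {}  # total paid out to each party (p^+)
        self.paid = {}      # total paid in by each party (p^-)

    def _payout(self, party, amount):
        self.received[party] = self.received.get(party, 0) + amount

    def bid(self, caller, amount):
        self.paid[caller] = self.paid.get(caller, 0) + amount  # payable bid
        self._payout(caller, self.bids.get(caller, 0))         # refund old bid
        self.bids[caller] = amount
        if amount > self.highest_bid:
            self.highest_bid = amount
            self.winner = caller

    def objective(self, party):
        bonus = self.highest_bid if self.winner == party else 0
        return self.received.get(party, 0) - self.paid.get(party, 0) + bonus

honest = BuggyAuction()
honest.bid("p", 100)

attack = BuggyAuction()
attack.bid("p", 100)  # become the winner with HighestBid = 100
attack.bid("p", 1)    # lower the bid: 100 is refunded, HighestBid is unchanged

print(honest.objective("p"), attack.objective("p"))
```

In this model a single honest bid gives an objective of 0, while lowering the bid afterwards leaves the party as winner having effectively paid only 1 unit.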


Three-Way Lottery. Consider a three-party lottery contract issued by a party
$p$. The other two players can sign up by buying tickets worth 1 unit each. Then
each of the players is supposed to randomly and uniformly choose a nonce. A
combination of these nonces produces the winner with equal probability for all
three parties. If a person does not make a choice or pay the fees, she will
certainly lose the lottery. The rules are such that if the other two parties choose
the same nonce, which is supposed to happen with probability $\frac{1}{3}$, then the issuer
wins. Otherwise the winner is chosen according to the parity of the sum of nonces.
This gives everyone a winning probability of $\frac{1}{3}$ if all sides play uniformly at
random. However, even if one of the sides refuses to play uniformly at random, the
resulting probabilities of winning stay the same because each side’s probability
of winning is independent of her own choice assuming that others are playing
randomly. We assume that the issuer $p$ has objective $p^+ - p^-$. This is because the
winner can take other players’ money. In a bug-free contract we will expect the
value of this objective to be 0, given that winning has a probability of $\frac{1}{3}$.
However, the bug here is due to the fact that other parties can collude. For example,
the same person might register as both players and then opt for different nonces.
This will ensure that the issuer loses. The bug can be solved by ensuring one’s
probability of winning is $\frac{1}{3}$ if she honestly plays uniformly at random, no matter
what other parties do. For more details about this contract see [19].


Token Sale. Consider a contract that sells tokens modeling some aspect of 
the real world, e.g. shares in a company. At first anyone can buy tokens at a 
fixed price of 1 unit per token. However, there are a limited number of tokens
available and at most 1000 of them are meant to be sold. The tokens can then
be transferred between parties, which is the subject of our next example. For
now, Fig. 4 (left) is an implementation of the selling phase. However, there is
a big problem here. The problem is that one can buy any number of tokens as
long as there is at least one token remaining. For example, one might first buy
999 tokens and then buy another 1000. If we analyze the contract from the point
of view of a solo party $p$ with objective balance[p], then it must be capped by
1000 in a bug-free contract, while the process described above leads to a value
of 1999. The fixed contract is in Fig. 4 (right). This bug is inspired by a very
similar real-world bug described in [52]. 
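The purchase sequence described above can be replayed with a short sketch. It models only the two guards (`remaining <= 0` in the buggy sale, `remaining - payment < 0` in the fixed one); refunds, variable bounds and the clock are left out, and the helper names are hypothetical.

```python
def make_sale():
    # Initial state: no balances, 1000 tokens left to sell.
    return {"balance": {}, "remaining": 1000}

def buy(state, caller, payment, fixed):
    # The buggy guard only checks that something is left;
    # the fixed guard checks that the whole purchase fits.
    if fixed:
        rejected = state["remaining"] - payment < 0
    else:
        rejected = state["remaining"] <= 0
    if rejected:
        return  # the payment is refunded and nothing changes
    state["balance"][caller] = state["balance"].get(caller, 0) + payment
    state["remaining"] -= payment

def run_attack(fixed):
    state = make_sale()
    buy(state, "p", 999, fixed)
    buy(state, "p", 1000, fixed)
    return state["balance"]["p"]

print(run_attack(fixed=False), run_attack(fixed=True))
```

With the buggy guard the party ends with 1999 tokens; with the fixed guard the second purchase is rejected and the balance stays at 999.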


Token Transfer. Consider the same bug-free token sale as in the previous
example. We now add a function for transferring tokens. An owner can choose
a recipient and an amount less than or equal to her balance and transfer that 
many tokens to the recipient. Figure 5 (left) is an implementation of this concept. 
Taking the same approach and objective as above, we expect a similar result. 
However, there is again an important bug in this code. What happens if a party 
transfers tokens to herself? She gets free extra tokens! This has been fixed in the 
contract on the right. This example models a real-world bug as in [42]. 
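The self-transfer problem can be reproduced in a few lines. The sketch assumes, as in the buggy listing of Fig. 5, that both balances are copied into temporaries before being written back, whereas the fixed version updates the map in place. The function names are illustrative.

```python
def buggy_transfer(balance, caller, recipient, amount):
    # Both balances are read before either is updated.
    from_balance = balance[caller]
    to_balance = balance[recipient]
    if from_balance < amount:
        return
    from_balance -= amount
    to_balance += amount
    balance[caller] = from_balance
    balance[recipient] = to_balance  # overwrites the debit when caller == recipient

def fixed_transfer(balance, caller, recipient, amount):
    # In-place updates: a self-transfer debits and credits the same entry.
    if balance[caller] < amount:
        return
    balance[caller] -= amount
    balance[recipient] += amount

buggy = {"p": 10}
buggy_transfer(buggy, "p", "p", 10)

fixed = {"p": 10}
fixed_transfer(fixed, "p", "p", 10)

print(buggy["p"], fixed["p"])
```

In the buggy version the final write restores the stale recipient copy plus the amount, so the self-transfer doubles the balance; the in-place version leaves it unchanged.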


contract BuggySale {                    contract Sale {
  map balance[0,2000] = 0;                map balance[0,2000] = 0;
  numeric remaining[0,2000] = 1000;       numeric remaining[0,2000] = 1000;
  numeric payment[0,2000] = 0;            numeric payment[0,2000] = 0;
  function buy[1,10]                      function buy[1,10]
  (payable payment : caller) {            (payable payment : caller) {
    if (remaining <= 0) {                   if (remaining - payment < 0) {
      payout(caller, payment);                payout(caller, payment);
      return;                                 return;
    }                                       }
    balance[caller] += payment;             balance[caller] += payment;
    remaining -= payment;                   remaining -= payment;
}}                                      }}


Fig. 4. A buggy token sale (left) and its fixed version (right). 


Translation to Solidity. All aspects of our programming language are already 
present in Solidity, except for the global clock and concurrent interactions. The 
global clock can be modeled by the number of the current block in the blockchain 
and concurrent interactions can be implemented using commitment schemes. For 
more details see [19]. 


4 Bounded Analysis and Games 


Since smart contracts can be easily described in our programming language,
and programs in our programming language can be translated to Solidity, the
main aim is to automatically compute values of contracts (i.e., compute guaranteed
payoff for parties). In this section, we introduce the bounded analysis problem
for our programming language framework, and present concurrent games, which
are the underlying mathematical framework for the bounded analysis problem.


contract BuggyTransfer {                contract Transfer {
  map balance[0,2000] = 0;                map balance[0,2000] = 0;
  numeric remaining[0,2000] = 1000;       numeric remaining[0,2000] = 1000;
  numeric payment[0,2000] = 0;            numeric payment[0,2000] = 0;
  numeric amount[0,2000] = 0;             numeric amount[0,2000] = 0;
  numeric fromBalance[0,2000] = 0;
  numeric toBalance[0,2000] = 0;
  id recipient = null;                    id recipient = null;
  function buy[1,10]...                   function buy[1,10]...
  function transfer[1,10](                function transfer[1,10](
    recipient : caller                      recipient : caller
    amount : caller) {                      amount : caller) {
    fromBalance = balance[caller];
    toBalance = balance[recipient];
    if (fromBalance<amount)                 if (balance[caller]<amount)
      return;                                 return;
    fromBalance -= amount;                  balance[caller] -= amount;
    toBalance += amount;                    balance[recipient] += amount;
    balance[caller] = fromBalance;
    balance[recipient] = toBalance;
}}                                      }}


Fig. 5. A buggy transfer function (left) and its fixed version (right).


4.1 Bounded Analysis 


As is standard in verification, we consider the bounded analysis problem, where 
the number of parties and the number of function calls are bounded. In standard 
program analysis, bugs are often detected with a small number of processes, or a 
small number of context switches between concurrent threads. In the context of 
smart contracts, we analogously assume that the number of parties and function 
calls are bounded. 


Contracts with Bounded Number of Parties and Function Calls. Formally, a
contract with bounded number of parties and function calls is as follows:


— Let $C$ be a contract and $k \in \mathbb{N}$. We define $C_k$ as an equivalent contract that
can have at most $k$ parties. This is achieved by letting $\mathbb{P} = \{p_1, p_2, \ldots, p_k\}$
be the set of all possible ids in the contract. The set $\mathbb{P}$ must contain all ids
that are in the program source, therefore $k$ is at least the number of such ids.
Note that this does not require that ids are controlled by unique users, and
a real-life user can have several different ids. We only restrict the analysis to
a bounded number of parties interacting with the smart contract.

— To ensure runs are finite, the number of function calls by each party is also
bounded. Specifically, each party can call each function at most once during
each time frame, i.e. between two consecutive ticks of the clock. This
closely resembles real-life contracts in which one’s ability to call many functions
is limited by the capacity of a block in the blockchain, given that the
block must save all messages.


4.2 Concurrent Games 


The programming language framework we consider has interacting agents that 
act simultaneously, and we have the program state. We present the mathematical 
framework of concurrent games, which are games played on finite state spaces 
with concurrent interaction between the players. 


Concurrent Game Structures. A concurrent two-player game structure is a tuple
$G = (S, s_0, A, \Gamma_1, \Gamma_2, \delta)$, where $S$ is a finite set of states, $s_0 \in S$ is the start state,
$A$ is a finite set of actions, $\Gamma_1, \Gamma_2 : S \to 2^A \setminus \emptyset$ such that $\Gamma_i$ assigns to each state
$s \in S$ a non-empty set $\Gamma_i(s) \subseteq A$ of actions available to player $i$ at $s$, and finally
$\delta : S \times A \times A \to S$ is a transition function that assigns to every state $s \in S$ and
action pair $a_1 \in \Gamma_1(s), a_2 \in \Gamma_2(s)$ a successor state $\delta(s, a_1, a_2) \in S$.


Plays and Histories. The game starts at state $s_0$. At each state $s_i \in S$, player
1 chooses an action $a^i_1 \in \Gamma_1(s_i)$ and player 2 chooses an action $a^i_2 \in \Gamma_2(s_i)$.
The choices are made simultaneously and independently. The game subsequently
transitions to the new state $s_{i+1} = \delta(s_i, a^i_1, a^i_2)$ and the same process continues.
This leads to an infinite sequence of tuples $p = ((s_i, a^i_1, a^i_2))_{i=0}^{\infty}$ which is called
a play of the game. We denote the set of all plays by $\mathcal{P}$. Every finite prefix
$p[..r] := ((s_0, a^0_1, a^0_2), (s_1, a^1_1, a^1_2), \ldots, (s_r, a^r_1, a^r_2))$ of a play is called a history and
the set of all histories is denoted by $\mathcal{H}$. If $h = p[..r]$ is a history, we denote the
last state appearing according to $h$, i.e. $s_{r+1} = \delta(s_r, a^r_1, a^r_2)$, by $last(h)$. We also
define $p[..-1]$ as the empty history.


Strategies and Mixed Strategies. A strategy is a recipe that describes for a player
the action to play given the current game history. Formally, a strategy $\varphi_i$ for
player $i$ is a function $\varphi_i : \mathcal{H} \to A$, such that $\varphi_i(h) \in \Gamma_i(last(h))$. A pair
$\varphi = (\varphi_1, \varphi_2)$ of strategies for the two players is called a strategy profile. Each such
$\varphi$ induces a unique play. A mixed strategy $\sigma_i : \mathcal{H} \to \Delta(A)$ for player $i$ assigns a
probability distribution over actions given the history of the game. Intuitively,
such a strategy suggests a distribution of actions to player $i$ at each step and
then she plays one of them randomly according to that distribution. Of course
it must be the case that $Supp(\sigma_i(h)) \subseteq \Gamma_i(last(h))$.
A pair $\sigma = (\sigma_1, \sigma_2)$ of mixed strategies for the two players is called a mixed
strategy profile. Note that mixed strategies generalize strategies with randomization.
Every mixed strategy profile $\sigma = (\sigma_1, \sigma_2)$ induces a unique probability
measure on the set of plays, which is denoted as $Prob^{\sigma}[\cdot]$, and the associated
expectation measure is denoted by $\mathbb{E}^{\sigma}[\cdot]$.


State and History Utilities. In a game structure $G$, a state utility function $u$ for
player 1 is of the form $u : S \to \mathbb{R}$. Intuitively, this means that when the game
enters state $s$, player 1 receives a reward of $u(s)$. State utilities can be extended
to history utilities. We define the utility of a history to be the sum of utilities
of all the states included in that history. Formally, if $h = ((s_i, a^i_1, a^i_2))_{i=0}^{r}$ then
$u(h) = \sum_{i=0}^{r} u(s_i)$. Given a play $p \in \mathcal{P}$, we denote the utility of its prefix of
length $L$ by $u_L(p)$.


Games. A game is a pair (G, u) where G is a game structure and u is a utility 
function for player 1. We assume that player 1 is trying to maximize u, while 
player 2’s goal is to minimize it. 


Values. The $L$-step finite-horizon value of a game $(G, u)$ is defined as

$$v_L(G, u) := \sup_{\sigma_1} \inf_{\sigma_2} \mathbb{E}^{(\sigma_1, \sigma_2)}\left[u_L(p)\right], \tag{1}$$

where $\sigma_i$ iterates over all possible mixed strategies of player $i$. This models the
fact that player 1 is trying to maximize the utility in the first L steps of the run, 
while player 2 is minimizing it. The values of games can be computed using the 
value-iteration algorithm or dynamic programming, which is standard. A more 
detailed overview of the algorithms for games is provided in [19]. 
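A minimal sketch of such a dynamic program is given below. It is not the algorithm of [19]: each one-shot matrix game is solved exactly by enumerating vertices of the max-min linear program, which is only practical for very small action sets, and it assumes that a prefix of length $L$ contains the states $s_0, \ldots, s_{L-1}$. All function names and the example game are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def solve_linear(rows, rhs):
    """Solve a square linear system exactly; return None if it is singular."""
    n = len(rows)
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(rows, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def matrix_game_value(payoff):
    """max over mixed x of min over columns j of sum_i x[i] * payoff[i][j]."""
    m, n = len(payoff), len(payoff[0])
    # Unknowns are x_0..x_{m-1} and v; the simplex equality always holds.
    simplex_row = [1] * m + [0]
    candidates = []
    for i in range(m):  # tight constraint x_i = 0
        candidates.append([1 if k == i else 0 for k in range(m)] + [0])
    for j in range(n):  # tight constraint v = sum_i x_i * payoff[i][j]
        candidates.append([-payoff[i][j] for i in range(m)] + [1])
    best = None
    for tight in combinations(candidates, m):
        sol = solve_linear([simplex_row] + list(tight), [1] + [0] * m)
        if sol is None:
            continue
        x, v = sol[:m], sol[m]
        if any(xi < 0 for xi in x):
            continue
        if any(v > sum(x[i] * payoff[i][j] for i in range(m)) for j in range(n)):
            continue
        if best is None or v > best:
            best = v
    return best

def finite_horizon_value(states, start, actions1, actions2, delta, u, L):
    """Backward induction; assumes L >= 1 and that a prefix of length L has L states."""
    value = {s: Fraction(u[s]) for s in states}  # horizon 1
    for _ in range(L - 1):
        value = {
            s: u[s] + matrix_game_value(
                [[value[delta[(s, a1, a2)]] for a2 in actions2[s]]
                 for a1 in actions1[s]])
            for s in states
        }
    return value[start]

# Hypothetical example: a matching-pennies step followed by absorbing states.
states = ["s0", "win", "lose"]
actions1 = {"s0": ["a", "b"], "win": ["a"], "lose": ["a"]}
actions2 = {"s0": ["a", "b"], "win": ["a"], "lose": ["a"]}
delta = {("s0", "a", "a"): "win", ("s0", "a", "b"): "lose",
         ("s0", "b", "a"): "lose", ("s0", "b", "b"): "win",
         ("win", "a", "a"): "win", ("lose", "a", "a"): "lose"}
u = {"s0": 0, "win": 1, "lose": 0}

print(finite_horizon_value(states, "s0", actions1, actions2, delta, u, 2))
```

Exact rational arithmetic is used so that values can be compared without rounding; a practical implementation would more likely call a linear programming solver for each state.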


Remark 2. Note that in (1), limiting player 2 to pure strategies does not change 
the value of the game. Hence, we can assume that player 2 is an arbitrarily 
powerful nondeterministic adversary and get the exact same results. 


4.3 Translating Contracts to Games 


The translation from bounded smart contracts to games is straightforward,
where the states of the concurrent game encode the states of the contract.
Correspondences between objects in the contract and game are as follows: (a) moves
in contracts with actions in games; (b) run prefixes in contracts with histories
in games; (c) runs in contracts with plays in games; and (d) policies (resp.,
randomized policies) in contracts with strategies (resp., mixed strategies) in games.
Note that since all runs of the bounded contract are finite and have a limited
length, we can apply finite horizon analysis to the resulting game, where $L$ is the
maximal length of a run in the contract. This gives us the following theorem:


Theorem 1 (Correspondence). Given a bounded contract $C_k$ for a party $p$
with objective $o$, a concurrent game can be constructed such that the value of this
game, $v_L(G, u)$, is equal to the value of the bounded contract, $V(C_k, o, p)$.


For details of the translation of smart contracts to games and proof of the 
theorem above see [19]. 


Remark 3. In standard programming languages, there are no parties to interact 
and hence the underlying mathematical models are graphs. In contrast, for smart 
contracts programming languages, where parties interact in a game-like manner, 
we have to consider games as the mathematical basis of our analysis.


5 Abstraction for Quantitative Concurrent Games


Abstraction is a key technique to handle large-scale systems. In the previous 
section we described that smart contracts can be translated to games, but due 
to state-space explosion (since we allow integer variables), the resulting state 
space of the game is huge. Hence, we need techniques for abstraction, as well 
as refinement of abstraction, for concurrent games with quantitative utilities. In 
this section we present such abstraction refinement for quantitative concurrent 
games, which is our main technical contribution in this paper. We show the 
soundness of our approach and its completeness in the limit. Then, we introduce 
a specific method of abstraction, called interval abstraction, which we apply to 
the games obtained from contracts and show that soundness and refinement are 
inherited from the general case. We also provide a heuristic for faster refining of 
interval abstractions for games obtained from contracts. 


5.1 Abstraction for Quantitative Concurrent Games 


Abstraction considers a partition of the state space, and reduces the number of 
states by taking each partition set as a state. In case of transition systems (or 
graphs) the standard technique is to consider existential (or universal) abstrac- 
tion to define transitions between the partition sets. However, for game-theoretic 
interactions such abstraction ideas are not enough. We now describe the key 
intuition for abstraction in concurrent games with quantitative objectives and 
formalize it. We also provide a simple example for illustration. 


Abstraction Idea and Key Intuition. In an abstraction the state space of the 
game (G,u) is partitioned into several abstract states, where an abstract state 
represents a set of states of the original game. Intuitively, an abstract state 
represents a set of similar states of the original game. Given an abstraction our 
goal is to define two games that can provide lower and upper bound on the value 
of the original game. This leads to the concepts of lower and upper abstraction. 


— Lower abstraction. The lower abstraction $(G^\downarrow, u^\downarrow)$ represents a lower bound on
the value. Intuitively, the utility is assigned as the minimal utility among states
in the partition set, and when an action profile can lead to different abstract
states, then the adversary, i.e. player 2, chooses the transition.

— Upper abstraction. The upper abstraction $(G^\uparrow, u^\uparrow)$ represents an upper bound
on the value. Intuitively, the utility is assigned as the maximal utility among
states in the partition set, and when an action profile can lead to different
abstract states, then player 1 chooses between the possible states.


Informally, the lower abstraction gives more power to the adversary, player 2, 
whereas the upper abstraction is favorable to player 1. 


General Abstraction for Concurrent Games. Given a game $(G, u)$ consisting of a
game structure $G = (S, s_0, A, \Gamma_1, \Gamma_2, \delta)$ and a utility function $u$, and a partition
$\Pi$ of $S$, the lower and upper abstractions, $(G^\downarrow = (S^a, s^a_0, A^a, \Gamma^\downarrow_1, \Gamma^\downarrow_2, \delta^\downarrow), u^\downarrow)$ and
$(G^\uparrow = (S^a, s^a_0, A^a, \Gamma^\uparrow_1, \Gamma^\uparrow_2, \delta^\uparrow), u^\uparrow)$, of $(G, u)$ with respect to $\Pi$ are defined as:

— $S^a = \Pi \cup D$, where $D = \Pi \times A \times A$ is a set of dummy states for giving more
power to one of the players. Members of $S^a$ are called abstracted states.

— The start state of $G$ is in the start state of $G^\downarrow$ and $G^\uparrow$, i.e. $s_0 \in s^a_0 \in \Pi$.

— $A^a = A \cup \Pi$. Each action in abstracted games either corresponds to an action
in the original game or to a choice of the next state.

— If two states $s_1, s_2 \in S$ are in the same abstracted state $s^a \in \Pi$, then they
must have the same set of available actions for both players, i.e. $\Gamma_1(s_1) = \Gamma_1(s_2)$
and $\Gamma_2(s_1) = \Gamma_2(s_2)$. Moreover, $s^a$ inherits these action sets. Formally,
$\Gamma^\downarrow_1(s^a) = \Gamma^\uparrow_1(s^a) = \Gamma_1(s_1) = \Gamma_1(s_2)$ and $\Gamma^\downarrow_2(s^a) = \Gamma^\uparrow_2(s^a) = \Gamma_2(s_1) = \Gamma_2(s_2)$.

— For all $\pi \in \Pi$ and $a_1 \in \Gamma^\downarrow_1(\pi)$ and $a_2 \in \Gamma^\downarrow_2(\pi)$, we have $\delta^\downarrow(\pi, a_1, a_2) =
(\pi, a_1, a_2) \in D$. Similarly for $a_1 \in \Gamma^\uparrow_1(\pi)$ and $a_2 \in \Gamma^\uparrow_2(\pi)$, $\delta^\uparrow(\pi, a_1, a_2) =
(\pi, a_1, a_2) \in D$. This means that all transitions from abstract states in $\Pi$ go
to the corresponding dummy abstract state in $D$.

— If $d = (\pi, a_1, a_2) \in D$ is a dummy abstract state, then let $X_d = \{\pi' \in
\Pi \mid \exists s \in \pi : \delta(s, a_1, a_2) \in \pi'\}$ be the set of all partition sets that can be
reached from $\pi$ by $a_1, a_2$ in $G$. Then in $G^\downarrow$, $\Gamma^\downarrow_1(d)$ is a singleton, i.e., player 1
has no choice, and $\Gamma^\downarrow_2(d) = X_d$, i.e., player 2 can choose which abstract state
is the next. Conversely, in $G^\uparrow$, $\Gamma^\uparrow_2(d)$ is a singleton and player 2 has no choice,
while $\Gamma^\uparrow_1(d) = X_d$ and player 1 chooses the next abstract state.

— In line with the previous point, $\delta^\downarrow(d, a_1, a_2) = a_2$ and $\delta^\uparrow(d, a_1, a_2) = a_1$ for
all $d \in D$ and available actions $a_1$ and $a_2$.

— We have $u^\downarrow(s^a) = \min_{s \in s^a}\{u(s)\}$ and $u^\uparrow(s^a) = \max_{s \in s^a}\{u(s)\}$. The utility of
a non-dummy abstracted state in $G^\downarrow$, resp. $G^\uparrow$, is the minimal, resp. maximal,
utility among the normal states included in it. Also, for each dummy state
$d \in D$, we have $u^\downarrow(d) = u^\uparrow(d) = 0$.


Given a partition $\Pi$ of $S$, either (i) there is no lower or upper abstraction
corresponding to it because it puts states with different sets of available actions
together; or (ii) there is a unique lower and upper abstraction pair. Hence we
will refer to the unique abstracted pair of games by specifying $\Pi$ only.


Remark 4. Dummy states are introduced for conceptual clarity in explaining the 
ideas because in lower abstraction all choices are assigned to player 2 and upper 
abstraction to player 1. However, in practice, there is no need to create them, 
as the choices can be allowed to the respective players in the predecessor state. 
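The ingredients of the two abstractions can be computed directly from a partition, as in the sketch below. It returns the lower and upper utilities of each partition set and the successor sets $X_d$ for each dummy state $(\pi, a_1, a_2)$, or `None` when the partition mixes states with different action sets. It does not build or solve the abstracted games. The four-state game used here is hypothetical: its utilities and action sets follow the pattern of the example in the text, but its transitions are invented for illustration.

```python
def abstract(partition, actions1, actions2, delta, u):
    """Return (u_lower, u_upper, successors) or None if no abstraction exists."""
    block_of = {s: i for i, block in enumerate(partition) for s in block}
    # All states of a block must offer the same actions to both players.
    for block in partition:
        signatures = {(frozenset(actions1[s]), frozenset(actions2[s])) for s in block}
        if len(signatures) != 1:
            return None
    u_lower = [min(u[s] for s in block) for block in partition]
    u_upper = [max(u[s] for s in block) for block in partition]
    # successors[(i, a1, a2)] is X_d for the dummy state d = (block i, a1, a2).
    successors = {}
    for i, block in enumerate(partition):
        representative = block[0]
        for a1 in actions1[representative]:
            for a2 in actions2[representative]:
                successors[(i, a1, a2)] = {
                    block_of[delta[(s, a1, a2)]] for s in block
                }
    return u_lower, u_upper, successors

# Hypothetical game (transitions invented for illustration).
actions1 = {"s0": ["a", "b"], "s1": ["a"], "s2": ["a", "b"], "s3": ["a"]}
actions2 = {s: ["a", "b"] for s in ["s0", "s1", "s2", "s3"]}
u = {"s0": 0, "s1": 10, "s2": 10, "s3": 0}
delta = {
    ("s0", "a", "a"): "s1", ("s0", "a", "b"): "s0",
    ("s0", "b", "a"): "s2", ("s0", "b", "b"): "s3",
    ("s2", "a", "a"): "s3", ("s2", "a", "b"): "s2",
    ("s2", "b", "a"): "s0", ("s2", "b", "b"): "s1",
    ("s1", "a", "a"): "s1", ("s1", "a", "b"): "s1",
    ("s3", "a", "a"): "s3", ("s3", "a", "b"): "s3",
}

partition = [("s0", "s2"), ("s1",), ("s3",)]
u_lower, u_upper, successors = abstract(partition, actions1, actions2, delta, u)
print(u_lower, u_upper, successors[(0, "a", "a")])
```

In the lower abstraction player 2 would pick the next block from `successors[(i, a1, a2)]`, and in the upper abstraction player 1 would; this matches the remark that dummy states need not be materialized.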


Example. Figure 6 (left) shows a concurrent game $(G, u)$ with 4 states. The
utilities are denoted in red. The edges correspond to transitions in $\delta$ and each
edge is labeled with its corresponding action pair. Here $A = \{a, b\}$, $\Gamma_1(s_0) =
\Gamma_2(s_0) = \Gamma_2(s_1) = \Gamma_1(s_2) = \Gamma_2(s_2) = \Gamma_2(s_3) = A$ and $\Gamma_1(s_1) = \Gamma_1(s_3) = \{a\}$.
Given that action sets for $s_0$ and $s_2$ are equal, we can create abstracted games
using the partition $\Pi = \{\pi_0, \pi_1, \pi_2\}$ where $\pi_0 = \{s_0, s_2\}$ and other sets are
singletons. The resulting game structure is depicted in Fig. 6 (center). Dummy states
are shown by circles and whenever a play reaches a dummy state in $G^\downarrow$, player 2
chooses which red edge should be taken. Conversely, in $G^\uparrow$ player 1 makes this
choice. Also, $u^\uparrow(\pi_0) = \max\{u(s_0), u(s_2)\} = 10$, $u^\downarrow(\pi_0) = \min\{u(s_0), u(s_2)\} = 0$
and $u^\downarrow(\pi_1) = u^\uparrow(\pi_1) = u(s_1) = 10$, $u^\downarrow(\pi_2) = u^\uparrow(\pi_2) = u(s_3) = 0$. The final
abstracted $G^\downarrow$ of the example above, without dummy states, is given in Fig. 6
(right).


[Figure 6: diagram not recoverable from the extracted text. The abstract states shown are $\pi_0 = \{s_0, s_2\}$, $\pi_1 = \{s_1\}$ and $\pi_2 = \{s_3\}$.]

Fig. 6. An example concurrent game (left), abstraction process (center) and the
corresponding $G^\downarrow$ without dummy states (right).


5.2 Abstraction: Soundness, Refinement, and Completeness in 
Limit 

For an abstraction we need three key properties: (a) soundness, (b) refinement
of the abstraction, and (c) completeness in the limit. The intuitive description
is as follows: (a) soundness requires that the value of the game is between
the values of the lower and upper abstractions; (b) refinement requires that if
the partition is refined, then the values of the lower and upper abstractions become
closer; and (c) completeness requires that if the partitions are refined enough,
then the value of the original game can be approximated. We present each of
these results below.


Soundness. Soundness means that when we apply abstraction, the value of the
original game must lie between the values of the lower and upper abstractions.
Intuitively, this means abstractions must provide us with some interval containing
the value of the game. We expect the value of $(G^\downarrow, u^\downarrow)$ to be less than or equal
to the value of the original game because in $(G^\downarrow, u^\downarrow)$, the utilities are less than in
$(G, u)$ and player 2 has more power, given that she can choose which transition
to take. Conversely, we expect $(G^\uparrow, u^\uparrow)$ to have a higher value than $(G, u)$.


Formal Requirement for Soundness. An abstraction of a game $(G, u)$ leading to
abstraction pair $(G^\downarrow, u^\downarrow), (G^\uparrow, u^\uparrow)$ is sound if for every $L$, we have
$v_{2L}(G^\downarrow, u^\downarrow) \le v_L(G, u) \le v_{2L}(G^\uparrow, u^\uparrow)$. The factor 2 in the inequalities above is due to the
fact that each transition in the original game is modeled by two transitions in
abstracted games, one to a dummy state and a second one out of it. We now
present our soundness result.


Theorem 2 (Soundness, Proof in [19]). Given a game $(G, u)$ and a partition
$\Pi$ of its state space, if $G^\downarrow$ and $G^\uparrow$ exist, then the abstraction is sound, i.e. for
all $L$, it is the case that $v_{2L}(G^\downarrow, u^\downarrow) \le v_L(G, u) \le v_{2L}(G^\uparrow, u^\uparrow)$.


Refinement. We say that a partition $\Pi_2$ is a refinement of a partition $\Pi_1$,
and write $\Pi_2 \sqsubseteq \Pi_1$, if every $\pi \in \Pi_1$ is a union of several $\pi_i$’s in $\Pi_2$, i.e.
$\pi = \bigcup_{i \in \mathcal{I}} \pi_i$ and for all $i \in \mathcal{I}$, $\pi_i \in \Pi_2$. Intuitively, this means that $\Pi_2$ is obtained
by further subdividing the partition sets in $\Pi_1$. It is easy to check that $\sqsubseteq$ is a
partial order over partitions. We expect that if $\Pi_2 \sqsubseteq \Pi_1$, then the abstracted
games resulting from $\Pi_2$ give a better approximation of the value of the original
game in comparison with abstracted games resulting from $\Pi_1$. This is called the
refinement property.


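Formal Requirement for the Refinement Property. Two abstractions of a game
$(G, u)$ using two partitions $\Pi_1, \Pi_2$, such that $\Pi_2 \sqsubseteq \Pi_1$, and leading to abstracted
games $(G^\downarrow_i, u^\downarrow_i), (G^\uparrow_i, u^\uparrow_i)$ corresponding to each $\Pi_i$ satisfy the refinement property
if for every $L$, we have $v_{2L}(G^\downarrow_1, u^\downarrow_1) \le v_{2L}(G^\downarrow_2, u^\downarrow_2) \le v_{2L}(G^\uparrow_2, u^\uparrow_2) \le v_{2L}(G^\uparrow_1, u^\uparrow_1)$.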


Theorem 3 (Refinement Property, Proof in [19]). Let $\Pi_2 \sqsubseteq \Pi_1$ be two
partitions of the state space of a game $(G, u)$, then the abstractions corresponding
to $\Pi_1, \Pi_2$ satisfy the refinement property.
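The refinement relation itself is easy to test mechanically. The sketch below assumes both arguments are partitions of the same state set, in which case requiring every set of the finer partition to lie inside some set of the coarser one is equivalent to the union condition in the definition. The function name is illustrative.

```python
def is_refinement(finer, coarser):
    """True if every block of `finer` is contained in some block of `coarser`.

    Assumes both arguments are partitions of the same set of states.
    """
    coarser_blocks = [frozenset(block) for block in coarser]
    return all(
        any(frozenset(block) <= big for big in coarser_blocks)
        for block in finer
    )

coarse = [{"s0", "s2"}, {"s1"}, {"s3"}]
fine = [{"s0"}, {"s2"}, {"s1"}, {"s3"}]

print(is_refinement(fine, coarse), is_refinement(coarse, fine))
```

The relation is reflexive, so a partition always refines itself, which is consistent with it being a partial order.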


Completeness in the Limit. We say that an abstraction is complete in the 
limit, if by refining it enough the values of upper and lower abstractions get as 
close together as desired. Equivalently, this means that if we want to approximate 
the value of the original game within some predefined threshold of error, we can 
do so by repeatedly refining the abstraction. 


Formal Requirement for Completeness in the Limit. Given a game $(G, u)$, a fixed
finite-horizon $L$ and an abstracted game pair corresponding to a partition $\Pi_1$,
the abstraction is said to be complete in the limit, if for every $\epsilon \ge 0$ there exists
$\Pi_2 \sqsubseteq \Pi_1$, such that if $(G^\downarrow_2, u^\downarrow_2), (G^\uparrow_2, u^\uparrow_2)$ are the abstracted games corresponding
to $\Pi_2$, then $v_L(G^\uparrow_2, u^\uparrow_2) - v_L(G^\downarrow_2, u^\downarrow_2) \le \epsilon$.


Theorem 4 (Completeness in the Limit, Proof in [19]). Every abstraction
on a game $(G, u)$ using a partition $\Pi$ is complete in the limit for all values of $L$.


5.3 Interval Abstraction 


In this section, we turn our focus to games obtained from contracts and provide 
a specific method of abstraction that can be applied to them. 


Intuitive Overview. Let $(G, u)$ be a concurrent game obtained from a contract as
in Sect. 4.3. Then the states of $G$, other than the unique dummy state,
correspond to states of the contract $C_k$. Hence, they are of the form $s = (t, b, l, val, p)$,
where $t$ is the time, $b$ the contract balance, $l$ is a label, $p$ is the party calling
the current function and $val$ is a valuation. In an abstraction, one cannot put
states with different times or labels or callers together, because they might have
different moves and hence different action sets in the corresponding game. The
main idea in interval abstraction is to break the states according to intervals
over their balance and valuations. We can then refine the abstraction by making
the intervals smaller. We now formalize this concept.


Objects. Given a contract $C_k$, let $O$ be the set of all objects that can have an
integral value in a state $s$ of the contract. This consists of the contract balance,
numeric variables and $m[p]$’s where $m$ is a map variable and $p$ is a party. More
precisely, $O = \{B\} \cup N \cup \{m[p] \mid m \in M, p \in \mathbb{P}\}$ where $B$ denotes the balance. For
an $o \in O$, the value assigned to $o$ at state $s$ is denoted by $o_s$.


Interval Partition. Let $C_k$ be a contract and (G, u) its corresponding game. A
partition $\Pi$ of the state space of G is called an interval partition if:


— The dummy state is put in a singleton set $\pi_d$.

— Each $\pi \in \Pi$ except $\pi_d$ has associated values $t_\pi$, $l_\pi$, $p_\pi$ and, for each $o \in O$,
bounds $\underline{o}_\pi, \overline{o}_\pi$, such that $\pi = \{s \in S \mid s = (t_\pi, b, l_\pi, val, p_\pi) \text{ and for all } o \in O,\ \underline{o}_\pi \le o_s \le \overline{o}_\pi\}$.
Basically, each partition set includes states with the same time,
label and caller in which the value of every object o is in an interval $[\underline{o}_\pi, \overline{o}_\pi]$.


We call an abstraction using an interval partition, an interval abstraction. 
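
As a rough illustration of the definition above, the following Python sketch groups states into the sets of an interval partition. The `State` tuple, the use of a single fixed interval width for every object, and the function names are assumptions made for this example only; they are not taken from the implementation described in this paper.

```python
from collections import defaultdict
from typing import NamedTuple

class State(NamedTuple):
    t: int        # time
    b: int        # contract balance
    l: str        # label
    val: tuple    # values of numeric variables and map entries (assumed flattened)
    p: str        # party calling the current function

def partition_key(s: State, width: int):
    """Key of the partition set containing s: exact time, label and caller,
    plus the index of the width-sized interval holding each object value."""
    objects = (s.b,) + s.val
    return (s.t, s.l, s.p, tuple(o // width for o in objects))

def interval_partition(states, width):
    """Group states so that each group is one set of an interval partition."""
    parts = defaultdict(list)
    for s in states:
        parts[partition_key(s, width)].append(s)
    return dict(parts)

# 16 hypothetical states differing only in balance and one variable.
states = [State(0, b, "l1", (v,), "alice") for b in range(4) for v in range(4)]
coarse = interval_partition(states, width=4)   # one big interval per object
fine = interval_partition(states, width=2)     # each interval cut in two
print(len(coarse), len(fine))
```

Halving the width splits every set of the coarse partition into smaller sets, which is the sense in which the finer partition refines the coarser one.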


Refinement Heuristic. We can start with big intervals and continually break them
into smaller ones to get refined abstractions and a finer approximation of the
game value. We use the following heuristic to choose which intervals to break:
Assume that the current abstracted pair of games are $(G^\downarrow, u^\downarrow)$ and $(G^\uparrow, u^\uparrow)$,
corresponding to an interval partition $\Pi$. Let $d = (\pi_d, a_1, a_2)$ be a dummy state
in $G^\downarrow$ and define the skewness of d as $v(G^\uparrow_d, u^\uparrow) - v(G^\downarrow_d, u^\downarrow)$. Intuitively, the skewness
of d is a measure of how different the outcomes of the games $G^\downarrow$ and $G^\uparrow$ are
from the point when they have reached d. Take a label l with maximal average
skewness among its corresponding dummy states and cut all of its non-unit intervals
into more parts to get a new partition $\Pi'$. Continue the same process until
the approximation is as precise as desired. Intuitively, the heuristic tries to refine the parts of
the abstraction that show the most disparity between $G^\downarrow$ and $G^\uparrow$, with the aim
of bringing their values closer. Our experiments show its effectiveness.


Soundness and Completeness in the Limit. If we restrict our attention to interval 
abstractions, soundness is inherited from general abstractions and completeness 
in the limit holds because $\Pi_*$ is an interval partition. Therefore, using interval
abstractions is both sound and complete in the limit. 


Interval Refinement. An interval partition $\Pi'$ is an interval refinement of a given
interval partition $\Pi$ if $\Pi' \sqsubseteq \Pi$. The refinement property is inherited from general
abstractions. Intuitively, this means that $\Pi'$ is obtained by breaking the intervals
in some sets of $\Pi$ into smaller intervals.


Conclusion. We devised a sound abstraction-refinement method for approximat- 
ing values of contracts. Our method is also complete in the limit. It begins 
by converting the contract to a game, then applies interval abstraction to the 
resulting game and repeatedly refines the abstraction using a heuristic until the 
desired precision is reached. 


6 Experimental Results


Implementation and Optimizations. The state space of the games corresponding
to smart contracts is huge. Hence the original game corresponding
to the contract is computationally too expensive to construct. Therefore, we
do not first construct the game and then apply abstraction. Instead, we first
apply the interval abstraction, then construct the lower and upper abstractions
and compute values in them. We optimized our implementation by removing
dummy states and exploiting acyclicity using backward induction. More details
are provided in [19].


Experimental Results. We present our experimental results (Table 1) for the 
five examples mentioned in Sect. 3.4. In each of the examples, the original game
is quite large, and the size of its state space is calculated without constructing it.
In our experimental results we show the abstracted game size, the refinement of 
games to larger sizes, and how the lower and upper bound on the values change. 
We used an Ubuntu machine with 3.2 GHz Intel i7-5600U CPU and 12 GB RAM. 


Interpretation of the Experimental Results. Our results demonstrate the effec- 
tiveness of our approach in automatically approximating values of large games 
and real-world smart contracts. Concretely, the following points are shown: 


— Refinement Property. By repeatedly refining the abstractions, values of lower 
and upper abstractions get closer at the expense of a larger state space. 

— Distinguishing Correct and Buggy Programs. Values of the lower and upper 
abstractions provide an approximation interval containing the contract value. 
These intervals shrink with refinement until the intervals for correct and 
buggy programs become disjoint and distinguishable. 

— Bug Detection. One can anticipate a sensible value for the contract, and an 
approximation interval not containing the value shows a bug. For example, 
in token sale, the objective (number of tokens sold) is at most 1000, while 
results show the buggy program has a value between 1741 and 2000. 

— Quantification of Economic Consequences. Abstracted game values can also 
be seen as a method to quantify and find limits to the economic gain or loss 
of a party. For example, our results show that if the buggy auction contract 
is deployed, a party can potentially gain no more than 1000 units from it. 
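
The bug-detection argument above amounts to comparing an approximation interval with a value the analyst considers sensible. A minimal Python sketch follows; the function name, the three-way result, and the parameter `max_sensible` are assumptions made for this illustration. The intervals are those reported for the buggy token sale in Table 1.

```python
def classify(lower, upper, max_sensible):
    """Compare an approximation interval [lower, upper] of the contract value
    with the largest value the analyst considers sensible."""
    if lower > max_sensible:
        return "bug"            # even the lower bound exceeds the expectation
    if upper <= max_sensible:
        return "ok"             # the whole interval is within the expectation
    return "inconclusive"       # the abstraction needs further refinement

# Successive refinements of the buggy token sale; at most 1000 tokens should be sold.
for interval in [(0, 2000), (1167, 2000), (1741, 2000)]:
    print(interval, classify(*interval, max_sensible=1000))
```

The coarsest abstraction is inconclusive, while both refinements already place the lower bound above 1000.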


7 Comparison with Related Work 


Blockchain Security Analysis. The first security analysis of Bitcoin protocol 
was done by Nakamoto [43] who showed resilience of the blockchain against 
double-spending. A stateful analysis was done by Sapirshtein et al. [47] and by 
Sompolinsky and Zohar [49] in which states of the blockchain were considered. 
It was done using MDPs where only the attacker decides on her actions and the 
victim follows a predefined protocol. Our paper is the first work that uses
two-player and concurrent games to analyze contracts and the first to use stateful
analysis on arbitrary smart contracts, rather than a specific protocol.


Table 1. Experimental results for correct and buggy contracts. $l := v(G^\downarrow, u^\downarrow)$ denotes
the lower value and $u := v(G^\uparrow, u^\uparrow)$ is the upper value. Times are in seconds.


Contract             Size            Variant  States        [l, u]            Time

Rock-Paper-Scissors  > 2.5 · 10^14   Correct  19440         [0.00, 10.00]     367
                                     Correct  135945        [?, 6.10]         2644
                                     Correct  252450        [1.83, 5.59]      3381
                                     Buggy    25200         [0.00, 10.00]     402
                                     Buggy    99345         [8.01, 10.00]     4815

Auction              > 5.2 · 10^14   Correct  3360          [0, 1000]         68
                                     Correct  22560         [0, 282]          406
                                     Correct  272160        [0, 227]          4237
                                     Buggy    2880          [0, 1000]         38
                                     Buggy    27360         [565, 1000]       552
                                     Buggy    233280        [748, 1000]       3780

Lottery              > 2.5 · 10^8    Correct  1539          [-1, 1]           17
                                     Correct  2457600       [0, 0]            13839
                                     Buggy    1701          [-1, 1]           22
                                     Buggy    2457600       [-1, -1]          13244

Sale                 > 4.6 · 10^?    Correct  17010         [0, 2000]         226
                                     Correct  75762         [723, 1472]       1241
                                     Correct  131250        [792, 1260]       2872
                                     Buggy    17010         [0, 2000]         275
                                     Buggy    81202         [1167, 2000]      1733
                                     Buggy    124178        [1741, 2000]      2818

Transfer             > 10^?          Correct  1040          [0, 2000]         20
                                     Correct  32880         [844, 1793]       562
                                     Correct  (illegible)   [903, 1350]       3740
                                     Buggy    6561          [0, 2000]         237
                                     Buggy    131520        [1716, 2000]      3979


Smart Contract Security. Delmolino et al. [29] held a contract programming
workshop and showed that even simple contracts can contain incentive misalignment
bugs. Luu et al. [41] introduced a symbolic model checker with which
they could detect specific erroneous patterns. However, the use of a model checker
cannot be extended to game-theoretic analysis. Bhargavan et al. [9] translated
Solidity programs to F* and then used standard verification tools to detect vulnerable
code patterns. See [7] for a survey of the known causes for Solidity bugs
that result in security vulnerabilities.


Games and Verification. Abstraction for concurrent games has been considered
with respect to qualitative temporal objectives [3,22,28,44]. Several works considered
concurrent games with only pure strategies [28,36,37]. Concurrent games with pure
strategies are extremely restrictive and effectively similar to turn-based games. 
The min-max theorem (determinacy) does not hold for them even in special 
cases of one-shot games or games with qualitative objectives. 

Quantitative analysis with games is studied in [12,17,21]. However, these
approaches either consider games without concurrent interactions or do not con- 
sider any abstraction-refinement. A quantitative abstraction-refinement frame- 
work has been considered in [18]; however, there is no game-theoretic interac- 
tion. Abstraction-refinement for games has also been considered [20,36]; however, 
these works neither consider games with concurrent interaction, nor quantitative 
objectives. Moreover, [20,36] start with a finite-state model without variables, 
and interval abstraction is not applicable to these game-theoretic frameworks. 
In contrast, our technical contribution is an abstraction-refinement approach for 
quantitative games and its application to analysis of smart contracts. 


Formal Methods in Security. There is a huge body of work on program analysis
for security; see [1,46] for surveys. Formal methods are used to create safe
programming languages (e.g., [34,46]) and to define new logics that can express
security properties (e.g., [5,6,15]). They are also used to automatically verify
security and cryptographic protocols; see, e.g., [2,8,11] for a survey. However, all of
these works aimed to formalize qualitative properties such as privacy violation
and information leakage. To the best of our knowledge, our framework is the first
attempt to use formal methods as a tool for reasoning about monetary losses and
identifying them as security errors.


Bounded Model Checking (BMC). BMC was proposed by Biere et al. in 1999 [10]. 
The idea in BMC is to search for a counterexample in executions whose length 
is at most k. If no bug is found then one increases k until either a bug is found, 
the problem becomes intractable, or some pre-known upper bound is reached. 


Interval Abstraction. The first infinite abstract domain was introduced in [25]. 
This was later used to prove that infinite abstract domains can lead to effective 
static analysis for a given programming language [26]. However, none of the 
standard techniques is applicable to game analysis. 


8 Conclusion


In this work we present a programming language for smart contracts, and an 
abstraction-refinement approach for quantitative concurrent games to automat- 
ically analyze (i.e., compute worst-case guaranteed utilities of) such contracts. 
This is the first time a quantitative stateful game-theoretic framework is studied 
for formal analysis of smart contracts. There are several interesting directions 
of future work. First, we present interval-based abstraction techniques for such 
games, and whether different abstraction techniques can lead to more scalability 
or other classes of contracts is an interesting direction of future work. Second, 
since we consider worst-case guarantees, the games we obtain are two-player 
zero-sum games. The extension to study multiplayer games and compute values 
for rational agents is another interesting direction of future work. Finally, in this 
work we do not consider interaction between smart contracts, and an extension 
to encompass such study will be a subject of its own. 


Abstract. In sequential languages, dynamic contracts are usually
expressed as boolean functions without externally observable effects, 
written within the language. We propose an analogous notion of concur- 
rent contracts for languages with session-typed message-passing concur- 
rency. Concurrent contracts are partial identity processes that monitor 
the bidirectional communication along channels and raise an alarm if a 
contract is violated. Concurrent contracts are session-typed in the usual 
way and must also satisfy a transparency requirement, which guarantees 
that terminating compliant programs with and without the contracts are 
observationally equivalent. We illustrate concurrent contracts with sev- 
eral examples. We also show how to generate contracts from a refinement 
session-type system and show that the resulting monitors are redundant 
for programs that are well-typed. 


Keywords: Contracts - Session types - Monitors 


1 Introduction 


Contracts, specifying the conditions under which software components can safely 
interact, have been used for ensuring key properties of programs for decades. 
Recently, contracts for distributed processes have been studied in the context of 
session types [15,17]. These contracts can enforce the communication protocols, 
specified as session types, between processes. In this setting, we can assign each 
channel a monitor for detecting whether messages observed along the channel 
adhere to the prescribed session type. The monitor can then detect any deviant 
behavior the processes exhibit and trigger alarms. However, contracts based 
solely on session types are inherently limited in their expressive power. Many 
contracts that we would like to enforce cannot even be stated using session 
types alone. As a simple example, consider a “factorization service” which may 
be sent a (possibly large) integer x and is supposed to respond with a list of
prime factors. Session types can only express that the request is an integer and 
the response is a list of integers, which is insufficient. 

In this paper, we show that by generalizing the class of monitors beyond 
those derived from session types, we can enforce, for example, that multiplying 
the numbers in the response yields the original integer x. This paper focuses on 
monitoring more expressive contracts, specifically those that cannot be expressed 
with session types, or even refinement types. 

To handle these contracts, we have designed a model where our monitors execute
as transparent processes alongside the computation. They are able to maintain
internal state which allows us to check complex properties. These monitoring
processes act as partial identities, which do not affect the computation except 
possibly raising an alarm, and merely observe the messages flowing through the 
system. They then perform whatever computation is needed, for example, they 
can compute the product of the factors, to determine whether the messages 
are consistent with the contract. If the message is not consistent, they stop the 
computation and blame the process responsible for the mistake. To show that 
our contracts subsume refinement-based contracts, we encode refinement types 
in our model by translating refinements into monitors. This encoding is useful 
because we can show a blame (safety) theorem stating that monitors that enforce 
a less precise refinement type than the type of the process being monitored will 
not raise alarms. Unfortunately, the blame theory for the general model is chal- 
lenging because the contracts cannot be expressed as types. 
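
To make the idea of a stateful partial-identity monitor concrete, here is a minimal Python sketch of a monitor for the factorization service from the introduction. It is an analogy only: Python generators stand in for session-typed processes, and the names `factor_monitor` and `ContractViolation` as well as the `blame` argument are assumptions of this sketch, not constructs of the calculus. The sketch checks only the product of the factors, not their primality.

```python
class ContractViolation(Exception):
    """Raised when a monitor detects a violated contract; carries the blame label."""

def factor_monitor(x, response, blame="provider"):
    """Forward the factors in `response` unchanged while accumulating their
    product; raise an alarm if the product differs from the request x."""
    product = 1                   # internal state of the monitor
    for factor in response:
        product *= factor
        yield factor              # forwarded unchanged: partial identity
    if product != x:
        raise ContractViolation(blame)

# A compliant response passes through the monitor unchanged.
print(list(factor_monitor(12, [2, 2, 3])))

# A wrong response is forwarded until the end, then the alarm is raised.
try:
    list(factor_monitor(12, [2, 5]))
except ContractViolation as alarm:
    print("alarm, blaming:", alarm)
```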

The main contributions of this paper are: 


— A novel approach to contract checking via partial-identity monitors

— A method for verifying that monitors are partial identities, and a proof that
the method is correct

— Examples showing the breadth of contracts that our monitors can enforce 

— A translation from refinement types to our monitoring processes and a blame 
theorem for this fragment 


The rest of this paper is organized as follows. We first review the background 
on session types in Sect. 2. Next, we show a range of example contracts in Sect. 3. 
In Sect. 4, we show how to check that a monitor process is a partial identity and 
prove the method correct. We then show how we can encode refinements in our 
system in Sect. 5. We discuss related work in Sect. 6. Due to space constraints, we 
only present the key theorems. Detailed proofs can be found in our companion 
technical report [12]. 


2 Session Types 


Session types prescribe the communication behavior of message-passing concur- 
rent processes. We approach them here via their foundation in intuitionistic 
linear logic [4,5,22]. The key idea is that an intuitionistic linear sequent 


\[ A_1, \ldots, A_n \vdash C \]


is interpreted as the interface to a process expression P. We label each of the
antecedents with a channel name $a_i$ and the succedent with a channel name c.
The $a_i$ are the channels used and c is the channel provided by P.


\[ a_1 : A_1, \ldots, a_n : A_n \vdash P :: (c : C) \]


We abbreviate the antecedents by $\Delta$. All the channels $a_i$ and c must be distinct,
and bound variables may be silently renamed to preserve this invariant in
the rules. Furthermore, the antecedents are considered modulo exchange. Cut
corresponds to parallel composition of two processes that communicate along a
private channel x, where P is the provider along x and Q the client.


\[ \frac{\Delta \vdash P :: (x : A) \qquad x : A, \Delta' \vdash Q :: (c : C)}{\Delta, \Delta' \vdash (x{:}A \leftarrow P \,;\, Q) :: (c : C)}\ \mathsf{cut} \]


Operationally, the process $x \leftarrow P \,;\, Q$ spawns P as a new process and continues
as Q, where P and Q communicate along a fresh channel a, which is substituted
for x. We sometimes omit the type A of x in the syntax when it is not relevant.

In order to define the operational semantics rigorously, we use multiset rewrit- 
ing [6]. The configuration of executing processes is described as a collection C of 
propositions proc(c, P) (process P is executing, providing along c) and msg(c, M) 
(message M is sent along c). All the channels c provided by processes and mes- 
sages in a configuration must be distinct. 

A cut spawns a new process, and is in fact the only way new processes are 
spawned. We describe a transition C —> C’ by defining how a subset of C can 
be rewritten to a subset of C’, possibly with a freshness condition that applies 
to all of C in order to guarantee the uniqueness of each channel provided. 


\[ \mathsf{proc}(c, x{:}A \leftarrow P \,;\, Q) \longrightarrow \mathsf{proc}(a, [a/x]P),\ \mathsf{proc}(c, [a/x]Q) \quad (a\ \text{fresh}) \]


Each of the connectives of linear logic then describes a particular kind of com- 
munication behavior which we capture in similar rules. Before we move on to 
that, we consider the identity rule, in logical form and operationally. 


\[ \frac{}{b : A \vdash a \leftarrow b :: (a : A)}\ \mathsf{id} \qquad\qquad \mathsf{proc}(a, a \leftarrow b),\ \mathcal{C} \longrightarrow [b/a]\mathcal{C} \]

Operationally, it corresponds to identifying the channels a and b, which we implement
by substituting b for a in the remainder $\mathcal{C}$ of the configuration (which we
make explicit in this rule). The process offering a terminates. We refer to $a \leftarrow b$
as forwarding since any messages along a are instead “forwarded” to b.

We consider each class of session type constructors, describing their process 
expression, typing, and asynchronous operational semantics. The linear logical 
semantics can be recovered by ignoring the process expressions and channels. 


Internal and External Choice. Even though we distinguish a provider and its 
client, this distinction is orthogonal to the direction of communication: both may 
either send or receive along a common private channel. Session typing guarantees 
that both sides will always agree on the direction and kind of message that is 
sent or received, so our situation corresponds to so-called binary session types. 

First, the internal choice $c : A \oplus B$ requires the provider to send a token
inl or inr along c and continue as prescribed by type A or B, respectively. For
practical programming, it is more convenient to support n-ary labelled choice
$\oplus\{\ell : A_\ell\}_{\ell \in L}$, where L is a set of labels. A process providing $c : \oplus\{\ell : A_\ell\}_{\ell \in L}$
sends a label $k \in L$ along c and continues with type $A_k$. The client will operate
dually, branching on a label received along c.
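

\[ \frac{k \in L \qquad \Delta \vdash P :: (c : A_k)}{\Delta \vdash c.k \,;\, P :: (c : \oplus\{\ell : A_\ell\}_{\ell \in L})}\ {\oplus}R \qquad \frac{\Delta, c : A_\ell \vdash Q_\ell :: (d : D) \quad \text{for every } \ell \in L}{\Delta, c : \oplus\{\ell : A_\ell\}_{\ell \in L} \vdash \mathsf{case}\ c\ (\ell \Rightarrow Q_\ell)_{\ell \in L} :: (d : D)}\ {\oplus}L \]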


The operational semantics is somewhat tricky, because we communicate asynchronously.
We need to spawn a message carrying the label k, but we also need
to make sure that the next message sent along the same channel does not overtake
the first (which would violate session fidelity). Sending a message therefore
creates a fresh continuation channel c' for further communication, which we substitute
in the continuation of the process. Moreover, the recipient also switches
to this continuation channel after the message is received.


\[ \begin{array}{l}
\mathsf{proc}(c, c.k \,;\, P) \longrightarrow \mathsf{proc}(c', [c'/c]P),\ \mathsf{msg}(c, c.k \,;\, c \leftarrow c') \quad (c'\ \text{fresh}) \\
\mathsf{msg}(c, c.k \,;\, c \leftarrow c'),\ \mathsf{proc}(d, \mathsf{case}\ c\ (\ell \Rightarrow Q_\ell)_{\ell \in L}) \longrightarrow \mathsf{proc}(d, [c'/c]Q_k)
\end{array} \]


It is interesting that the message along c, followed by its continuation c', can be
expressed as a well-typed process expression using forwarding: $c.k \,;\, c \leftarrow c'$. This
pattern will work for all other pairs of send/receive operations.
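
The role of the continuation channel can be illustrated with a small Python simulation of the two rewrite rules for sending and receiving a label. This is a sketch under simplifying assumptions: a configuration is a dictionary from channel names to either a process (reduced here to the list of labels it still has to send) or a message, and the client is left implicit. The names `send_label` and `recv_label` are hypothetical.

```python
import itertools

_fresh = (f"c{i}" for i in itertools.count(1))   # supply of fresh channel names

def send_label(config, chan, label):
    """proc(c, c.k ; P) --> proc(c', [c'/c]P), msg(c, c.k ; c <- c')   (c' fresh)."""
    kind, pending = config.pop(chan)
    assert kind == "proc" and pending[0] == label
    cont = next(_fresh)
    config[cont] = ("proc", pending[1:])      # the provider moves to c'
    config[chan] = ("msg", label, cont)       # the message stays on c
    return cont

def recv_label(config, chan):
    """msg(c, c.k ; c <- c'), proc(d, case c ...) --> proc(d, [c'/c]Q_k).
    Returns the label and the continuation channel the client switches to."""
    kind, label, cont = config.pop(chan)
    assert kind == "msg"
    return label, cont

# The provider sends two labels before the client receives anything.
config = {"c": ("proc", ["cons", "nil"])}
c1 = send_label(config, "c", "cons")
send_label(config, c1, "nil")

# The client follows the continuation channels and sees the labels in order.
received, chan = [], "c"
while config[chan][0] == "msg":
    label, chan = recv_label(config, chan)
    received.append(label)
print(received)
```

Because each message lives on its own channel, the second label cannot overtake the first even though both are in flight at the same time.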

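External choice reverses the roles of client and provider, both in the typing
and the operational rules. The semantics are given below and the typing is in Fig. 6.


\[ \begin{array}{l}
\mathsf{proc}(d, c.k \,;\, Q) \longrightarrow \mathsf{msg}(c', c.k \,;\, c' \leftarrow c),\ \mathsf{proc}(d, [c'/c]Q) \quad (c'\ \text{fresh}) \\
\mathsf{proc}(c, \mathsf{case}\ c\ (\ell \Rightarrow P_\ell)_{\ell \in L}),\ \mathsf{msg}(c', c.k \,;\, c' \leftarrow c) \longrightarrow \mathsf{proc}(c', [c'/c]P_k)
\end{array} \]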


Sending and Receiving Channels. Session types are higher-order in the 
sense that we can send and receive channels along channels. Sending a channel 
is perhaps less intuitive from the logical point of view, so we show that and just 
summarize the rules for receiving. 

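If we provide $c : A \otimes B$, we send a channel $a : A$ along c and continue as B.
From the typing perspective, it is a restricted form of the usual two-premise ${\otimes}R$
rule, obtained by requiring the first premise to be an identity. This restriction separates
spawning of new processes from the sending of channels.


\[ \frac{\Delta \vdash P :: (c : B)}{\Delta, a : A \vdash \mathsf{send}\ c\ a \,;\, P :: (c : A \otimes B)}\ {\otimes}R^* \qquad \frac{\Delta, x : A, c : B \vdash Q :: (d : D)}{\Delta, c : A \otimes B \vdash x \leftarrow \mathsf{recv}\ c \,;\, Q :: (d : D)}\ {\otimes}L \]


The operational rules follow the same patterns as the previous case.


\[ \begin{array}{l}
\mathsf{proc}(c, \mathsf{send}\ c\ a \,;\, P) \longrightarrow \mathsf{proc}(c', [c'/c]P),\ \mathsf{msg}(c, \mathsf{send}\ c\ a \,;\, c \leftarrow c') \quad (c'\ \text{fresh}) \\
\mathsf{msg}(c, \mathsf{send}\ c\ a \,;\, c \leftarrow c'),\ \mathsf{proc}(d, x \leftarrow \mathsf{recv}\ c \,;\, Q) \longrightarrow \mathsf{proc}(d, [c'/c][a/x]Q)
\end{array} \]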


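Receiving a channel (written as a linear implication $A \multimap B$) works symmetrically.
The semantics are given below and the typing is shown in Fig. 6.


\[ \begin{array}{l}
\mathsf{proc}(d, \mathsf{send}\ c\ a \,;\, Q) \longrightarrow \mathsf{msg}(c', \mathsf{send}\ c\ a \,;\, c' \leftarrow c),\ \mathsf{proc}(d, [c'/c]Q) \quad (c'\ \text{fresh}) \\
\mathsf{proc}(c, x \leftarrow \mathsf{recv}\ c \,;\, P),\ \mathsf{msg}(c', \mathsf{send}\ c\ a \,;\, c' \leftarrow c) \longrightarrow \mathsf{proc}(c', [c'/c][a/x]P)
\end{array} \]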


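Termination. We have already seen that a process can terminate by forwarding.
Communication along a channel ends explicitly when it has type $\mathbf{1}$ (the unit of
$\otimes$) and is closed. By linearity there must be no antecedents in the right rule.


\[ \frac{}{\cdot \vdash \mathsf{close}\ c :: (c : \mathbf{1})}\ \mathbf{1}R \qquad \frac{\Delta \vdash Q :: (d : D)}{\Delta, c : \mathbf{1} \vdash \mathsf{wait}\ c \,;\, Q :: (d : D)}\ \mathbf{1}L \]


Since there cannot be any continuation, the message takes a simple form.


\[ \begin{array}{l}
\mathsf{proc}(c, \mathsf{close}\ c) \longrightarrow \mathsf{msg}(c, \mathsf{close}\ c) \\
\mathsf{msg}(c, \mathsf{close}\ c),\ \mathsf{proc}(d, \mathsf{wait}\ c \,;\, Q) \longrightarrow \mathsf{proc}(d, Q)
\end{array} \]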


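Quantification. First-order quantification over elements of domains such as
integers, strings, or booleans allows ordinary basic data values to be sent and
received. At the moment, since we have no type families indexed by values, the
quantified variables cannot actually appear in their scope. This will change in
Sect. 5, so we anticipate this in these rules.

The proof of an existential quantifier contains a witness term, whose value
is what is sent. In order to track variables ranging over values, a new context
$\Psi$ is added to all judgments and the preceding rules are modified accordingly.
All value variables n declared in context $\Psi$ must be distinct. Such variables are
not linear, but can be arbitrarily reused, and are therefore propagated to all
premises in all rules. We write $\Psi \vdash v : \tau$ to check that value v has type $\tau$ in
context $\Psi$.


\[ \frac{\Psi \vdash v : \tau \qquad \Psi \,;\, \Delta \vdash P :: (c : [v/n]A)}{\Psi \,;\, \Delta \vdash \mathsf{send}\ c\ v \,;\, P :: (c : \exists n{:}\tau.\, A)}\ {\exists}R \qquad \frac{\Psi, n{:}\tau \,;\, \Delta, c : A \vdash Q :: (d : D)}{\Psi \,;\, \Delta, c : \exists n{:}\tau.\, A \vdash n \leftarrow \mathsf{recv}\ c \,;\, Q :: (d : D)}\ {\exists}L \]


\[ \begin{array}{l}
\mathsf{proc}(c, \mathsf{send}\ c\ v \,;\, P) \longrightarrow \mathsf{proc}(c', [c'/c]P),\ \mathsf{msg}(c, \mathsf{send}\ c\ v \,;\, c \leftarrow c') \quad (c'\ \text{fresh}) \\
\mathsf{msg}(c, \mathsf{send}\ c\ v \,;\, c \leftarrow c'),\ \mathsf{proc}(d, n \leftarrow \mathsf{recv}\ c \,;\, Q) \longrightarrow \mathsf{proc}(d, [c'/c][v/n]Q)
\end{array} \]


The situation for universal quantification is symmetric. The semantics are given
below and the typing is shown in Fig. 6.


\[ \begin{array}{l}
\mathsf{proc}(d, \mathsf{send}\ c\ v \,;\, Q) \longrightarrow \mathsf{msg}(c', \mathsf{send}\ c\ v \,;\, c' \leftarrow c),\ \mathsf{proc}(d, [c'/c]Q) \quad (c'\ \text{fresh}) \\
\mathsf{proc}(c, n \leftarrow \mathsf{recv}\ c \,;\, P),\ \mathsf{msg}(c', \mathsf{send}\ c\ v \,;\, c' \leftarrow c) \longrightarrow \mathsf{proc}(c', [c'/c][v/n]P)
\end{array} \]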


Processes may also make internal transitions while computing ordinary values,
which we don’t fully specify here. Such a transition would have the form


\[ \mathsf{proc}(c, P[e]) \longrightarrow \mathsf{proc}(c, P[e']) \quad \text{if } e \mapsto e' \]

where P[e] would denote a process with an ordinary value expression in evaluation
position and $e \mapsto e'$ would represent a step of computation.


Shifts. For the purpose of monitoring, it is important to track the direction of
communication. To make this explicit, we polarize the syntax and use shifts to
change the direction of communication (for more detail, see prior work [18]).


\[ \begin{array}{lrcl}
\text{Negative types} & A^-, B^- & ::= & \&\{\ell : A^-_\ell\}_{\ell \in L} \mid A^+ \multimap B^- \mid \forall n{:}\tau.\, A^- \mid {\uparrow}A^+ \\
\text{Positive types} & A^+, B^+ & ::= & \oplus\{\ell : A^+_\ell\}_{\ell \in L} \mid A^+ \otimes B^+ \mid \mathbf{1} \mid \exists n{:}\tau.\, A^+ \mid {\downarrow}A^- \\
\text{Types} & A, B, C, D & ::= & A^- \mid A^+
\end{array} \]


From the perspective of the provider, all negative types receive and all positive
types send. It is then clear that ${\uparrow}A$ must receive a shift message and then
start sending, while ${\downarrow}A$ must send a shift message and then start receiving.
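For this restricted form of shift, the logical rules are otherwise uninformative.
The semantics are given below and the typing is shown in Fig. 6.


\[ \begin{array}{l}
\mathsf{proc}(c, \mathsf{send}\ c\ \mathsf{shift} \,;\, P) \longrightarrow \mathsf{proc}(c', [c'/c]P),\ \mathsf{msg}(c, \mathsf{send}\ c\ \mathsf{shift} \,;\, c \leftarrow c') \quad (c'\ \text{fresh}) \\
\mathsf{msg}(c, \mathsf{send}\ c\ \mathsf{shift} \,;\, c \leftarrow c'),\ \mathsf{proc}(d, \mathsf{shift} \leftarrow \mathsf{recv}\ c \,;\, Q) \longrightarrow \mathsf{proc}(d, [c'/c]Q) \\[4pt]
\mathsf{proc}(d, \mathsf{send}\ c\ \mathsf{shift} \,;\, Q) \longrightarrow \mathsf{msg}(c', \mathsf{send}\ c\ \mathsf{shift} \,;\, c' \leftarrow c),\ \mathsf{proc}(d, [c'/c]Q) \quad (c'\ \text{fresh}) \\
\mathsf{proc}(c, \mathsf{shift} \leftarrow \mathsf{recv}\ c \,;\, P),\ \mathsf{msg}(c', \mathsf{send}\ c\ \mathsf{shift} \,;\, c' \leftarrow c) \longrightarrow \mathsf{proc}(c', [c'/c]P)
\end{array} \]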


Recursive Types. Practical programming with session types requires them to
be recursive, and processes using them also must allow recursion. For example,
lists with elements of type int can be defined as the purely positive type $\mathsf{list}^+$.


\[ \mathsf{list}^+ = \oplus\{\, \mathsf{cons} : \exists n{:}\mathsf{int}.\ \mathsf{list}^+ \,;\ \mathsf{nil} : \mathbf{1} \,\} \]


A provider of type $c : \mathsf{list}^+$ is required to send a sequence such as $\mathsf{cons} \cdot v_1 \cdot \mathsf{cons} \cdot v_2 \cdots$
where each $v_i$ is an integer. If it is finite, it must be terminated with $\mathsf{nil} \cdot \mathsf{end}$. In
the form of a grammar, we could write


\[ \mathit{From} ::= \mathsf{cons} \cdot v \cdot \mathit{From} \mid \mathsf{nil} \cdot \mathsf{end} \]


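A second example is a multiset (bag) of integers, where the interface allows
inserting and removing elements, and testing if it is empty. If the bag is empty
when tested, the provider terminates after responding with the empty label.


\[ \begin{array}{rl}
\mathsf{bag}^- = \&\{ & \mathsf{insert} : \forall n{:}\mathsf{int}.\ \mathsf{bag}^-,\ \ \mathsf{remove} : \forall n{:}\mathsf{int}.\ \mathsf{bag}^-, \\
 & \mathsf{is\_empty} : {\uparrow}\oplus\{\mathsf{empty} : \mathbf{1},\ \mathsf{nonempty} : {\downarrow}\mathsf{bag}^-\} \ \}
\end{array} \]


The protocol now describes the following grammar of exchanged messages, where
To goes to the provider, From comes from the provider, and v stands for integers.


\[ \begin{array}{rcl}
\mathit{To} & ::= & \mathsf{insert} \cdot v \cdot \mathit{To} \mid \mathsf{remove} \cdot v \cdot \mathit{To} \mid \mathsf{is\_empty} \cdot \mathsf{shift} \cdot \mathit{From} \\
\mathit{From} & ::= & \mathsf{empty} \cdot \mathsf{end} \mid \mathsf{nonempty} \cdot \mathsf{shift} \cdot \mathit{To}
\end{array} \]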

For these protocols to be realized in this form and support rich subtyping and
refinement types without change of protocol, it is convenient for recursive types
to be equirecursive. This means a defined type such as $\mathsf{list}^+$ is viewed as equal
to its definition $\oplus\{\ldots\}$ rather than isomorphic. For this view to be consistent,
we require type definitions to be contractive [11], that is, they need to provide
at least one send or receive interaction before recursing.

The most popular formalization of equirecursive types is to introduce an
explicit $\mu$-constructor. For example, $\mathsf{list} = \mu\alpha.\ \oplus\{\, \mathsf{cons} : \exists n{:}\mathsf{int}.\ \alpha,\ \mathsf{nil} : \mathbf{1} \,\}$, with
rules unrolling the type $\mu\alpha.\ A$ to $[(\mu\alpha.\ A)/\alpha]A$. An alternative (see, for example,
Balzer and Pfenning 2017 [3]) is to use an explicit definition just as we stated,
for example, list and bag, and consider the left-hand side equal to the right-hand
side in our discourse. In typing, this works without a hitch. When we consider
subtyping explicitly, we need to make sure we view inference systems on types as
being defined co-inductively. Since a co-inductively defined judgment essentially
expresses the absence of a counterexample, this is exactly what we need for
the operational properties like progress, preservation, or absence of blame. We
therefore adopt this view.


Recursive Processes. In addition to recursively defined types, we also need
recursively defined processes. We follow the general approach of Toninho
et al. [23] for the integration of a (functional) data layer into session-typed
communication. A process can be named p, ascribed a type, and be defined as
follows.


\[ \begin{array}{l}
p : \forall n_1{:}\tau_1.\ \ldots,\ \forall n_k{:}\tau_k.\ \{A \leftarrow A_1, \ldots, A_m\} \\
x \leftarrow p\ n_1 \ldots n_k \leftarrow y_1, \ldots, y_m = P
\end{array} \]


where we check $(n_1{:}\tau_1, \ldots, n_k{:}\tau_k) \,;\, (y_1{:}A_1, \ldots, y_m{:}A_m) \vdash P :: (x : A)$.
We use such process definitions when spawning a new process with the syntax


\[ c \leftarrow p\ e_1 \ldots e_k \leftarrow d_1, \ldots, d_m \,;\, Q \]


which we check with the rule


\[ \frac{(\Psi \vdash e_i : \tau_i)_{i \in \{1, \ldots, k\}} \qquad \Delta' = (d_1{:}A_1, \ldots, d_m{:}A_m) \qquad \Psi \,;\, \Delta, c : A \vdash Q :: (d : D)}{\Psi \,;\, \Delta, \Delta' \vdash c \leftarrow p\ e_1 \ldots e_k \leftarrow d_1, \ldots, d_m \,;\, Q :: (d : D)}\ \mathsf{pdef} \]


After evaluating the value arguments, the call consumes the channels $d_i$ (which
will not be available to the continuation Q, due to linearity). The continuation
Q will then be the (sole) client of c and the new process providing c will execute
$[c/x][d_1/y_1] \cdots [d_m/y_m]P$.

One more quick shorthand used in the examples: a tail-call $c \leftarrow p\ \bar{e} \leftarrow \bar{d}$ in
the definition of a process that provides along c is expanded into $c' \leftarrow p\ \bar{e} \leftarrow \bar{d} \,;\, c \leftarrow c'$
for a fresh c'. Depending on how forwarding is implemented, however, it
may be much more efficient [13].


Stopping Computation. Finally, in order to be able to successfully monitor
computation, we need the capability to stop the computation. We add an abort l
construct that aborts on a particular label l. We also add assert blocks to check
conditions on observable values. The semantics are given below and the typing
is in Fig. 6.


\[ \mathsf{proc}(c, \mathsf{assert}\ l\ \mathsf{True} \,;\, Q) \longrightarrow \mathsf{proc}(c, Q) \qquad \mathsf{proc}(c, \mathsf{assert}\ l\ \mathsf{False} \,;\, Q) \longrightarrow \mathsf{abort}(l) \]


Progress and preservation were proven for the above system, with the exception 
of the abort and assert rules, in prior work [18]. The additional proof cases do 
not change the proof significantly. 


3 Contract Examples 


In this section, we present monitoring processes that can enforce a variety of 
contracts. The examples will mainly use lists as defined in the previous section. 
Our monitors are transparent, that is, they do not change the computation. 
We accomplish this by making them act as partial identities (described in more 
detail in Sect. 4). Therefore, any monitor that enforces a contract on a list must
peel off each layer of the type one step at a time (by sending or receiving over 
the channel as dictated by the type), perform the required checks on values or 
labels, and then reconstruct the original type (again, by sending or receiving as 
appropriate). 


Refinement. The simplest kind of monitoring process we can write is one that 
models a refinement of an integer type; for example, a process that checks 
whether every element in the list is positive. This is a recursive process that 
receives the head of the list from channel b, checks whether it is positive (if yes, 
it continues to the next value, if not it aborts), and then sends the value along to 
reconstruct the monitored list a. We show three refinement monitors in Fig. 1. 
The process pos_mon implements the refinement mentioned above.


pos_mon : {list ← list};;
a ← pos_mon ← b =
  case b of
  | nil ⇒ a.nil ; wait b ; close a
  | cons ⇒ x ← recv b ;
           assert(x > 0)^ρ ;
           a.cons ; send a x ;
           a ← pos_mon ← b;;

empty : {list ← list};;
a ← empty ← b =
  case b of
  | nil ⇒ wait b ; a.nil ; close a
  | cons ⇒ abort^ρ;;

nempty : {list ← list};;
a ← nempty ← b =
  case b of
  | nil ⇒ abort^ρ
  | cons ⇒ a.cons ; x ← recv b ;
           send a x ; a ← b;;

Fig. 1. Refinement examples 
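
For readers more familiar with mainstream languages, the three refinement monitors can be approximated by Python generators that forward list elements unchanged and raise an exception in place of abort. This is an analogy, not a translation: the linear channels a and b are replaced by an output generator and an input iterable, and the `Abort` exception and the default label name are assumptions of this sketch.

```python
class Abort(Exception):
    """Models abort: raised with the blame label when a check fails."""

def pos_mon(b, label="rho"):
    """Forward the elements of b unchanged, aborting on the first non-positive one."""
    for x in b:
        if not x > 0:
            raise Abort(label)
        yield x

def empty(b, label="rho"):
    """Partial identity on the empty list; aborts if b has a first element."""
    for _ in b:
        raise Abort(label)
    return
    yield  # unreachable; its presence makes this function a generator

def nempty(b, label="rho"):
    """Partial identity on nonempty lists; aborts if b is empty."""
    it = iter(b)
    try:
        first = next(it)
    except StopIteration:
        raise Abort(label)
    yield first
    yield from it          # after the first element, plain forwarding

print(list(pos_mon([1, 2, 3])))
print(list(empty([])))
print(list(nempty([4, 5])))
```

On inputs that satisfy the contract, each monitor returns exactly the list it was given, which is the transparency property discussed in Sect. 4.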


Our monitors can also exploit information that is contained in the labels in 
the external and internal choices. The empty process checks whether the list b is 
empty and aborts if b sends the label cons. Similarly, the nempty monitor checks 
whether the list b is not empty and aborts if b sends the label nil. These two 
monitors can then be used by a process that zips two lists and aborts if they 
are of different lengths. These two monitors enforce the refinements {nil} ⊆ 
{nil, cons} and {cons} ⊆ {nil, cons}. We discuss how to generate monitors 
from refinement types in more detail in Sect. 5. 
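
The three refinement monitors can be approximated outside the session-typed setting. The following is a minimal sketch in Python, assuming that a list channel is modeled as an iterator and that abort is modeled by raising an exception; the names `ContractViolation` and the `label` parameters are illustrative and not part of the calculus.

```python
class ContractViolation(Exception):
    """Raised when a monitor aborts; the message is the contract label."""


def pos_mon(xs, label="pos"):
    """Forward every element unchanged; abort on the first non-positive one."""
    for x in xs:
        if not x > 0:
            raise ContractViolation(label)
        yield x


def empty(xs, label="empty"):
    """Act as the identity on the empty list; abort if any element arrives."""
    for _ in xs:
        raise ContractViolation(label)
    yield from ()


def nempty(xs, label="nempty"):
    """Abort on the empty list; otherwise forward the whole list unchanged."""
    it = iter(xs)
    try:
        first = next(it)
    except StopIteration:
        raise ContractViolation(label) from None
    yield first
    yield from it  # corresponds to forwarding the rest of the list
```

Each generator either yields exactly the elements it consumed or raises, which is the informal reading of a partial identity.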


Monitors with Internal State. We now move beyond refinement contracts, 
and model contracts that have to maintain some internal state (Fig. 2). 

We first present a monitor that checks whether the given list is sorted in 
ascending order (ascending). The monitor’s state consists of a lower bound on 
the subsequent elements in the list. This value has an option type, which can 
either be None if no bound has yet been set, or Some b if b is the current bound. 

If the list is empty, there is no bound to check, so no contract failure can 
happen. If the list is nonempty, we check whether a bound has already been set. 
If not, we set the bound to be the first received element. If there is already a 
bound in place, then we check that the received element is greater than or equal 
to the bound. If it is not, then the list must be unsorted, so we abort with a 
contract failure. Note that the output list m is the same as the input list n 
because every element that we examine is then passed along unchanged to m. 

    ascending : option int → {list ← list};;
    m ← ascending bound ← n =
      case n of
      | nil ⇒ m.nil ; wait n ; close m
      | cons ⇒ x ← recv n ;
          case bound of
          | None ⇒ m.cons ; send m x ;
                   m ← ascending (Some x) ← n
          | Some a ⇒ assert (x ≥ a)^ρ ;
                   m.cons ; send m x ;
                   m ← ascending (Some x) ← n;;

    match : int → {list ← list};;
    a ← match count ← b =
      case b of
      | nil ⇒ assert (count = 0)^ρ ;
              a.nil ; wait b ; close a
      | cons ⇒ a.cons ; x ← recv b ;
          if (x = 1) then send a x ;
            a ← match (count + 1) ← b
          else if (x = -1)
            then assert (count > 0)^ρ ;
              send a x ;
              a ← match (count - 1) ← b
          else abort^ρ //invalid input

Fig. 2. Monitors using internal state 
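
The behavior of ascending can be sketched in Python under the same modeling assumptions as before: a list is an iterator, abort is an exception, and Python's `None` stands in for the option value None. The class name `ContractViolation` is hypothetical.

```python
class ContractViolation(Exception):
    """Raised when the monitor aborts; the message is the contract label."""


def ascending(xs, label="ascending"):
    """Forward xs unchanged, aborting as soon as an element breaks the order."""
    bound = None  # no lower bound has been set yet
    for x in xs:
        if bound is not None and not x >= bound:
            raise ContractViolation(label)
        yield x    # pass the element along unchanged
        bound = x  # the element becomes the bound for the rest of the list
```

As in the figure, the state is threaded through the traversal rather than stored in the list, so the monitor never needs to buffer more than one element.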

We can use the ascending monitor to verify that the output list of a sorting 
procedure is in sorted order. To take the example one step further, we can verify 
that the elements in the output list are in fact a permutation of the elements 
in the input list of the sorting procedure as follows. Using a reasonable hash 
function, we hash each element as it is sent to the sorting procedure. Our monitor 
then keeps track of a running total of the sum of the hashes, and as elements 
are received from the sorting procedure, it computes their hash and subtracts it 
from the total. After all of the elements are received, we check that the total is 
0 — if it is, with high probability, the two lists are permutations of each other. 
This example is an instance of result checking, inspired by Wasserman and Blum 
[26]. The monitor encoding is straightforward and omitted from the paper. 
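
Although the monitor encoding is omitted, the hash-based check itself is easy to sketch. The Python below is an illustration only: the choice of a truncated SHA-256 as the "reasonable hash function", the sequential (non-streaming) structure, and the names `h` and `checked_sort` are all assumptions made for this example.

```python
import hashlib


class ContractViolation(Exception):
    """Raised when the result check fails."""


def h(x):
    """Hash an integer to a 64-bit number (an illustrative choice of hash)."""
    digest = hashlib.sha256(str(x).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def checked_sort(xs, sort=sorted):
    """Run sort and check that its output is probably a permutation of xs."""
    total = 0
    for x in xs:
        total += h(x)  # add the hash of each element sent to the sorter
    out = sort(xs)
    for y in out:
        total -= h(y)  # subtract the hash of each element received back
    if total != 0:
        raise ContractViolation("permutation")
    return out
```

A nonzero total proves that the two lists differ as multisets; a zero total only makes a permutation highly likely, since distinct multisets may collide.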

Our next example, match, validates whether a sequence of left and right 
parentheses is balanced. Conceptually, the monitor pushes every left parenthesis 
it sees onto a stack and pops one off when it sees a right parenthesis; since all 
parentheses are alike, a counter suffices as the internal state. For brevity, 
we model our list of parentheses by marking every left parenthesis with a 1 and 
every right parenthesis with a -1. So the sequence ()()) is represented as 
1, -1, 1, -1, -1. This is not a proper sequence of parentheses: the final -1 
arrives when the counter is already 0, and correspondingly the sum of the integer 
representations is not 0. In a similar vein, we can implement 
a process that checks that a tree is serialized correctly, which is related to recent 
work on context-free session types by Thiemann and Vasconcelos [21]. 
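
The counter-based check performed by match can be sketched as follows, again modeling the list as a Python iterator and abort as an exception (both modeling choices, and the name `ContractViolation`, are assumptions of this sketch).

```python
class ContractViolation(Exception):
    """Raised when the monitor aborts; the message is the contract label."""


def match(xs, label="match"):
    """Forward a sequence of 1 (left) and -1 (right), checking that it balances."""
    count = 0  # number of currently unmatched left parentheses
    for x in xs:
        if x == 1:
            count += 1
        elif x == -1:
            if not count > 0:
                raise ContractViolation(label)  # right parenthesis with no partner
            count -= 1
        else:
            raise ContractViolation(label)      # invalid input
        yield x
    if count != 0:
        raise ContractViolation(label)          # unmatched left parentheses remain
```

Note that a zero sum alone is not sufficient: the sequence -1, 1 sums to 0 but is rejected by the check that the counter is positive before it is decremented.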


Mapper. Finally, we can also define monitors that check higher-order contracts, 
such as a contract for a mapping function (Fig. 3). Consider the mapper, which 
takes an integer and doubles it, and a function map that applies this mapper to 
a list of integers to produce a new list of integers. Any integer 
that the mapper produces is strictly larger than the original integer, 
assuming the original integer is positive. In order to monitor this contract, it 
makes sense to impose a contract on the mapper itself. The mapper_mon process 
enforces both the precondition, that the original integer is positive, and the 
postcondition, that the resulting integer is greater than the original. We can 
now run the monitor on the mapper, in the map process, before applying the 
mapper to the list l. 

    mapper_tp : {&{done : 1 ; next : ∀n:int. ∃n:int. mapper_tp}}
    m ← mapper =
      case m of
      | done ⇒ close m
      | next ⇒ x ← recv m ; send m (2 * x) ; m ← mapper

    map : {list ← mapper_tp ; list}
    k ← map ← m l =
      case l of
      | nil ⇒ m.done ; k.nil ; wait l ; close k
      | cons ⇒ m′ ← mapper_mon ← m ; //run monitor
               x ← recv l ; send m′ x ; y ← recv m′ ; k.cons ; send k y ; k ← map ← m′ l;;

    mapper_mon : {mapper_tp ← mapper_tp}
    n ← mapper_mon ← m =
      case n of
      | done ⇒ m.done ; wait m ; close n
      | next ⇒ x ← recv n ; assert (x > 0)^ρ ; //checks precondition
               m.next ; send m x ; y ← recv m ; assert (y > x)^ρ ; //checks postcondition
               send n y ; n ← mapper_mon ← m

Fig. 3. Higher-order monitor 
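
In a functional reading, mapper_mon is a wrapper that checks a precondition on the argument and a postcondition relating result and argument. The sketch below is an analogy in Python rather than a translation of the session-typed processes; `map_list`, the label strings, and `ContractViolation` are names introduced only for this example.

```python
class ContractViolation(Exception):
    """Raised when the monitor aborts; the message is the contract label."""


def mapper(x):
    """The monitored service: doubles its argument."""
    return 2 * x


def mapper_mon(f):
    """Wrap f so that every call is checked against the mapper contract."""
    def monitored(x):
        if not x > 0:
            raise ContractViolation("precondition")
        y = f(x)
        if not y > x:
            raise ContractViolation("postcondition")
        return y  # the result is passed on unchanged
    return monitored


def map_list(f, xs):
    """Attach the monitor to f once, then apply it to every element."""
    g = mapper_mon(f)
    return [g(x) for x in xs]
```

The postcondition mentions the argument x, so the wrapper must remember it across the call; this is the same use of local state that dependent contracts require.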


4 Monitors as Partial Identity Processes 


In the literature on contracts, they are often depicted as guards on values sent to 
and returned from functions. In our case, they really are processes that monitor 
message-passing communications between processes. For us, a central property of 
contracts is that a program may be executed with or without contract checking 
and, unless an alarm is raised, the observable outcome should be the same. 
This means that contract monitors should be partial identity processes passing 
messages back and forth along channels while testing properties of the messages. 

This may seem very limiting at first, but session-typed processes can maintain 
local state. For example, consider the functional notion of a dependent contract, 
where the contract on the result of a function depends on its input. Here, a 
function would be implemented by a process to which you send the arguments 
and which sends back the return value along the same channel. Therefore, a 
monitor can remember any (non-linear) “argument values” and use them to 
validate the “result value”. Similarly, when a list is sent element by element, 
properties that can be easily checked include constraints on its length, or whether 
it is in ascending order. Moreover, local state can include additional (private) 
concurrent processes. 

This raises a second question: how can we guarantee that a monitor really is a 
partial identity? The criterion should be general enough to allow us to naturally 
express the contracts from a wide range of examples. A key constraint is that 
contracts are expressed as session-typed processes, just like functional contracts 
should be expressed within the functional language, or object contracts within 
the object-oriented language, etc. 

The purpose of this section is to present and prove the correctness of a 
criterion on session-typed processes that guarantees that they are observationally 
equivalent to partial identity processes. All the contracts in this paper can be 
verified to be partial identities under our definition. 


4.1 Buffering Values 


As a first simple example, let's take a process that receives one positive integer 
n and factors it into two integers p and q, with p ≤ q, that are sent back. The 
part of the specification that is not enforced is that if n is not prime, p and q 
should be proper factors, but we at least enforce that all numbers are positive 
and n = p * q. We are being very particular here, for the purpose of exposition, 
marking the place where the direction of communication changes with a shift 
(↑). Since a minimal number of shifts can be inferred during elaboration of the 
syntax [18], we suppress it in most examples. 

    factor_t = ∀n:int. ↑ ∃p:int. ∃q:int. 1

    factor_monitor : {factor_t ← factor_t}

    c ← factor_monitor ← d =
      n ← recv c ; assert (n > 0)^ρ1 ; shift ← recv c ; send d n ; send d shift ;
      p ← recv d ; assert (p > 0)^ρ2 ; q ← recv d ; assert (q > 0)^ρ3 ; assert (p ≤ q)^ρ4 ;
      assert (n = p * q)^ρ5 ; send c p ; send c q ; c ← d


This is a one-time interaction (the session type factor_t is not recursive), so the 
monitor terminates. It terminates here by forwarding, but we could equally well 
have replaced it by its identity-expanded version at type 1, which is wait d ; 
close c. 
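
The checks performed by factor_monitor can be illustrated with a function wrapper. This is a sketch in Python under several assumptions: the one-shot session is modeled as a single function call, the ordering constraint is read as p ≤ q, the labels `rho1` to `rho5` are placeholder strings, and `factor_raw` is a hypothetical trial-division provider written only to exercise the monitor.

```python
import math


class ContractViolation(Exception):
    """Raised when the monitor aborts; the message is the assertion label."""


def factor_raw(n):
    """A hypothetical provider: the factor pair of n closest to sqrt(n)."""
    for p in range(math.isqrt(n), 0, -1):
        if n % p == 0:
            return p, n // p
    raise ValueError("n must be positive")


def factor_monitor(provider):
    """Wrap provider with the five assertions of the factor contract."""
    def monitored(n):
        if not n > 0:
            raise ContractViolation("rho1")
        p, q = provider(n)  # n is forwarded unchanged to the provider
        if not p > 0:
            raise ContractViolation("rho2")
        if not q > 0:
            raise ContractViolation("rho3")
        if not p <= q:
            raise ContractViolation("rho4")
        if not n == p * q:
            raise ContractViolation("rho5")
        return p, q         # p and q are passed back in the order received
    return monitored
```

The argument n stays available after it has been forwarded, which is what allows the final assertion to relate the results to the input.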

The contract could be invoked by the provider or by the client. Let’s consider 
how a provider factor might invoke it: 


    factor : {factor_t}
    c ← factor =
      c′ ← factor_raw ; c″ ← factor_monitor ← c′ ; c ← c″


To check that factor_monitor is a partial identity we need to track that p and q are 
received from the provider, in this order. In general, for any received message, we 
need to enter it into a message queue q and we need to check that the messages 
are passed on in the correct order. As a first cut (to be generalized several times), 
we write for negative types: 


$$[q](b : B^-) ; \Psi \vdash P :: (a : A^-)$$

which expresses that the two endpoints of the monitor are $a : A^-$ and $b : B^-$ 
(both negative), and we have already received the messages in $q$ along $a$. The 
context $\Psi$ declares types for local variables. 
A monitor, at the top level, is defined with 

$$mon : \tau_1 \to \cdots \to \tau_n \to \{A \leftarrow A\}$$
$$a \leftarrow mon\ x_1 \ldots x_n \leftarrow b = P$$

where context $\Psi$ declares value variables $x$. The body $P$ here is type-checked as 
one of (depending on the polarity of $A$) 

$$[\,](b : A^-) ; \Psi \vdash P :: (a : A^-) \quad\text{or}\quad (b : A^+) ; \Psi \vdash P :: [\,](a : A^+)$$

where $\Psi = (x_1{:}\tau_1) \cdots (x_n{:}\tau_n)$. A use such as 

$$c \leftarrow mon\ e_1 \ldots e_n \leftarrow c$$

is transformed into 

$$c' \leftarrow mon\ e_1 \ldots e_n \leftarrow c ; c \leftarrow c'$$

for a fresh $c'$ and type-checked accordingly. 
In general, queues have the form $q = m_1 \cdots m_n$ with 

    m ::= l_k      labels            ⊕, &
        | c        channels          ⊗, ⊸
        | n        value variables   ∃, ∀
        | end      close             1
        | shift    shifts            ↑, ↓

where $m_1$ is the front of the queue and $m_n$ the back. 

When a process P receives a message, we add it to the end of the queue 
$q$. We also need to add it to the context $\Psi$, marked as unrestricted (non-linear), to 
remember its type. In our example $\tau = \mathsf{int}$. 

$$\frac{[q \cdot n](b : B) ; \Psi, n{:}\tau \vdash P :: (a : A^-)}{[q](b : B) ; \Psi \vdash n \leftarrow \mathsf{recv}\ a ; P :: (a : \forall n{:}\tau.\, A^-)}\ \forall R$$

Conversely, when we send along b the message must be equal to the one at 
the front of the queue (and therefore it must be a variable). The $m$ is a value 
variable and remains in the context so it can be reused for later assertion checks. 
However, it could never be sent again since it has been removed from the queue. 

$$\frac{[q](b : [m/n]B) ; \Psi, m{:}\tau \vdash Q :: (a : A)}{[m \cdot q](b : \forall n{:}\tau.\, B) ; \Psi, m{:}\tau \vdash \mathsf{send}\ b\ m ; Q :: (a : A)}\ \forall L$$


All the other send and receive rules for negative types ($\forall$, $\multimap$, $\&$) follow 
exactly the same pattern. For positive types, a queue must be associated with 
the channel along which the monitor provides (the succedent of the sequent 
judgment). 

$$(b : B^+) ; \Psi \vdash Q :: [q](a : A^+)$$

Moreover, when end has been received along b the corresponding process has 
terminated and the channel is closed, so we generalize the judgment to 

$$\omega ; \Psi \vdash Q :: [q](a : A^+) \quad\text{with}\quad \omega = \cdot \mid (b : B).$$
The shift messages change the direction of communication. They therefore 
need to switch between the two judgments and also ensure that the queue has 
been emptied before we switch direction. Here are the two rules for $\uparrow$, which 
appears in our simple example: 

$$\frac{[q \cdot \mathsf{shift}](b : B^-) ; \Psi \vdash P :: (a : A^+)}{[q](b : B^-) ; \Psi \vdash \mathsf{shift} \leftarrow \mathsf{recv}\ a ; P :: (a : \uparrow A^+)}\ \uparrow R$$

We notice that after receiving a shift, the channel a already changes polarity (we 
now have to send along it), so we generalize the judgment, allowing the succedent 
to be either positive or negative. And conversely for the other judgment. 

$$[q](b : B^-) ; \Psi \vdash P :: (a : A)$$
$$\omega ; \Psi \vdash Q :: [q](a : A^+) \quad\text{where}\quad \omega = \cdot \mid (b : B)$$

When we send the final shift, we initialize a new empty queue. Because the 
queue is empty the two sides of the monitor must have the same type. 

$$\frac{(b : B^+) ; \Psi \vdash Q :: [\,](a : B^+)}{[\mathsf{shift}](b : \uparrow B^+) ; \Psi \vdash \mathsf{send}\ b\ \mathsf{shift} ; Q :: (a : B^+)}\ \uparrow L$$

The rules for forwarding are also straightforward. Both sides need to have 
the same type, and the queue must be empty. As a consequence, the immediate 
forward is always a valid monitor at a given type. 

$$\frac{}{(b : A^+) ; \Psi \vdash a \leftarrow b :: [\,](a : A^+)}\ \mathsf{id}^+ \qquad \frac{}{[\,](b : A^-) ; \Psi \vdash a \leftarrow b :: (a : A^-)}\ \mathsf{id}^-$$
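
The queue discipline behind these rules can be illustrated on straight-line monitors. The sketch below, in Python, abstracts a monitor body to a trace of actions and checks that every send matches the front of the queue, that the queue is empty once a shift has been sent, and that forwarding happens only with an empty queue. It ignores types, branching, and the contexts, and the trace encoding is an assumption made for this example.

```python
from collections import deque


def respects_queue_discipline(actions):
    """Check a straight-line monitor trace against the queue discipline.

    Each action is ("recv", msg), ("send", msg), ("assert",), or ("forward",).
    """
    queue = deque()
    for i, action in enumerate(actions):
        kind = action[0]
        if kind == "recv":
            queue.append(action[1])  # received messages join the back
        elif kind == "send":
            if not queue or queue[0] != action[1]:
                return False         # only the front of the queue may be sent
            queue.popleft()
            if action[1] == "shift" and queue:
                return False         # the queue must be empty when direction flips
        elif kind == "forward":
            # forwarding requires an empty queue and ends the monitor
            return not queue and i == len(actions) - 1
        elif kind != "assert":
            raise ValueError(f"unknown action: {kind}")
    return not queue


# The body of factor_monitor, abstracted to its communication actions.
FACTOR_MONITOR = [
    ("recv", "n"), ("assert",), ("recv", "shift"),
    ("send", "n"), ("send", "shift"),
    ("recv", "p"), ("assert",), ("recv", "q"),
    ("assert",), ("assert",), ("assert",),
    ("send", "p"), ("send", "q"), ("forward",),
]
```

On `FACTOR_MONITOR` the check succeeds; swapping the two final sends makes it fail, because q would be sent while p is still at the front of the queue.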


4.2 Rule Summary 


The current rules allow us to communicate only along the channels a and b 
that are being monitored. If we send channels along channels, however, these 
channels must be recorded in the typing judgment, but we are not allowed to 
communicate along them directly. On the other hand, if we spawn internal (local) 
channels, say, as auxiliary data structures, we should be able to interact with 
them since such interactions are not externally observable. Our judgment thus 
requires two additional contexts: $\Delta$ for channels internal to the monitor, and $\Gamma$ 
for externally visible channels that may be sent along the monitored channels. 
Our full judgments therefore are 

$$[q](b : B^-) ; \Psi ; \Gamma ; \Delta \vdash P :: (a : A)$$
$$\omega ; \Psi ; \Gamma ; \Delta \vdash Q :: [q](a : A^+) \quad\text{where}\quad \omega = \cdot \mid (b : B)$$

So far, it is given by the following rules. 
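$$\frac{(\forall \ell \in L)\quad (b : B_\ell) ; \Psi ; \Gamma ; \Delta \vdash Q_\ell :: [q \cdot \ell](a : A^+)}{(b : \oplus\{\ell : B_\ell\}_{\ell \in L}) ; \Psi ; \Gamma ; \Delta \vdash \mathsf{case}\ b\ (\ell \Rightarrow Q_\ell)_{\ell \in L} :: [q](a : A^+)}\ \oplus L$$

$$\frac{\omega ; \Psi ; \Gamma ; \Delta \vdash P :: [q](a : B_k) \quad (k \in L)}{\omega ; \Psi ; \Gamma ; \Delta \vdash a.k ; P :: [k \cdot q](a : \oplus\{\ell : B_\ell\}_{\ell \in L})}\ \oplus R$$

$$\frac{(\forall \ell \in L)\quad [q \cdot \ell](b : B) ; \Psi ; \Gamma ; \Delta \vdash P_\ell :: (a : A_\ell)}{[q](b : B) ; \Psi ; \Gamma ; \Delta \vdash \mathsf{case}\ a\ (\ell \Rightarrow P_\ell)_{\ell \in L} :: (a : \&\{\ell : A_\ell\}_{\ell \in L})}\ \& R$$

$$\frac{[q](b : B_k) ; \Psi ; \Gamma ; \Delta \vdash P :: (a : A) \quad (k \in L)}{[k \cdot q](b : \&\{\ell : B_\ell\}_{\ell \in L}) ; \Psi ; \Gamma ; \Delta \vdash b.k ; P :: (a : A)}\ \& L$$

$$\frac{(b : B) ; \Psi ; \Gamma, x{:}C ; \Delta \vdash Q :: [q \cdot x](a : A)}{(b : C \otimes B) ; \Psi ; \Gamma ; \Delta \vdash x \leftarrow \mathsf{recv}\ b ; Q :: [q](a : A)}\ \otimes L$$

$$\frac{\omega ; \Psi ; \Gamma ; \Delta \vdash P :: [q](a : A)}{\omega ; \Psi ; \Gamma, x{:}C ; \Delta \vdash \mathsf{send}\ a\ x ; P :: [x \cdot q](a : C \otimes A)}\ \otimes R$$

$$\frac{[q \cdot x](b : B) ; \Psi ; \Gamma, x{:}C ; \Delta \vdash P :: (a : A)}{[q](b : B) ; \Psi ; \Gamma ; \Delta \vdash x \leftarrow \mathsf{recv}\ a ; P :: (a : C \multimap A)}\ {\multimap} R$$

$$\frac{[q](b : B) ; \Psi ; \Gamma ; \Delta \vdash Q :: (a : A)}{[x \cdot q](b : C \multimap B) ; \Psi ; \Gamma, x{:}C ; \Delta \vdash \mathsf{send}\ b\ x ; Q :: (a : A)}\ {\multimap} L$$

$$\frac{\cdot ; \Psi ; \Gamma ; \Delta \vdash Q :: [q \cdot \mathsf{end}](a : A)}{(b : 1) ; \Psi ; \Gamma ; \Delta \vdash \mathsf{wait}\ b ; Q :: [q](a : A)}\ 1L$$

$$\frac{}{\cdot ; \Psi ; \cdot ; \cdot \vdash \mathsf{close}\ a :: [\mathsf{end}](a : 1)}\ 1R$$

$$\frac{(b : B) ; \Psi, n{:}\tau ; \Gamma ; \Delta \vdash Q :: [q \cdot n](a : A)}{(b : \exists n{:}\tau.\, B) ; \Psi ; \Gamma ; \Delta \vdash n \leftarrow \mathsf{recv}\ b ; Q :: [q](a : A)}\ \exists L$$

$$\frac{\omega ; \Psi, m{:}\tau ; \Gamma ; \Delta \vdash P :: [q](a : [m/n]A)}{\omega ; \Psi, m{:}\tau ; \Gamma ; \Delta \vdash \mathsf{send}\ a\ m ; P :: [m \cdot q](a : \exists n{:}\tau.\, A)}\ \exists R$$

$$\frac{[q \cdot n](b : B) ; \Psi, n{:}\tau ; \Gamma ; \Delta \vdash P :: (a : A^-)}{[q](b : B) ; \Psi ; \Gamma ; \Delta \vdash n \leftarrow \mathsf{recv}\ a ; P :: (a : \forall n{:}\tau.\, A^-)}\ \forall R$$

$$\frac{[q](b : [m/n]B) ; \Psi, m{:}\tau ; \Gamma ; \Delta \vdash Q :: (a : A)}{[m \cdot q](b : \forall n{:}\tau.\, B) ; \Psi, m{:}\tau ; \Gamma ; \Delta \vdash \mathsf{send}\ b\ m ; Q :: (a : A)}\ \forall L$$

$$\frac{(b : B^-) ; \Psi ; \Gamma ; \Delta \vdash Q :: [q \cdot \mathsf{shift}](a : A^+)}{(b : \downarrow B^-) ; \Psi ; \Gamma ; \Delta \vdash \mathsf{shift} \leftarrow \mathsf{recv}\ b ; Q :: [q](a : A^+)}\ \downarrow L$$

$$\frac{[\,](b : A^-) ; \Psi ; \Gamma ; \Delta \vdash P :: (a : A^-)}{(b : A^-) ; \Psi ; \Gamma ; \Delta \vdash \mathsf{send}\ a\ \mathsf{shift} ; P :: [\mathsf{shift}](a : \downarrow A^-)}\ \downarrow R$$

$$\frac{[q \cdot \mathsf{shift}](b : B^-) ; \Psi ; \Gamma ; \Delta \vdash P :: (a : A^+)}{[q](b : B^-) ; \Psi ; \Gamma ; \Delta \vdash \mathsf{shift} \leftarrow \mathsf{recv}\ a ; P :: (a : \uparrow A^+)}\ \uparrow R$$

$$\frac{(b : B^+) ; \Psi ; \Gamma ; \Delta \vdash Q :: [\,](a : B^+)}{[\mathsf{shift}](b : \uparrow B^+) ; \Psi ; \Gamma ; \Delta \vdash \mathsf{send}\ b\ \mathsf{shift} ; Q :: (a : B^+)}\ \uparrow L$$
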
4.3 Spawning New Processes 


The most complex part of checking that a process is a valid monitor involves 
spawning new processes. In order to be able to spawn and use local (private) 
processes, we have introduced the (so far unused) context $\Delta$ that tracks such 
channels. We use it here only in the following two rules: 

$$\frac{\Psi ; \Delta \vdash P :: (c : C) \qquad \omega ; \Psi ; \Gamma ; \Delta', c{:}C \vdash Q :: [q](a : A^+)}{\omega ; \Psi ; \Gamma ; \Delta, \Delta' \vdash (c : C) \leftarrow P ; Q :: [q](a : A^+)}\ \mathsf{cut}_1^+$$

$$\frac{\Psi ; \Delta \vdash P :: (c : C) \qquad [q](b : B^-) ; \Psi ; \Gamma ; \Delta', c{:}C \vdash Q :: (a : A)}{[q](b : B^-) ; \Psi ; \Gamma ; \Delta, \Delta' \vdash (c : C) \leftarrow P ; Q :: (a : A)}\ \mathsf{cut}_1^-$$

The second premise (that is, the continuation of the monitor) remains the 
monitor, while the first premise corresponds to a freshly spawned local process 
accessible through channel c. All the ordinary left rules for sending or receiving 
along channels in $\Delta$ are also available for the two monitor validity judgments. 
By the strong ownership discipline of intuitionistic session types, none of this 
information can flow out of the monitor. 

It is also possible for a single monitor to decompose into two monitors that 
operate concurrently, in sequence. In that case, the queue $q$ may be split 
anywhere, as long as the intermediate type has the right polarity. Note that $\Gamma$ must 
be chosen to contain all channels in $q_2$, while $\Gamma'$ must contain all channels in $q_1$. 

$$\frac{\omega ; \Psi ; \Gamma ; \Delta \vdash P :: [q_2](c : C^+) \qquad (c : C^+) ; \Psi ; \Gamma' ; \Delta' \vdash Q :: [q_1](a : A^+)}{\omega ; \Psi ; \Gamma, \Gamma' ; \Delta, \Delta' \vdash c{:}C^+ \leftarrow P ; Q :: [q_1 \cdot q_2](a : A^+)}\ \mathsf{cut}_2^+$$

Why is this correct? The first messages sent along a will be the messages in $q_1$. 
If we receive messages along c in the meantime, they will be first the messages 
in $q_2$ (since P is a monitor), followed by any messages that P may have received 
along b if $\omega = (b : B)$. The second rule is entirely symmetric, with the flow of 
messages in the opposite direction. 

$$\frac{[q_1](b : B^-) ; \Psi ; \Gamma ; \Delta \vdash P :: (c : C^-) \qquad [q_2](c : C^-) ; \Psi ; \Gamma' ; \Delta' \vdash Q :: (a : A)}{[q_1 \cdot q_2](b : B^-) ; \Psi ; \Gamma, \Gamma' ; \Delta, \Delta' \vdash c{:}C^- \leftarrow P ; Q :: (a : A)}\ \mathsf{cut}_2^-$$


The next two rules allow a monitor to be attached to a channel $x$ that is 
passed between a and b. The monitored version of $x$ is called $x'$, where $x'$ is 
chosen fresh. This apparently violates our property that we pass on all messages 
exactly as received, because here we pass on a monitored version of the original. 
However, if monitors are partial identities, then the original $x$ and the new $x'$ 
are indistinguishable (unless a necessary alarm is raised), which will be a tricky 
part of the correctness proof. 

$$\frac{(x : C^+) ; \Psi ; \cdot ; \Delta \vdash P :: [\,](x' : C^+) \qquad \omega ; \Psi ; \Gamma, x'{:}C^+ ; \Delta' \vdash Q :: [q_1 \cdot x' \cdot q_2](a : A^+)}{\omega ; \Psi ; \Gamma, x{:}C^+ ; \Delta, \Delta' \vdash x' \leftarrow P ; Q :: [q_1 \cdot x \cdot q_2](a : A^+)}$$

$$\frac{[\,](x : C^-) ; \Psi ; \cdot ; \Delta \vdash P :: (x' : C^-) \qquad [q_1 \cdot x' \cdot q_2](b : B^-) ; \Psi ; \Gamma, x'{:}C^- ; \Delta' \vdash Q :: (a : A)}{[q_1 \cdot x \cdot q_2](b : B^-) ; \Psi ; \Gamma, x{:}C^- ; \Delta, \Delta' \vdash x' \leftarrow P ; Q :: (a : A)}$$

There are two more versions of these rules, depending on whether the types of 
$x$ and the monitored types are positive or negative. These rules play a critical 
role in monitoring higher-order processes, because monitoring $c : A^+ \multimap B^-$ 
may require us to monitor the continuation $c : B^-$ (already covered) but also 
communication along the channel $x : A^+$ received along c. 

In actual programs, we mostly use cut $x \leftarrow P ; Q$ in the form $x \leftarrow p\ \overline{e} \leftarrow \overline{d} ; Q$ 
where p is a defined process. The rules are completely analogous, except that for 
those rules that require splitting a context in the conclusion, the arguments $\overline{d}$ 
will provide the split for us. When a new sub-monitor is invoked in this way, we 
remember and eventually check that the process p must also be a partial identity 
process, unless we are already checking it. This has the effect that recursively 
defined monitors with proper recursive calls are in fact allowed. This is impor- 
tant, because monitors for recursive types usually have a recursive structure. An 
illustration of this can be seen in pos in Fig. 1. 


4.4 Transparency 


We need to show that monitors are transparent, that is, they are indeed observa- 
tionally equivalent to partial identity processes. Because of the richness of types 
and process expressions and the generality of the monitors allowed, the proof 
has some complexities. First, we define the configuration typing, which consists 
of just three rules. Because we also send and receive ordinary values, we also 
need to type (closed) substitutions $\sigma = (v_1/n_1, \ldots, v_k/n_k)$ using the judgment 
$\sigma :: \Psi$. 

$$\frac{}{(\cdot) :: (\cdot)} \qquad \frac{\vdash v : \tau}{(v/n) :: (n : \tau)} \qquad \frac{\sigma_1 :: \Psi_1 \quad \sigma_2 :: \Psi_2}{(\sigma_1, \sigma_2) :: (\Psi_1, \Psi_2)}$$

For configurations, we use the judgment 

$$\Delta \vdash \mathcal{C} :: \Delta'$$

which expresses that process configuration $\mathcal{C}$ uses the channels in $\Delta$ and provides 
the channels in $\Delta'$. Channels that are neither used nor offered by $\mathcal{C}$ are "passed 
through". Messages are just a restricted form of processes, so they are typed 
exactly the same way. We write pred for either proc or msg. 

$$\frac{}{\Delta \vdash (\cdot) :: \Delta} \qquad \frac{\Delta_0 \vdash \mathcal{C}_1 :: \Delta_1 \quad \Delta_1 \vdash \mathcal{C}_2 :: \Delta_2}{\Delta_0 \vdash \mathcal{C}_1, \mathcal{C}_2 :: \Delta_2}$$

$$\frac{\Psi ; \Delta \vdash P :: (c : A) \quad \sigma :: \Psi}{\Delta', \Delta[\sigma] \vdash \mathsf{pred}(c, P[\sigma]) :: (\Delta', c : A[\sigma])} \qquad \mathsf{pred} ::= \mathsf{proc} \mid \mathsf{msg}$$

To characterize observational equivalence of processes, we need to first 
characterize the possible messages and the direction in which they flow: towards the 
client (channel type is positive) or towards the provider (channel type is 
negative). We summarize these in the following table. In each case, c is the channel 
along which the message is transmitted, and c′ is the continuation channel. 
    Message to client of c                     Message to provider of c
    msg⁺(c, c.k ; c ← c′)              (⊕)     msg⁻(c′, c.k ; c′ ← c)              (&)
    msg⁺(c, send c d ; c ← c′)         (⊗)     msg⁻(c′, send c d ; c′ ← c)         (⊸)
    msg⁺(c, close c)                   (1)
    msg⁺(c, send c v ; c ← c′)         (∃)     msg⁻(c′, send c v ; c′ ← c)         (∀)
    msg⁺(c, send c shift ; c ← c′)     (↓)     msg⁻(c′, send c shift ; c′ ← c)     (↑)


The notion of observational equivalence we need does not observe “nontermi- 
nation”, that is, it only compares messages that are actually received. Since 
messages can flow in two directions, we need to observe messages that arrive at 
either end. We therefore do not require, as is typical for bisimulation, that if one 
configuration takes a step, another configuration can also take a step. Instead we 
say if both configurations send an externally visible message, then the messages 
must be equivalent. 

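Supposing $\Gamma \vdash \mathcal{C} :: \Delta$ and $\Gamma \vdash \mathcal{D} :: \Delta$, we write $\Gamma \vdash \mathcal{C} \sim \mathcal{D} :: \Delta$ for 
our notion of observational equivalence. It is the largest relation satisfying that 
$\Gamma \vdash \mathcal{C} \sim \mathcal{D} :: \Delta$ implies 

1. If $\Gamma' \vdash \mathsf{msg}^+(c, P) :: \Gamma$ then $\Gamma' \vdash (\mathsf{msg}^+(c, P), \mathcal{C}) \sim (\mathsf{msg}^+(c, P), \mathcal{D}) :: \Delta$. 
2. If $\Delta \vdash \mathsf{msg}^-(c, P) :: \Delta'$ then $\Gamma \vdash (\mathcal{C}, \mathsf{msg}^-(c, P)) \sim (\mathcal{D}, \mathsf{msg}^-(c, P)) :: \Delta'$. 
3. If $\mathcal{C} = (\mathcal{C}', \mathsf{msg}^+(c, P))$ with $\Gamma \vdash \mathcal{C}' :: \Delta_1'$ and $\Delta_1' \vdash \mathsf{msg}^+(c, P) :: \Delta$ 
   and $\mathcal{D} = (\mathcal{D}', \mathsf{msg}^+(c, Q))$ with $\Gamma \vdash \mathcal{D}' :: \Delta_2'$ and $\Delta_2' \vdash \mathsf{msg}^+(c, Q) :: \Delta$ 
   then $\Delta_1' = \Delta_2' = \Delta'$ and $P = Q$ and $\Gamma \vdash \mathcal{C}' \sim \mathcal{D}' :: \Delta'$. 
4. If $\mathcal{C} = (\mathsf{msg}^-(c, P), \mathcal{C}')$ with $\Gamma \vdash \mathsf{msg}^-(c, P) :: \Gamma_1'$ and $\Gamma_1' \vdash \mathcal{C}' :: \Delta$ 
   and $\mathcal{D} = (\mathsf{msg}^-(c, Q), \mathcal{D}')$ with $\Gamma \vdash \mathsf{msg}^-(c, Q) :: \Gamma_2'$ and $\Gamma_2' \vdash \mathcal{D}' :: \Delta$ 
   then $\Gamma_1' = \Gamma_2' = \Gamma'$ and $P = Q$ and $\Gamma' \vdash \mathcal{C}' \sim \mathcal{D}' :: \Delta$. 
5. If $\mathcal{C} \longrightarrow \mathcal{C}'$ then $\Gamma \vdash \mathcal{C}' \sim \mathcal{D} :: \Delta$. 
6. If $\mathcal{D} \longrightarrow \mathcal{D}'$ then $\Gamma \vdash \mathcal{C} \sim \mathcal{D}' :: \Delta$. 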


Clauses (1) and (2) correspond to absorbing a message into a configuration, 
which may later be received by a process according to clauses (5) and (6). 

Clauses (3) and (4) correspond to observing messages, either by a client 
(clause (3)) or provider (clause (4)). 

In clause (3) we take advantage of the property that a new continuation 
channel in the message P (one that does not appear already in $\Gamma$) is always 
chosen fresh when created, so we can consistently (and silently) rename it in 
$\mathcal{C}'$, $\Delta_1'$, and $P$ (and $\mathcal{D}'$, $\Delta_2'$, and $Q$, respectively). This sleight of hand allows us 
to match up the contexts and messages exactly. An analogous remark applies to 
clause (4). A more formal description would match up the contexts and messages 
modulo two renaming substitutions which allow us to leave $\Gamma$ and $\Delta$ fixed. 

Clauses (5) and (6) make sense because a transition never changes the 
interface to a configuration, except when executing a forwarding proc(a, a ← b), which 
substitutes b for a in the remaining configuration. We can absorb this 
renaming into the renaming substitution. Cut creates a new channel, which remains 
internal since it is linear and will have one provider and one client within the 
new configuration. Unfortunately, our notation is already somewhat unwieldy 
and carrying additional renaming substitutions further obscures matters. We 
therefore omit them in this presentation. 

We now need to define a relation $\sim_M$ such that (a) it satisfies the closure 
conditions of $\sim$ and is therefore an observational equivalence, and (b) it allows us 
to conclude that monitors satisfying our judgment are partial identities. 
Unfortunately, the theorem is rather complex, so we will walk the reader through a 
sequence of generalizations that account for various phenomena. 


The ⊕, & Fragment. For this fragment, we have no value variables, nor are we 
passing channels. Then the top-level properties we would like to show are 

($1^+$) If $(y : A^+) ; \cdot ; \cdot ; \cdot \vdash P :: [\,](x : A^+)$ 
        then $y : A^+ \vdash \mathsf{proc}(x, x \leftarrow y) \sim_M P :: (x : A^+)$. 
($1^-$) If $[\,](y : A^-) ; \cdot ; \cdot ; \cdot \vdash P :: (x : A^-)$ 
        then $y : A^- \vdash \mathsf{proc}(x, x \leftarrow y) \sim_M P :: (x : A^-)$. 

Of course, asserting that $\mathsf{proc}(x, x \leftarrow y) \sim_M P$ will be insufficient, because 
this relation is not closed under the conditions of observational equivalence. For 
example, if we add a message along y to both sides, P will change its state once 
it receives the message, and the queue will record that this message still has to 
be sent. To generalize this, we need to define the queue that corresponds to a 
sequence of messages. First, a single message: 

    Message to client of c                               Message to provider of c
    ⟨⟨msg⁺(c, c.k ; c ← c′)⟩⟩ = k                (⊕)     ⟨⟨msg⁻(c′, c.k ; c′ ← c)⟩⟩ = k                (&)
    ⟨⟨msg⁺(c, send c d ; c ← c′)⟩⟩ = d           (⊗)     ⟨⟨msg⁻(c′, send c d ; c′ ← c)⟩⟩ = d           (⊸)
    ⟨⟨msg⁺(c, close c)⟩⟩ = end                   (1)
    ⟨⟨msg⁺(c, send c v ; c ← c′)⟩⟩ = v           (∃)     ⟨⟨msg⁻(c′, send c v ; c′ ← c)⟩⟩ = v           (∀)
    ⟨⟨msg⁺(c, send c shift ; c ← c′)⟩⟩ = shift   (↓)     ⟨⟨msg⁻(c′, send c shift ; c′ ← c)⟩⟩ = shift   (↑)

We extend this to message sequences with $\langle\!\langle\,\rangle\!\rangle = (\cdot)$ and $\langle\!\langle \mathcal{E}_1, \mathcal{E}_2 \rangle\!\rangle = \langle\!\langle \mathcal{E}_1 \rangle\!\rangle \cdot \langle\!\langle \mathcal{E}_2 \rangle\!\rangle$, 
provided $\Delta_0 \vdash \mathcal{E}_1 :: \Delta_1$ and $\Delta_1 \vdash \mathcal{E}_2 :: \Delta_2$. 

Then we build into the relation that sequences of messages correspond to the 
queue. 

($2^+$) If $(y : B^+) ; \cdot ; \cdot ; \cdot \vdash P :: [\langle\!\langle \mathcal{E} \rangle\!\rangle](x : A^+)$ then $y : B^+ \vdash \mathcal{E} \sim_M \mathsf{proc}(x, P) :: (x : A^+)$. 

($2^-$) If $[\langle\!\langle \mathcal{E} \rangle\!\rangle](y : B^-) ; \cdot ; \cdot ; \cdot \vdash P :: (x : A^-)$ then $y : B^- \vdash \mathcal{E} \sim_M \mathsf{proc}(x, P) :: (x : A^-)$. 

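When we add shifts the two propositions become mutually dependent, but 
otherwise they remain the same since the definition of $\langle\!\langle \mathcal{E} \rangle\!\rangle$ is already general 
enough. But we need to generalize the type on the opposite side of the queue to be 
either positive or negative, because it switches polarity after a shift has been 
received. Similarly, the channel might terminate when receiving 1, so we also 
need to allow $\omega$, which is either empty or of the form $y : B$. 

($3^+$) If $\omega ; \cdot ; \cdot ; \cdot \vdash P :: [\langle\!\langle \mathcal{E} \rangle\!\rangle](x : A^+)$ then $\omega \vdash \mathcal{E} \sim_M \mathsf{proc}(x, P) :: (x : A^+)$. 

($3^-$) If $[\langle\!\langle \mathcal{E} \rangle\!\rangle](y : B^-) ; \cdot ; \cdot ; \cdot \vdash P :: (x : A)$ then $y : B^- \vdash \mathcal{E} \sim_M \mathsf{proc}(x, P) :: (x : A)$. 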
Next, we can permit local state in the monitor (rules $\mathsf{cut}_1^+$ and $\mathsf{cut}_1^-$). The fact 
that neither of the two critical endpoints y and x, nor any (non-local) channels, 
can appear in the typing of the local process is key. That local process will evolve 
to a local configuration, but its interface will not change and it cannot access 
externally visible channels. So we generalize to allow a configuration $\mathcal{D}$ that does 
not use any channels, and any channels it offers are used by P. 

($4^+$) If $\omega ; \cdot ; \cdot ; \Delta \vdash P :: [\langle\!\langle \mathcal{E} \rangle\!\rangle](x : A^+)$ and $\cdot \vdash \mathcal{D} :: \Delta$ then 
        $\omega \vdash \mathcal{E} \sim_M \mathcal{D}, \mathsf{proc}(x, P) :: (x : A^+)$. 
($4^-$) If $[\langle\!\langle \mathcal{E} \rangle\!\rangle](y : B^-) ; \cdot ; \cdot ; \Delta \vdash P :: (x : A)$ and $\cdot \vdash \mathcal{D} :: \Delta$ then 
        $y : B^- \vdash \mathcal{E} \sim_M \mathcal{D}, \mathsf{proc}(x, P) :: (x : A)$. 


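Next, we can allow value variables necessitated by the universal and existential 
quantifiers. Since they are potentially dependent, we need to apply the closing 
substitution $\sigma$ to a number of components in our relation. 

($5^+$) If $\omega ; \Psi ; \cdot ; \Delta \vdash P :: [q](x : A^+)$ and $\sigma :: \Psi$ and $q[\sigma] = \langle\!\langle \mathcal{E} \rangle\!\rangle$ and $\cdot \vdash \mathcal{D} :: \Delta[\sigma]$ 
        then $\omega[\sigma] \vdash \mathcal{E} \sim_M \mathcal{D}, \mathsf{proc}(x, P[\sigma]) :: (x : A^+[\sigma])$. 

($5^-$) If $[q](y : B^-) ; \Psi ; \cdot ; \Delta \vdash P :: (x : A)$ and $\sigma :: \Psi$ and $q[\sigma] = \langle\!\langle \mathcal{E} \rangle\!\rangle$ and 
        $\cdot \vdash \mathcal{D} :: \Delta[\sigma]$ then $y : B^-[\sigma] \vdash \mathcal{E} \sim_M \mathcal{D}, \mathsf{proc}(x, P[\sigma]) :: (x : A[\sigma])$. 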


Breaking up the queue by spawning a sequence of monitors (rules $\mathsf{cut}_2^+$ and $\mathsf{cut}_2^-$) 
just comes down to the compositionality of the partial identity property. This is 
a new and separate way that two configurations might be in the $\sim_M$ relation, 
rather than a replacement of a previous definition. 

(6) If $\omega \vdash \mathcal{E}_1 \sim_M \mathcal{D}_1 :: (z : C)$ and $(z : C) \vdash \mathcal{E}_2 \sim_M \mathcal{D}_2 :: (x : A)$ then 
    $\omega \vdash (\mathcal{E}_1, \mathcal{E}_2) \sim_M (\mathcal{D}_1, \mathcal{D}_2) :: (x : A)$. 


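At this point, the only types that have not yet been accounted for are $\otimes$ and 
$\multimap$. If these channels were only "passed through" (without the four $\mathsf{cut}_3$ rules), 
this would be rather straightforward. However, for higher-order channel-passing 
programs, a monitor must be able to spawn a monitor on a channel that it 
receives before sending on the monitored version. First, we generalize properties 
(5) to allow the context $\Gamma$ of channels that may occur in the queue $q$ and the 
process $P$, but that $P$ may not interact with. 

($7^+$) If $\omega ; \Psi ; \Gamma ; \Delta \vdash P :: [q](x : A^+)$ and $\sigma :: \Psi$ and $q[\sigma] = \langle\!\langle \mathcal{E} \rangle\!\rangle$ and 
        $\cdot \vdash \mathcal{D} :: \Delta[\sigma]$ then $\Gamma[\sigma], \omega[\sigma] \vdash \mathcal{E} \sim_M \mathcal{D}, \mathsf{proc}(x, P[\sigma]) :: (x : A^+[\sigma])$. 

($7^-$) If $[q](y : B^-) ; \Psi ; \Gamma ; \Delta \vdash P :: (x : A)$ and $\sigma :: \Psi$ and $q[\sigma] = \langle\!\langle \mathcal{E} \rangle\!\rangle$ and 
        $\cdot \vdash \mathcal{D} :: \Delta[\sigma]$ then $\Gamma[\sigma], y : B^-[\sigma] \vdash \mathcal{E} \sim_M \mathcal{D}, \mathsf{proc}(x, P[\sigma]) :: (x : A[\sigma])$. 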


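In addition we need to generalize property (6) into (8) and (9) to allow multiple 
monitors to run concurrently in a configuration. 

(8) If $\Gamma \vdash \mathcal{E} \sim_M \mathcal{D} :: \Delta$ then $(\Gamma', \Gamma) \vdash \mathcal{E} \sim_M \mathcal{D} :: (\Gamma', \Delta)$. 
(9) If $\Gamma_1 \vdash \mathcal{E}_1 \sim_M \mathcal{D}_1 :: \Gamma_2$ and $\Gamma_2 \vdash \mathcal{E}_2 \sim_M \mathcal{D}_2 :: \Gamma_3$ then 
    $\Gamma_1 \vdash (\mathcal{E}_1, \mathcal{E}_2) \sim_M (\mathcal{D}_1, \mathcal{D}_2) :: \Gamma_3$. 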


At this point we can state the main theorem regarding monitors. 
Theorem 1. If $\Gamma \vdash \mathcal{E} \sim_M \mathcal{D} :: \Delta$ according to properties ($7^+$), ($7^-$), (8), and (9) 
then $\Gamma \vdash \mathcal{E} \sim \mathcal{D} :: \Delta$. 

Proof. By closure under conditions 1-6 in the definition of $\sim$. 

By applying it as in equations ($1^+$) and ($1^-$), generalized to include value 
variables as in ($5^+$) and ($5^-$), we obtain: 

Corollary 1. If $[\,](b : A^-) ; \Psi \vdash P :: (a : A^-)$ or $(b : A^+) ; \Psi \vdash P :: [\,](a : A^+)$ 
then P is a partial identity process. 


5 Refinements as Contracts 


In this section we show how to check refinement types dynamically using our 
contracts. We encode refinements as type casts, which allows processes to remain 
well-typed with respect to the non-refinement type system (Sect. 2). These casts 
are translated at run time to monitors that validate whether the cast expresses 
an appropriate refinement. If so, the monitors behave as identity processes; oth- 
erwise, they raise an alarm and abort. For refinement contracts, we can prove a 
safety theorem, analogous to the classic “Well-typed Programs Can’t be Blamed” 
[25], stating that if a monitor enforces a contract that casts from type A to type 
B, where A is a subtype of B, then this monitor will never raise an alarm. 


5.1 Syntax and Typing Rules 


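We first augment messages and processes to include casts as follows. We write 
$\langle A \Leftarrow B \rangle^\rho$ to denote a cast from type B to type A, where $\rho$ is a unique label for 
the cast. The cast for values is written as $\langle \tau \Leftarrow \tau' \rangle^\rho$. Here, the types $\tau'$ and $\tau$ 
are refinement types of the form $\{n{:}t \mid b\}$, where b is a boolean expression that 
expresses simple properties of the value n. 

$$P ::= \cdots \mid x \leftarrow \langle \tau \Leftarrow \tau' \rangle^\rho\, v ; Q \mid a \leftarrow \langle A \Leftarrow B \rangle^\rho\, b$$

Adding casts to forwarding is expressive enough to encode a more general cast 
$\langle A \Leftarrow B \rangle^\rho P$. For instance, the process $x{:}A \leftarrow \langle A \Leftarrow B \rangle^\rho P ; Q_x$ can be encoded 
as: $y{:}B \leftarrow P ; x{:}A \leftarrow \langle A \Leftarrow B \rangle^\rho\, y ; Q_x$. 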

One of the additional rules to type casts is shown below (both rules can be 
found in Fig. 6). We only allow casts between two types that are compatible 
with each other (written A ~ B), which is co-inductively defined based on the 
structure of the types (the full definition is omitted from the paper). 


$$\frac{A \sim B}{\Psi ; b : B \vdash a \leftarrow \langle A \Leftarrow B \rangle^\rho\, b :: (a : A)}\ \mathsf{id\_cast}$$
5.2 Translation to Monitors 

At run time, casts are translated into monitoring processes. A cast 
$a \leftarrow \langle A \Leftarrow B \rangle^\rho\, b$ is implemented as a monitor. This monitor ensures that the process that 
offers a service on channel b behaves according to the prescribed type A. Because 
of the typing rules, we are assured that channel b must adhere to the type B. 

Figure 4 is a summary of all the translation rules, except recursive types. 
The translation is of the form $[\![\langle A \Leftarrow B \rangle^\rho]\!]_{a,b} = P$, where A, B are types; the 
channels a and b are the offering channel and monitoring channel (respectively) 
for the resulting monitoring process P; and $\rho$ is a label of the monitor (i.e., the 
contract). 

Note that this differs from blame labels for higher-order functions, where the 
monitor carries two labels, one for the argument, and one for the body of the 
function. Here, the communication between processes is bi-directional. Though 
the blame is always triggered by processes sending messages to the monitor, 
our contracts may depend on a set of the values received so far, so it does not 
make sense to blame one party. Further, in the case of forwarding, the processes 
at either end of the channel are behaving according to the types (contracts) 
assigned to them, but the cast may forcefully connect two processes that have 
incompatible types. In this case, it is unfair to blame either one of the processes. 
Instead, we raise an alarm with the label of the failed contract. 

The translation is defined inductively over the structure of the types. The
tensor rule generates a process that first receives a channel ($x$) from the channel
being monitored ($b$). It then spawns a new monitor (denoted by the @monitor
keyword) to monitor channel $x$, making sure that it behaves as type $A_1$, and
passes the new monitor's offering channel $y$ to channel $a$. Finally, the monitor
continues to monitor $b$ to make sure that it behaves as type $A_2$. The lolli rule is
similar to the tensor rule, except that the monitor first receives a channel from
its offering channel. Similar to the higher-order function case, the argument
position is contravariant, so the newly spawned monitor checks that the received
channel behaves as type $B_1$. The exists rule generates a process that first receives
a value from the channel $b$, then checks the boolean condition $e$ to validate the
contract. The forall rule is similar, except the argument position is contravariant,
so the boolean expression $e'$ is checked on the offering channel $a$. The with rule
generates a process that checks that all of the external choices promised by the
type $\&\{\ell : A_\ell\}_{\ell \in I}$ are offered by the process being monitored. If a label in the
set $I$ is not implemented, then the monitor aborts with the label $\rho$. The plus
rule requires that, for internal choices, the monitor checks that the monitored
process only offers choices within the labels in the set $\oplus\{\ell : A_\ell\}_{\ell \in I}$.
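
The exists, plus, and one cases described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not the translation itself: the monitored channel is modelled as a finite list of messages, only the provider-to-client direction is covered (so no lolli, forall, or with cases and no spawned sub-monitors), and the names `Abort`, `monitor`, `Exists`, `Plus`, and `One` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


class Abort(Exception):
    """Alarm raised with the label of the failed contract."""

    def __init__(self, label: str) -> None:
        super().__init__(label)
        self.label = label


@dataclass
class One:
    """Type 1: the provider closes the channel."""


@dataclass
class Exists:
    """The provider sends a value that must satisfy `pred`, then continues."""
    pred: Callable[[int], bool]
    cont: "Type"


@dataclass
class Plus:
    """Internal choice: the provider sends one of the labels in `branches`."""
    branches: Dict[str, "Type"]


Type = Union[One, Exists, Plus]
Message = Tuple[str, object]  # ("val", v), ("label", l) or ("close", None)


def monitor(target: Type, trace: List[Message], label: str) -> List[Message]:
    """Forward messages arriving on b to a, checking them against `target`."""
    forwarded: List[Message] = []
    for kind, payload in trace:
        if isinstance(target, Exists) and kind == "val":
            if not target.pred(payload):        # corresponds to the assert
                raise Abort(label)
            target = target.cont
        elif isinstance(target, Plus) and kind == "label":
            if payload not in target.branches:  # label outside the target set
                raise Abort(label)
            target = target.branches[payload]
        elif isinstance(target, One) and kind == "close":
            forwarded.append((kind, payload))   # wait b; close a
            return forwarded
        else:
            # Assumption: the typing rules exclude ill-shaped traces,
            # so this is a misuse of the sketch rather than a contract failure.
            raise TypeError("trace does not have the shape of the source type")
        forwarded.append((kind, payload))
    return forwarded
```

A well-behaved trace is forwarded unchanged, which is the partial-identity behaviour; a value that violates the refinement, or a label outside the target choice, raises the alarm with the contract label.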

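For ease of explanation, we omit details for translating casts involving recursive
types. Briefly, these casts are translated into recursive processes. For each
pair of compatible recursive types $A$ and $B$, we generate a unique monitor name
$f$ and record its type $f : \{A \leftarrow B\}$ in a context $\Psi$. The translation algorithm
needs to take additional arguments, including $\Psi$, to generate and invoke the
appropriate recursive process when needed. For instance, when generating the
monitor process for $f : \{\mathsf{list} \leftarrow \mathsf{list}\}$, we follow the rule for translating internal
choices. For $[\![\langle \mathsf{list} \Leftarrow \mathsf{list} \rangle^{\rho}]\!]_{y,x}$ we apply the cons case in the translation to get
@monitor $y \leftarrow f \leftarrow x$.

\[
\begin{array}{ll}
\textsf{one} &
[\![\langle 1 \Leftarrow 1 \rangle^{\rho}]\!]_{a,b} = \mathsf{wait}\ b;\ \mathsf{close}\ a \\[4pt]
\multimap &
[\![\langle A_1 \multimap A_2 \Leftarrow B_1 \multimap B_2 \rangle^{\rho}]\!]_{a,b} = x \leftarrow \mathsf{recv}\ a;\
\texttt{@monitor}\ y \leftarrow [\![\langle B_1 \Leftarrow A_1 \rangle^{\rho}]\!]_{y,x} \leftarrow x;\
\mathsf{send}\ b\ y;\ [\![\langle A_2 \Leftarrow B_2 \rangle^{\rho}]\!]_{a,b} \\[4pt]
\otimes &
[\![\langle A_1 \otimes A_2 \Leftarrow B_1 \otimes B_2 \rangle^{\rho}]\!]_{a,b} = x \leftarrow \mathsf{recv}\ b;\
\texttt{@monitor}\ y \leftarrow [\![\langle A_1 \Leftarrow B_1 \rangle^{\rho}]\!]_{y,x} \leftarrow x;\
\mathsf{send}\ a\ y;\ [\![\langle A_2 \Leftarrow B_2 \rangle^{\rho}]\!]_{a,b} \\[4pt]
\forall &
[\![\langle \forall\{n{:}\tau \mid e\}.\,A \Leftarrow \forall\{n{:}\tau' \mid e'\}.\,B \rangle^{\rho}]\!]_{a,b} = x \leftarrow \mathsf{recv}\ a;\
\mathsf{assert}\ \rho\ e'(x)\ (\mathsf{send}\ b\ x;\ [\![\langle A \Leftarrow B \rangle^{\rho}]\!]_{a,b}) \\[4pt]
\exists &
[\![\langle \exists\{n{:}\tau \mid e\}.\,A \Leftarrow \exists\{n{:}\tau' \mid e'\}.\,B \rangle^{\rho}]\!]_{a,b} = x \leftarrow \mathsf{recv}\ b;\
\mathsf{assert}\ \rho\ e(x)\ (\mathsf{send}\ a\ x;\ [\![\langle A \Leftarrow B \rangle^{\rho}]\!]_{a,b}) \\[4pt]
\oplus &
[\![\langle \oplus\{\ell : A_\ell\}_{\ell \in I} \Leftarrow \oplus\{\ell : B_\ell\}_{\ell \in J} \rangle^{\rho}]\!]_{a,b} = \mathsf{case}\ b\ (\ell \Rightarrow Q_\ell)_{\ell \in J} \\
& \quad \text{where } \forall \ell \in I \cap J,\ Q_\ell = a.\ell;\ [\![\langle A_\ell \Leftarrow B_\ell \rangle^{\rho}]\!]_{a,b}
\ \text{ and } \ \forall \ell \in J,\ \ell \notin I,\ Q_\ell = \mathsf{abort}\ \rho \\[4pt]
\& &
[\![\langle \&\{\ell : A_\ell\}_{\ell \in I} \Leftarrow \&\{\ell : B_\ell\}_{\ell \in J} \rangle^{\rho}]\!]_{a,b} = \mathsf{case}\ a\ (\ell \Rightarrow Q_\ell)_{\ell \in I} \\
& \quad \text{where } \forall \ell \in I \cap J,\ Q_\ell = b.\ell;\ [\![\langle A_\ell \Leftarrow B_\ell \rangle^{\rho}]\!]_{a,b}
\ \text{ and } \ \forall \ell \in I,\ \ell \notin J,\ Q_\ell = \mathsf{abort}\ \rho \\[4pt]
\downarrow &
[\![\langle {\downarrow}A \Leftarrow {\downarrow}B \rangle^{\rho}]\!]_{a,b} = \mathsf{shift} \leftarrow \mathsf{recv}\ b;\
\mathsf{send}\ a\ \mathsf{shift};\ [\![\langle A \Leftarrow B \rangle^{\rho}]\!]_{a,b} \\[4pt]
\uparrow &
[\![\langle {\uparrow}A \Leftarrow {\uparrow}B \rangle^{\rho}]\!]_{a,b} = \mathsf{shift} \leftarrow \mathsf{recv}\ a;\
\mathsf{send}\ b\ \mathsf{shift};\ [\![\langle A \Leftarrow B \rangle^{\rho}]\!]_{a,b}
\end{array}
\]

Fig. 4. Cast translation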


5.3 Metatheory 


We prove two formal properties of cast-based monitors: safety and transparency. 

Because of the expressiveness of our contracts, a general safety (or blame) 
theorem is difficult to achieve. However, for cast-based contracts, we can prove 
that a cast which enforces a subtyping relation, and the corresponding monitor, 
will not raise an alarm. We first define our subtyping relation in Fig. 5. In addi- 
tion to the subtyping between refinement types, we also include label subtyping 
for our session types. A process that offers more external choices can always be 
used as a process that offers fewer external choices. Similarly, a process that 
offers fewer internal choices can always be used as a process that offers more 
internal choices (e.g., non-empty list can be used as a list). The subtyping rules 
for internal and external choices are drawn from work by Acay and Pfenning [1]. 
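\[
\dfrac{}{1 \le 1}\ 1
\qquad
\dfrac{A \le A' \quad B \le B'}{A \otimes B \le A' \otimes B'}\ \otimes
\qquad
\dfrac{A' \le A \quad B \le B'}{A \multimap B \le A' \multimap B'}\ \multimap
\]
\[
\dfrac{A_k \le A'_k \ \text{for } k \in J \quad J \subseteq I}
      {\oplus\{lab_k : A_k\}_{k \in J} \le \oplus\{lab_k : A'_k\}_{k \in I}}\ \oplus
\qquad
\dfrac{A_k \le A'_k \ \text{for } k \in I \quad I \subseteq J}
      {\&\{lab_k : A_k\}_{k \in J} \le \&\{lab_k : A'_k\}_{k \in I}}\ \&
\]
\[
\dfrac{A \le B}{{\downarrow}A \le {\downarrow}B}\ \downarrow
\qquad
\dfrac{A \le B}{{\uparrow}A \le {\uparrow}B}\ \uparrow
\qquad
\dfrac{A \le B \quad \tau_1 \le \tau_2}{\exists n{:}\tau_1.\,A \le \exists n{:}\tau_2.\,B}\ \exists
\qquad
\dfrac{A \le B \quad \tau_2 \le \tau_1}{\forall n{:}\tau_1.\,A \le \forall n{:}\tau_2.\,B}\ \forall
\]
\[
\dfrac{\mathit{def}(A) \le \mathit{def}(B)}{A \le B}\ \textsf{def}
\qquad
\dfrac{\forall v{:}\tau,\ [v/x]b_1 \mapsto^{*} \mathsf{true} \ \text{implies}\ [v/x]b_2 \mapsto^{*} \mathsf{true}}
      {\{x{:}\tau \mid b_1\} \le \{x{:}\tau \mid b_2\}}\ \textsf{refine}
\]

Fig. 5. Subtyping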


For recursive types, we directly examine their definitions. Because of these recur- 
sive types, our subtyping rules are co-inductively defined. 
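
The label subtyping for choices can be made concrete with a short check. The sketch below is illustrative only: it handles finite types built from the unit type and the two choice constructors, so it is an inductive approximation of the co-inductive relation, and the tuple encoding and the function name `subtype` are assumptions of the example.

```python
from typing import Dict, Tuple, Union

# A type is "one", ("plus", {label: type}) or ("with", {label: type}).
Ty = Union[str, Tuple[str, Dict[str, "Ty"]]]


def subtype(a: Ty, b: Ty) -> bool:
    """Return True when a <= b for the finite choice fragment."""
    if a == "one" and b == "one":
        return True
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0]:
        kind, a_branches, b_branches = a[0], a[1], b[1]
        if kind == "plus":
            # Fewer internal choices may be used where more are allowed.
            return set(a_branches) <= set(b_branches) and all(
                subtype(a_branches[l], b_branches[l]) for l in a_branches
            )
        if kind == "with":
            # More external choices may be used where fewer are required.
            return set(b_branches) <= set(a_branches) and all(
                subtype(a_branches[l], b_branches[l]) for l in b_branches
            )
    return False


# Hypothetical example types: a non-empty list offers only "cons".
non_empty = ("plus", {"cons": "one"})
any_list = ("plus", {"cons": "one", "nil": "one"})
```

With these definitions `subtype(non_empty, any_list)` holds while the converse does not, matching the non-empty list example in the text.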

We prove a safety theorem (i.e., well-typed casts do not raise alarms) via 
the standard preservation theorem. The key is to show that the monitor process 
generated from the translation algorithm in Fig. 4 is well-typed under a typing 
relation which guarantees that no abort state can be reached. We refer to the type 
system presented thus far in the paper as T, where monitors that may evaluate 
to abort can be typed. We define a stronger type system S, which consists of the
rules in T with the exception of the abort rule, and in which the assert rule is
replaced by the assert_strong rule. The new rule for assert, which semantically verifies
that the condition $b$ is true using the fact that the refinements are stored in the
context $\Psi$, is shown in Fig. 6, which summarizes the two type systems.


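Theorem 2 (Monitors are well-typed). Let $\Psi$ be the context containing the
type bindings of all recursive processes.


1. $\Psi \,;\, b : B \vdash_{T} [\![\langle A \Leftarrow B \rangle^{\rho}]\!]_{a,b} :: (a : A)$.
2. If $B \le A$, then $\Psi \,;\, b : B \vdash_{S} [\![\langle A \Leftarrow B \rangle^{\rho}]\!]_{a,b} :: (a : A)$.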


Proof. The proof is by induction over the monitor translation rules. For 2, we
need to use the subtyping relation to show that (1) for the internal and external
choice cases, no branches that include abort are generated; and (2) for the forall 
and exists cases, the assert never fails (i.e., the assert_strong rule applies). 


As a corollary, we can show that when executing in a well-typed context, a 
monitor process translated from a well-typed cast will never raise an alarm. 


Corollary 2 (Well-typed casts cannot raise alarms). $\vdash C :: b : B$ and
$B \le A$ implies $C, \mathsf{proc}(a, [\![\langle A \Leftarrow B \rangle^{\rho}]\!]_{a,b}) \not\longrightarrow^{*} \mathsf{abort}(\rho)$.


Finally, we prove that monitors translated from casts are partial identity
processes.
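Both System T and S

\[
\dfrac{}{\Psi ; b : A \vdash a \leftarrow b :: (a : A)}\ \textsf{id}
\qquad
\dfrac{\Psi ; \Delta \vdash P :: (x : A) \quad \Psi ; x : A, \Delta' \vdash Q :: (c : C)}
      {\Psi ; \Delta, \Delta' \vdash x{:}A \leftarrow P ; Q :: (c : C)}\ \textsf{cut}
\]
\[
\dfrac{\Psi ; \Delta \vdash P :: (c : A^{+})}
      {\Psi ; \Delta \vdash \mathsf{shift} \leftarrow \mathsf{recv}\ c ; P :: (c : {\uparrow}A^{+})}\ {\uparrow}R
\qquad
\dfrac{\Psi ; \Delta, c : A^{+} \vdash Q :: (d : D)}
      {\Psi ; \Delta, c : {\uparrow}A^{+} \vdash \mathsf{send}\ c\ \mathsf{shift} ; Q :: (d : D)}\ {\uparrow}L
\]
\[
\dfrac{\Psi ; \Delta \vdash P :: (c : A^{-})}
      {\Psi ; \Delta \vdash \mathsf{send}\ c\ \mathsf{shift} ; P :: (c : {\downarrow}A^{-})}\ {\downarrow}R
\qquad
\dfrac{\Psi ; \Delta, c : A^{-} \vdash Q :: (d : D)}
      {\Psi ; \Delta, c : {\downarrow}A^{-} \vdash \mathsf{shift} \leftarrow \mathsf{recv}\ c ; Q :: (d : D)}\ {\downarrow}L
\]
\[
\dfrac{}{\Psi ; \cdot \vdash \mathsf{close}\ c :: (c : 1)}\ 1R
\qquad
\dfrac{\Psi ; \Delta \vdash Q :: (d : D)}
      {\Psi ; \Delta, c : 1 \vdash \mathsf{wait}\ c ; Q :: (d : D)}\ 1L
\]
\[
\dfrac{\Psi ; \Delta \vdash P :: (c : B)}
      {\Psi ; \Delta, a : A \vdash \mathsf{send}\ c\ a ; P :: (c : A \otimes B)}\ {\otimes}R
\qquad
\dfrac{\Psi ; \Delta, x : A, c : B \vdash Q :: (d : D)}
      {\Psi ; \Delta, c : A \otimes B \vdash x \leftarrow \mathsf{recv}\ c ; Q :: (d : D)}\ {\otimes}L
\]
\[
\dfrac{\Psi ; \Delta, x : A \vdash P :: (c : B)}
      {\Psi ; \Delta \vdash x \leftarrow \mathsf{recv}\ c ; P :: (c : A \multimap B)}\ {\multimap}R
\qquad
\dfrac{\Psi ; \Delta, c : B \vdash Q :: (d : D)}
      {\Psi ; \Delta, a : A, c : A \multimap B \vdash \mathsf{send}\ c\ a ; Q :: (d : D)}\ {\multimap}L
\]
\[
\dfrac{\Psi ; \Delta \vdash P_\ell :: (c : A_\ell)\ \text{for every } \ell \in L}
      {\Psi ; \Delta \vdash \mathsf{case}\ c\ (\ell \Rightarrow P_\ell)_{\ell \in L} :: (c : \&\{\ell : A_\ell\}_{\ell \in L})}\ {\&}R
\qquad
\dfrac{k \in L \quad \Psi ; \Delta, c : A_k \vdash Q :: (d : D)}
      {\Psi ; \Delta, c : \&\{\ell : A_\ell\}_{\ell \in L} \vdash c.k ; Q :: (d : D)}\ {\&}L
\]
\[
\dfrac{k \in L \quad \Psi ; \Delta \vdash P :: (c : A_k)}
      {\Psi ; \Delta \vdash c.k ; P :: (c : \oplus\{\ell : A_\ell\}_{\ell \in L})}\ {\oplus}R
\qquad
\dfrac{\Psi ; \Delta, c : A_\ell \vdash Q_\ell :: (d : D)\ \text{for every } \ell \in L}
      {\Psi ; \Delta, c : \oplus\{\ell : A_\ell\}_{\ell \in L} \vdash \mathsf{case}\ c\ (\ell \Rightarrow Q_\ell)_{\ell \in L} :: (d : D)}\ {\oplus}L
\]
\[
\dfrac{\Psi \vdash v : \tau \quad \Psi ; \Delta \vdash P :: (c : [v/n]A)}
      {\Psi ; \Delta \vdash \mathsf{send}\ c\ v ; P :: (c : \exists n{:}\tau.\,A)}\ {\exists}R
\qquad
\dfrac{\Psi, n : \tau ; \Delta, c : A \vdash Q :: (d : D)}
      {\Psi ; \Delta, c : \exists n{:}\tau.\,A \vdash n \leftarrow \mathsf{recv}\ c ; Q :: (d : D)}\ {\exists}L
\]
\[
\dfrac{\Psi, n : \tau ; \Delta \vdash P :: (c : A)}
      {\Psi ; \Delta \vdash n \leftarrow \mathsf{recv}\ c ; P :: (c : \forall n{:}\tau.\,A)}\ {\forall}R
\qquad
\dfrac{\Psi \vdash v : \tau \quad \Psi ; \Delta, c : [v/n]A \vdash Q :: (d : D)}
      {\Psi ; \Delta, c : \forall n{:}\tau.\,A \vdash \mathsf{send}\ c\ v ; Q :: (d : D)}\ {\forall}L
\]
\[
\dfrac{\Psi \vdash v : \tau' \quad \Psi, x : \tau ; \Delta \vdash Q :: (c : C) \quad \tau \sim \tau'}
      {\Psi ; \Delta \vdash x \leftarrow \langle \tau \Leftarrow \tau' \rangle^{\rho}\, v ; Q :: (c : C)}\ \textsf{val\_cast}
\qquad
\dfrac{A \sim B}
      {\Psi ; b : B \vdash a \leftarrow \langle A \Leftarrow B \rangle^{\rho}\, b :: (a : A)}\ \textsf{id\_cast}
\]

System T only

\[
\dfrac{\Psi \vdash b : \mathsf{bool} \quad \Psi ; \Delta \vdash Q :: (x : A)}
      {\Psi ; \Delta \vdash \mathsf{assert}\ \rho\ b ; Q :: (x : A)}\ \textsf{assert}
\qquad
\dfrac{}{\Psi ; \Delta \vdash \mathsf{abort}\ \rho :: (x : A)}\ \textsf{abort}
\]

System S only

\[
\dfrac{\Psi \Vdash b\ \mathsf{true} \quad \Psi ; \Delta \vdash Q :: (x : A)}
      {\Psi ; \Delta \vdash \mathsf{assert}\ \rho\ b ; Q :: (x : A)}\ \textsf{assert\_strong}
\]

Fig. 6. Typing process expressions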


Theorem 3 (Casts are transparent).
$b : B \vdash \mathsf{proc}(b, a \leftarrow b) \sim \mathsf{proc}(a, [\![\langle A \Leftarrow B \rangle^{\rho}]\!]_{a,b}) :: (a : A)$.


Proof. We just need to show that the translated process passes the partial iden- 
tity checks. We can show this by induction over the translation rules and by 
applying the rules in Sect. 4. We note that rules in Sect. 4 only consider identical 
types; however, our casts only cast between two compatible types. Therefore, we 
can lift A and B to their super types (i.e., insert abort cases for mismatched
labels), and then apply the checking rules. This does not change the semantics 
of the monitors. 


6 Related Work 


There is a rich body of work on higher-order contracts and the correctness of 
blame assignments in the context of the lambda calculus [2,7,8, 10,16, 24,25]. The 
contracts in these papers are mostly based on refinement or dependent types. Our 
contracts are more expressive than the above, and can encode refinement-based 
contracts. While our monitors are similar to reference monitors (such as those 
described by Schneider [19]), they have a few features that are not inherent to 
reference monitors such as the fact that our monitors are written in the target 
language. Our monitors are also able to monitor contracts in a higher-order 
setting by spawning a separate monitor for the sent/received channel.

Disney et al.’s [9] work, which investigates behavioral contracts that enforce 
temporal properties for modules, is closely related to our work. Our contracts 
(i.e., session types) also enforce temporal properties; the session types specify the 
order in which messages are sent and received by the processes. Our contracts 
can also make use of internal state, like those of Disney et al., but our system is
concurrent, while their system does not consider concurrency. 

Recently, gradual typing for two-party session-type systems has been devel- 
oped [14,20]. Even though this formalism is different from our contracts, the way 
untyped processes are gradually typed at run time resembles how we monitor 
type casts. Because of dynamic session types, their system has to keep track of 
the linear use of channels, which is not needed for our monitors. 

Most recently, Melgratti and Padovani have developed chaperone contracts 
for higher-order session types [17]. Their work is based on a classical
interpretation of session types, instead of an intuitionistic one like ours, which means
that they do not handle spawning or forwarding processes. While their contracts 
also inspect messages passed between processes, unlike ours, they cannot model 
contracts which rely on the monitor making use of internal state (e.g., the paren- 
thesis matching). They proved a blame theorem relying on the notion of locally 
correct modules, which is a semantic categorization of whether a module satisfies 
the contract. We did not prove a general blame theorem; instead, we prove a 
somewhat standard safety theorem for cast-based contracts. 

The Whip system [27] addresses a problem similar to that of our prior work [15],
but does not use session types. They use a dependent type system to imple- 
ment a contract monitoring system that can connect services written in different 
languages. Their system is also higher order, and allows processes that are moni- 
tored by Whip to interact with unmonitored processes. While Whip can express 
dependent contracts, Whip cannot handle stateful contracts. Another distinguishing
feature of our monitors is that they are partial identity processes encoded in
the same language as the processes to be monitored. 
7 Conclusion


We have presented a novel approach for contract-checking for concurrent pro- 
cesses. Our model uses partial identity monitors which are written in the same 
language as the original processes and execute transparently. We define what 
it means to be a partial identity monitor and prove our characterization cor- 
rect. We provide multiple examples of contracts we can monitor including ones 
that make use of the monitor’s internal state, ones that make use of the idea 
of probabilistic result checking, and ones that cannot be expressed as depen- 
dent or refinement types. We translate contracts in the refinement fragment into 
monitors, and prove a safety theorem for that fragment. 


Acknowledgment. This research was supported in part by NSF grant CNS1423168 
and a Carnegie Mellon University Presidential Fellowship. 
Abstract. A key requirement for many distributed systems is to be
resilient toward partial failures, allowing a system to progress despite 
the failure of some components. This makes programming such systems
daunting, particularly with regard to avoiding inconsistencies due
to failures and asynchrony. This work introduces a formal model for 
crash failure handling in asynchronous distributed systems featuring a 
lightweight coordinator, modeled in the image of widely used systems 
such as ZooKeeper and Chubby. We develop a typing discipline based 
on multiparty session types for this model that supports the specifica- 
tion and static verification of multiparty protocols with explicit failure 
handling. We show that our type system ensures subject reduction and 
progress in the presence of failures. In other words, in a well-typed system 
even if some participants crash during execution, the system is guaran- 
teed to progress in a consistent manner with the remaining participants. 


1 Introduction 


Distributed Programs, Partial Failures, and Coordination. Developing programs 
that execute across a set of physically remote, networked processes is challeng- 
ing. The correct operation of a distributed program requires correctly designed 
protocols by which concurrent processes interact asynchronously, and correctly 
implemented processes according to their roles in the protocols. This becomes 
particularly challenging when distributed programs have to be resilient to partial 
failures, where some processes crash while others remain operational. Partial
failures affect both safety and liveness of applications. Asynchrony is the key 
issue, resulting in the inability to distinguish slow processes from failed ones. In
general, this makes it impossible for processes to reach agreement, even when 
only a single process can crash [19]. 

In practice, such impasses are overcome by making appropriate assumptions 
for the considered infrastructure and applications. One common approach is to 
assume the presence of a highly available coordination service [26] — realized using 
a set of replicated processes large enough to survive common rates of process 
failures (e.g., 1 out of 3, 2 out of 5) — and delegating critical decisions to this 
service. While this coordinator model has been in widespread use for many years 
(cf. consensus service [22]), the advent of cloud computing has recently brought it 
further into the mainstream, via instances like Chubby [4] and ZooKeeper [26]. 
Such systems are used not only by end applications but also by a variety of 
frameworks and middleware systems across the layers of the protocol stack [11, 
20,31, 40]. 


Typing Disciplines for Distributed Programs. Typing disciplines for distributed
programs are a promising and active research area towards addressing the
challenges in the correct development of distributed programs. See Hüttel et al.
[27] for a broad survey. Session types are one of the established typing
disciplines for message passing systems. Originally developed in the π-calculus [23],
these have been later successfully applied to a range of practical languages, e.g., 
Java [25,41], Scala [39], Haskell [34,38], and OCaml [28,37]. Multiparty session 
types (MPSTs) [15,24] generalize session types beyond two participants. In a 
nutshell, a standard MPST framework takes (1) a specification of the whole 
multiparty message protocol as a global type; from which (2) local types, describ- 
ing the protocol from the perspective of each participant, are derived; these are 
in turn used to (3) statically type check the I/O actions of endpoint programs 
implementing the session participants. A well-typed system of session endpoint 
programs enjoys important safety and liveness properties, such as no reception 
errors (only expected messages are received) and session progress. A basic intu- 
ition behind MPSTs is that the design (i.e., restrictions) of the type language 
constitutes a class of distributed protocols for which these properties can be 
statically guaranteed by the type system. 
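
Step (2) of this pipeline, deriving local types from a global type, can be illustrated for the simplest possible case. The sketch below assumes a global protocol that is a plain sequence of labelled messages, with no branching, recursion, or failure handling; the role names, labels, and the function `project` are hypothetical and only meant to convey the idea of projection.

```python
from typing import List, Tuple

Interaction = Tuple[str, str, str]   # (sender, receiver, label)
LocalAction = Tuple[str, str, str]   # ("send" | "recv", peer, label)


def project(protocol: List[Interaction], role: str) -> List[LocalAction]:
    """Keep only the interactions that involve `role`, seen from its side."""
    local: List[LocalAction] = []
    for sender, receiver, label in protocol:
        if role == sender:
            local.append(("send", receiver, label))
        elif role == receiver:
            local.append(("recv", sender, label))
        # Interactions between two other roles are invisible to `role`.
    return local


# A hypothetical three-party protocol used only for illustration.
protocol = [("a", "b", "request"), ("b", "c", "forward"), ("c", "a", "reply")]
local_b = project(protocol, "b")
```

Here `local_b` describes role `b` as first receiving `request` from `a` and then sending `forward` to `c`; an endpoint program for `b` would be checked against this local view.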

Unfortunately, no MPST work supports protocols for asynchronous dis- 
tributed programs dealing with partial failures due to process crashes, so the 
aforementioned properties no longer hold in such an event. Several MPST works 
have treated communication patterns based on exception messages (or inter- 
rupts) [6,7,16]. In these works, such messages may convey exceptional states 
in an application sense; from a protocol compliance perspective, however, these 
messages are the same as any other message communicated during a normal 
execution of the session. This is in contrast to process failures, which may inval- 
idate already in-transit (orphan) messages, and where the task of agreeing on 
the concerted handling of a crash failure is itself prone to such failures. 

Outside of session types and other type-based approaches, there have been a 
number of advances on verifying fault tolerant distributed protocols and appli- 
cations (e.g., based on model checking [29], proof assistants [44]); however, little 
work exists on providing direct compile-time support for programming such
applications in the spirit of MPSTs. 


Contributions and Challenges. This paper puts forward a new typing discipline 
for safe specification and implementation of distributed programs prone to pro- 
cess crash failures based on MPSTs. The following summarizes the key challenges 
and contributions. 


Multiparty session calculus with coordination service. We develop an 
extended multiparty session calculus as a formal model of processes prone 
to crash failures in asynchronous message passing systems. Unlike standard 
session calculi that reflect only “minimal” networking infrastructures, our 
model introduces a practically-motivated coordinator artifact and explicit, 
asynchronous messages for run-time crash notifications and failure handling. 

MPSTs with explicit failure handling. We introduce new global and local 
type constructs for explicit failure handling, designed for specifying protocols 
tolerating partial failures. Our type system carefully reworks many of the 
key elements in standard MPSTs to manage the intricacies of handling crash 
failures. These include the well-formedness of failure-prone global types, and 
the crucial coherence invariant on MPST typing environments to reflect the 
notion of system consistency in the presence of crash failures and the resulting 
errors. We show safety and progress for a well-typed MPST session despite 
potential failures. 


To fit our model to practice, we introduce programming constructs similar 
to well-known and intuitive exception handling mechanisms, for handling con- 
current and asynchronous process crash failures in sessions. These constructs 
serve to integrate user-level session control flow in endpoint processes and the 
underlying communications with the coordination service, used by the target 
applications of our work to outsource critical failure management decisions (see 
Fig. 1). It is important to note that the coordinator does not magically solve 
all problems. Key design challenges are to ensure that communication with it 
is fully asynchronous as in real life, and that it is involved only in a “minimal”
fashion. Thus we treat the coordinator as a first-class, asynchronous network
artifact, as opposed to a convenient but impractical global “oracle” (cf. [6]),
and our operational semantics of multiparty sessions remains primarily chore- 
ographic in the original spirit of distributed MPSTs, unlike works that resort 
to a centralized orchestrator to conduct all actions [5,8]. As depicted in Fig. 1, 
application-specific communication does not involve the coordinator. Our model 
lends itself to common practical scenarios where processes monitor each other 
in a peer-based fashion to detect failures, and rely on a coordinator only to 
establish agreement on which processes have failed, and when. 

A long version of this paper is available online [43]. The long version contains: 
full formal definitions, full proofs, and a prototype implementation in Scala. 


Example. As a motivating example, Fig. 2 gives a global formal specification for 
a big data streaming task between a distributed file system (DFS) dfs, and two 
[Figure 1: diagram omitted. Its legend distinguishes processes, robust processes,
crashed processes, the coordinator and its replicas, application communication,
and notifications.]

Fig. 1. Coordinator model for asynchronous distributed systems. The coordinator
is implemented by replicated processes (internals omitted).

\[
\begin{array}{l}
[\mathit{dfs}]\,G = \mathsf{t}(\mu t.\ \mathit{dfs} \to w_1\ l_{d_1}(S).\ \mathit{dfs} \to w_2\ l_{d_2}(S). \\
\qquad\qquad\quad\ \ w_1 \to \mathit{dfs}\ l_{r_1}(S').\ w_2 \to \mathit{dfs}\ l_{r_2}(S').\ t \\
\qquad\quad\ )\mathsf{h}(\ \{w_1\}: \mu t'.\ \mathit{dfs} \to w_2\ l'_{d_1}(S).\ w_2 \to \mathit{dfs}\ l'_{r_1}(S').\ t', \\
\qquad\qquad\ \ \{w_2\}: \ldots,\ \{w_1, w_2\}: \mathsf{end}\ )
\end{array}
\]

Fig. 2. Global type for a big data streaming task with failure handling
capabilities.


workers $w_{1,2}$. The DFS streams data to two workers, which process the data and
write the result back. Most DFSs have built-in fault tolerance mechanisms [20], 
so we consider dfs to be robust, denoted by the annotation [dfs]; the workers, 
however, may individually fail. In the try-handle construct t(...)h(...), the try- 
block t(...) gives the normal (i.e., failure-free) flow of the protocol, and h(...) 
contains the explicit handlers for potential crashes. In the try-block, the workers 
receive data from the DFS ($\mathit{dfs} \to w_i$), perform local computations, and send back
the result ($w_i \to \mathit{dfs}$). If a worker crashes ($\{w_i\}: \ldots$), the other worker will also take
over the computation of the crashed worker, allowing the system to still produce 
a valid result. If both workers crash (by any interleaving of their concurrent 
crash events), the global type specifies that the DFS should safely terminate its 
role in the session. 

We shall refer to this basic example, that focuses on the new failure handling 
constructs, in explanations in later sections. We also give many further examples 
throughout the following sections to illustrate the potential session errors due to 
failures exposed by our model, and how our framework resolves them to recover 
MPST safety and progress. 


Roadmap. Section 2 describes the adopted system and failure model. Section 3 
introduces global types for guiding failure handling. Section 4 introduces our pro- 
cess calculus with failure handling capabilities and a coordinator. Section 5 intro- 
duces local types, derived from global types by projection. Section 6 describes 
typing rules, and defines coherence of session environments with respect to end- 
point crashes. Section 7 states properties of our model. Section 8 discusses related 
work. Section 9 draws conclusions.


2 System and Failure Model 


In distributed systems care is required to avoid partial failures affecting liveness 
(e.g., waiting on messages from crashed processes) or safety (e.g., when processes 
manage to communicate with some peers but not others before crashing) prop- 
erties of applications. Based on the nature of the infrastructure and application, 
appropriate system and failure models are chosen along with judiciously made
assumptions to overcome such impasses in practice. 

We pinpoint the key characteristics of our model, according to our practical 
motivations and standard distributed systems literature, that shape the design 
choices we make later for the process calculus and types. As is common, we
augment our system with a failure detector (FD) to allow for distinguishing slow
and failed processes. The FD has two advantages: (1) in terms of reasoning, it
concentrates all assumptions needed to solve given problems; and (2) in terms of
implementation, it yields a single main module where time-outs are set and used.
Concretely we make the following assumptions on failures and the system: 


(1) Crash-stop failures: Application processes fail by crashing (halting), and 
do not recover. 

(2) Asynchronous system: Application processes and the network are asyn- 
chronous, meaning that there are no upper bounds on processes’ relative 
speeds or message transmission delays. 

(3) Reliable communication: Messages transmitted between correct (i.e., 
non-failed) participants are eventually received. 

(4) Robust coordinator: The coordinator (coordination service) is perma- 
nently available. 

(5) Asynchronous reliable failure detection: Application processes have 
access to local FDs which eventually detect all failed peers and do not falsely 
suspect peers. 


Assumptions (1)-(3) are standard in the literature on fault-tolerant distributed systems [19].

Note that processes can still recover but will not do so within sessions (or
will not be re-considered for those). Other failure models, e.g., network
partitions [21] or Byzantine failures [32], are the subject of future work. The former are
not tolerated by ZooKeeper et al., and the latter have often been argued to be
too generic a failure model (e.g., [3]).

The assumption on the coordinator (4) implicitly means that the number 
of concomitant failures among the coordinator replicas is assumed to remain 
within a minority, and that failed replicas are replaced in time (to tolerate fur- 
ther failures). Without loss of validity, the coordinator internals can be treated 
as a blackbox. The final assumption (5) on failure detection is backed in practice 
by the concept of program-controlled crash [10], which consists in communicat- 
ing decisions to disregard supposedly failed processes also to those processes, 
prompting them to reset themselves upon false suspicion. In practice systems 
can be configured to minimize the probability of such events, and by a “two- 
level” membership consisting in evicting processes from individual sessions (cf. 
recovery above) more quickly than from a system as a whole; several authors 
have also proposed network support to entirely avoid false suspicions (e.g., [33]). 

These assumptions do not make handling of failures trivial, let alone mask 
them. For instance, the network can arbitrarily delay messages and thus reorder 
them with respect to their real sending times, and (so) different processes can 
detect failures at different points in time and in different orders. 
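\[
\begin{array}{lrcl}
\text{(Basic type)} & S & ::= & \mathsf{bool} \mid \mathsf{str} \mid \mathsf{int} \\
\text{(Global type)} & G & ::= & p \to q\{l_i(S_i).G_i\}_{i \in I} \mid \mu t.G \mid t \mid \mathsf{end} \mid \mathsf{t}(G_1)\mathsf{h}(H)^{\kappa}.G_2 \\
\text{(Handling env.)} & H & ::= & F : G \mid H, H \\
\text{(Handler sig.)} & F & ::= & \{p_i\}_{i \in I}
\end{array}
\]

Fig. 3. Syntax of global types with explicit handling of partial failures.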


3 Global Types for Explicit Handling of Partial Failures 


Based on the foundations of MPSTs, we develop global types to formalize spec- 
ifications of distributed protocols with explicit handling of partial failures due 
to role crashes, simply referred to as failures. We present global types before 
introducing the process calculus to provide a high-level intuition of how failure 
handling works in our model. 

The syntax of global types is depicted in Fig. 3. We use the following base
notations: $p, q, \ldots$ for role (i.e., participant) names; $l_1, l_2, \ldots$ for message labels;
and $t, t', \ldots$ for type variables. Base types $S$ may range over bool, int, etc.

Global types are denoted by $G$. We first summarize the constructs from
standard MPST [15,24]. A branch type $p \to q\{l_i(S_i).G_i\}_{i \in I}$ means that $p$ can
send to $q$ one of the messages of type $S_k$ with label $l_k$, where $k$ is a member of the
non-empty index set $I$. The protocol then proceeds according to the continuation
$G_k$. When $I$ is a singleton, we may simply write $p \to q\ l(S).G$. We use $t$ for type
variables and take an equi-recursive view, i.e., $\mu t.G$ and its unfolding $[\mu t.G/t]$
are equivalent. We assume type variable occurrences are bound and guarded
(e.g., $\mu t.t$ is not permitted). end is for termination.

We now introduce our extensions for partial failure handling. A try-handle
$\mathsf{t}(G_1)\mathsf{h}(H)^{\kappa}.G_2$ describes a “failure-atomic” protocol unit: all live (i.e.,
non-crashed) roles will eventually reach a consistent protocol state, despite any
concurrent and asynchronous role crashes. The try-block $G_1$ defines the default
protocol flow, and $H$ is a handling environment. Each element of $H$ maps a
handler signature $F$, that specifies a set of failed roles $\{p_i\}_{i \in I}$, to a handler body
specified by a $G$. The handler body $G$ specifies how the live roles should proceed
given the failure of roles $F$. The protocol then proceeds (for live roles) according
to the continuation $G_2$ after the default block $G_1$ or failure handling defined in
$H$ has been completed as appropriate.

To simplify later technical developments, we annotate each try-handle term
in a given $G$ by a unique $\kappa \in \mathbb{N}$ that lexically identifies the term within $G$. These
annotations may be assigned mechanically. As a shorthand, we refer to the
try-block and handling environment of a particular try-handle by its annotation;
e.g., we use $\kappa$ to stand for $\mathsf{t}(G_1)\mathsf{h}(H)^{\kappa}$. In the running examples (e.g., Fig. 2), if
there exists only one try-handle, we omit $\kappa$ for simplicity.


Top-Level Global Types and Robust Roles. We use the term top-level global type 
to mean the source protocol specified by a user, following a typical top-down 
interpretation of MPST frameworks [15,24]. We allow top-level global types to 
be optionally annotated [p]G, where [p] specifies a set of robust roles—i.e., roles 
that can be assumed to never fail. In practice, a participant may be robust if
it is replicated or is made inherently fault tolerant by other means (e.g., the 
participant that represents the distributed file system in Fig. 2). 


Well-Formedness. The first stage of validation in standard MPSTs is to check 
that the top-level global type satisfies the supporting criteria used to ensure the 
desired properties of the type system. We first list basic syntactic conditions 
which we assume on any given G: (i) each F is non-empty; (ii) a role in an F
cannot occur in the corresponding handler body (a failed role cannot be involved 
in the handling of its own failure); and (iii) every occurrence of a non-robust 
role p must be contained within a, possibly outer, try-handle that has a handler 
signature {p} (the protocol must be able to handle its potential failure). Lastly, 
to simplify the presentation without loss of generality, we impose that separate 
branch types not defined in the same default block or handler body must have 
disjoint label sets. This can be implicitly achieved by combining label names 
with try-handle annotations. 
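
To make the shape of global types and the first two syntactic conditions concrete, the following sketch encodes the grammar of Fig. 3 as a small abstract syntax tree and checks conditions (i) and (ii). It is an illustration under stated assumptions rather than the paper's implementation: the class names, the helpers `roles` and `handlers_ok`, and the example labels are hypothetical, and condition (iii) and the label-disjointness requirement are not checked.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple, Union


@dataclass(frozen=True)
class End:
    """end: termination."""


@dataclass(frozen=True)
class Var:
    """t: a type variable."""
    name: str


@dataclass(frozen=True)
class Rec:
    """mu t.G: a recursive global type."""
    var: str
    body: "Global"


@dataclass(frozen=True)
class Msg:
    """p -> q {l_i(S_i).G_i}: (label, base type, continuation) per branch."""
    sender: str
    receiver: str
    branches: Tuple[Tuple[str, str, "Global"], ...]


@dataclass(frozen=True)
class TryHandle:
    """t(G1)h(H)^kappa.G2, with H as (failed roles, handler body) pairs."""
    body: "Global"
    handlers: Tuple[Tuple[FrozenSet[str], "Global"], ...]
    kappa: int
    cont: "Global"


Global = Union[End, Var, Rec, Msg, TryHandle]


def roles(g: Global) -> Set[str]:
    """All role names occurring in g, including those in handler signatures."""
    if isinstance(g, Rec):
        return roles(g.body)
    if isinstance(g, Msg):
        found = {g.sender, g.receiver}
        for _, _, cont in g.branches:
            found |= roles(cont)
        return found
    if isinstance(g, TryHandle):
        found = roles(g.body) | roles(g.cont)
        for failed, handler in g.handlers:
            found |= set(failed) | roles(handler)
        return found
    return set()  # End and Var mention no roles


def handlers_ok(g: Global) -> bool:
    """Check (i) non-empty signatures and (ii) failed roles absent from handlers."""
    if isinstance(g, Rec):
        return handlers_ok(g.body)
    if isinstance(g, Msg):
        return all(handlers_ok(cont) for _, _, cont in g.branches)
    if isinstance(g, TryHandle):
        for failed, handler in g.handlers:
            if not failed or failed & roles(handler):
                return False
            if not handlers_ok(handler):
                return False
        return handlers_ok(g.body) and handlers_ok(g.cont)
    return True


# Hypothetical example: if w1 crashes, dfs simply terminates.
send_data = Msg("dfs", "w1", (("data", "str", End()),))
good = TryHandle(send_data, ((frozenset({"w1"}), End()),), 1, End())
# Violates (ii): the handler for the crash of w1 still talks to w1.
bad = TryHandle(send_data, ((frozenset({"w1"}), send_data),), 1, End())
```

In this encoding `handlers_ok(good)` holds and `handlers_ok(bad)` does not, since a failed role cannot take part in the handling of its own failure.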

Assuming the above, we define well-formedness for our extended global types.
We write $G' \in G$ to mean that $G'$ syntactically occurs in $G$ ($\in$ is reflexive);
similarly for the variations $\kappa \in G$ and $\kappa \in \kappa'$. Recall $\kappa$ is shorthand for $\mathsf{t}(G_1)\mathsf{h}(H)^{\kappa}$.
We use a lookup function $\mathit{outer}_G(\kappa)$ for the set of all try-handles in $G$ that enclose
a given $\kappa$, including $\kappa$ itself, defined by $\mathit{outer}_G(\kappa) = \{\kappa' \mid \kappa \in \kappa' \wedge \kappa' \in G\}$.


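Definition 1 (Well-formedness). Let $\kappa$ stand for $\mathsf{t}(G_1)\mathsf{h}(H)^{\kappa}$, and $\kappa'$ for
$\mathsf{t}(G_1')\mathsf{h}(H')^{\kappa'}$. A global type $G$ is well-formed if both of the following conditions
hold. For all $\kappa \in G$:


1. $\forall F_1 \in \mathit{dom}(H).\ \forall F_2 \in \mathit{dom}(H).\ \exists \kappa' \in \mathit{outer}_G(\kappa)$ s.t. $F_1 \cup F_2 \in \mathit{dom}(H')$
2. $\nexists F \in \mathit{dom}(H).\ \exists \kappa' \in \mathit{outer}_G(\kappa).\ \exists F' \in \mathit{dom}(H')$ s.t. $\kappa' \neq \kappa \wedge F' \subseteq F$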


The first condition asserts that for any two separate handler signatures of
a handling environment of $\kappa$, there always exists a handler whose handler
signature matches the union of their respective failure sets; this handler is either
inside the handling environment of $\kappa$ itself, or in the handling environment of
an outer try-handle. This ensures that if roles are active in different handlers of
the same try-handle then there is a handler whose signature corresponds to the 
union over the signatures of those different handlers. Example 2 together with 
Example 3 in Sect. 4 illustrate a case where this condition is needed. The second 
condition asserts that if the handling environment of a try-handle contains a 
handler for $F$, then there is no outer try-handle with a handler for $F'$ such that
$F' \subseteq F$. The reason for this condition is that in the case of nested try-handles,
our communication model allows separate try-handles to start failure handling 
independently (the operational semantics will be detailed in the next section; see 
(TryHdl) in Fig. 6). The aim is to have the relevant roles eventually converge on 
performing the handling of the outermost try-handle, possibly by interrupting 
the handling of an inner try-handle. Consider the following example:

Example 1. $G = t(t(G')h(\{p_1, p_2\} : G_1)^{\kappa_2})h(\{p_1\} : G_1')^{\kappa_1}$ violates condition 2
because, when $p_1$ and $p_2$ have both failed, the handler signature $\{p_1\}$ will still be
triggered (i.e., the outer try-handle will eventually take over). It is not sensible
to run $G_1'$ instead of $G_1$ (which is for the crashes of $p_1$ and $p_2$).
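The two conditions of Definition 1 can be checked mechanically once a global type is reduced to its nesting of try-handles. The following Python sketch is illustrative only: the `TryHandle` record, the `with_outer` helper, and the `well_formed` function are hypothetical names introduced here, and the encoding assumes a single top-level try-handle and ignores the basic syntactic conditions (i) to (iii).

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TryHandle:
    """A try-handle t(G1)h(H)^kappa, reduced to what Definition 1 inspects."""
    kappa: str              # identity of the try-handle
    signatures: frozenset   # dom(H): a set of frozensets of role names
    nested: tuple = ()      # try-handles occurring directly inside this one


def with_outer(th, enclosing=()):
    """Yield (kappa, outer_G(kappa)) for th and every try-handle nested in it."""
    chain = enclosing + (th,)  # outer_G includes the try-handle itself
    yield th, chain
    for inner in th.nested:
        yield from with_outer(inner, chain)


def well_formed(top):
    """Check conditions 1 and 2 of Definition 1 (assumed encoding)."""
    for th, outer in with_outer(top):
        # Condition 1: the union of any two signatures is handled by th
        # itself or by some enclosing try-handle.
        for f1, f2 in product(th.signatures, repeat=2):
            if not any((f1 | f2) in o.signatures for o in outer):
                return False
        # Condition 2: no strictly outer try-handle has a signature that is
        # a subset of one of th's signatures.
        for f in th.signatures:
            for o in outer:
                if o is not th and any(fp <= f for fp in o.signatures):
                    return False
    return True
```

With this encoding, the global type of Example 1 (inner signature {p1, p2}, outer signature {p1}) is rejected by condition 2, while a single try-handle with signatures {p1}, {p2}, and {p1, p2} is accepted.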


[Figure: timeline of the processes and the coordinator $\Psi$ over times (1) to (4). (1) $\Psi$ issued failure notifications; (2) $\Psi$ received all dones but did not issue done; (3) $\Psi$ issued failure notifications; (4) $\Psi$ received all dones and issued dones.]

Fig. 4. Challenges under pure asynchronous interactions with a coordinator. Between
time (1) and time (2), the task $\phi = (\kappa, \emptyset)$ is interrupted by the crash of $P_a$. Between
time (3) and time (4), due to asynchrony and multiple crashes, $P_c$ starts handling the
crash of $\{P_a, P_d\}$ without handling the crash of $\{P_a\}$. Finally after (4) $P_b$ and $P_c$ finish
their common task.


4 A Process Calculus for Coordinator-Based Failure 
Handling 


Figure 4 depicts a scenario that can occur in practical asynchronous systems 
with coordinator-based failure handling through frameworks such as ZooKeeper 
(Sect. 2). Using this scenario, we first illustrate challenges, formally define our 
model, and then develop a safe type system. 

The scenario corresponds to a global type of the form $t(G)h(\{P_a\} : G_a, \{P_a, P_d\} : G_{ad}, \ldots)^{\kappa}$,
with processes $P_{a..d}$ and a coordinator $\Psi$. We define
a task to mean a unit of interactions, which includes failure handling behaviors.
Initially all processes are collaborating on a task $\phi$, which we label $(\kappa, \emptyset)$
(identifying the task context, and the set of failed processes). The shaded boxes
signify which task each process is working on. Dotted arrows represent notifications
between processes and $\Psi$ related to task completion, and solid arrows represent
failure notifications from $\Psi$ to processes. During the scenario, $P_a$ first fails, then
$P_d$ fails: the execution proceeds through failure handling for $\{P_a\}$ and $\{P_a, P_d\}$.


(I) When $P_b$ reaches the end of its part in $\phi$, the application has $P_b$ notify
$\Psi$. $P_b$ then remains in the context of $\phi$ (the continuation of the box after
notifying) in consideration of other non-robust participants still working
on $\phi$: $P_b$ may yet need to handle their potential failure(s).

(Expression)   $e ::= v \mid x \mid e + e \mid -e \mid \ldots$
(Channel)      $c ::= s[p] \mid y$
(Process)      $P ::= a[p](y).P \mid c : \eta$
(Level)        $\phi ::= (\kappa, F)$
(Statement)    $\eta ::= t(\eta)h(H)^{\phi}.\eta \mid \underline{0} \mid 0 \mid p!l(e).\eta \mid p?\{l_i(x_i).\eta_i\}_{i \in I} \mid X(e) \mid \mathsf{def}\ D\ \mathsf{in}\ \eta \mid \mathsf{if}\ e\ \eta\ \mathsf{else}\ \eta$
(Declaration)  $D ::= X(x) = \eta$
(Handling)     $H ::= F : \eta \mid H, H$
(Application)  $N ::= P \mid N\,|\,N \mid s : h$
(Queue)        $h ::= \emptyset \mid h \cdot m$
(Message)      $m ::= \langle p, q, l(v)\rangle \mid \langle\!\langle p, \mathsf{crash}\ F\rangle\!\rangle \mid dn$
(Done)         $dn ::= \langle p, q\rangle^{\phi}$
(System)       $S ::= \Psi \blacklozenge N \mid (\nu s)S \mid S\,|\,S$
(Coordinator)  $\Psi ::= G : (F, d)$
(Context)      $E ::= t(E)h(H)^{\phi}.\eta \mid \mathsf{def}\ D\ \mathsf{in}\ E \mid [\,]$
(Done Queue)   $d ::= \emptyset \mid d \cdot dn$


Fig. 5. Grammar for processes, applications, systems, and evaluation contexts.


(II) The processes of synchronizing on the completion of a task or performing
failure handling are themselves subject to failures that may arise concurrently.
In Fig. 4, all processes reach the end of $\phi$ (i.e., four dotted arrows
from $\phi$), but $P_a$ fails. $\Psi$ determines this failure and initiates failure
handling at time (1), while done notifications for $\phi$ continue to arrive
asynchronously at time (2). The failure handling for the crash of $P_a$ is itself
interrupted by the second failure at time (3).

(III) $\Psi$ can receive notifications that are no longer relevant. For example, at
time (2), $\Psi$ has received all done notifications for $\phi$, but the failure of $P_a$
has already triggered failure handling from time (1).

(IV) Due to multiple concurrent failures, interacting participants may end up
in different tasks: around time (2), $P_b$ and $P_d$ are in task $\phi' = (\kappa, \{P_a\})$,
whereas $P_c$ is still in $\phi$ (and asynchronously sending or receiving messages
with the others). Moreover, $P_c$ never executes $\phi'$ because of delayed
notifications, so it goes from $\phi$ directly to $(\kappa, \{P_a, P_d\})$.


Processes. Figure 5 defines the grammar of processes and (distributed) applications.
Expressions $e, e_i, \ldots$ can be values $v, v_i, \ldots$, variables $x, x_i, \ldots$, and standard
operations. (Application) processes are denoted by $P, P_i, \ldots$. An initialization
$a[p](y).P$ agrees to play role $p$ via shared name $a$ and takes actions defined in $P$;
actions are executed on a session channel $c : \eta$, where $c$ ranges over $s[p]$ (session
name and role name) and session variables $y$; $\eta$ represents action statements.

A try-handle $t(\eta)h(H)^{\phi}$ attempts to execute the local action $\eta$, and can handle
failures occurring therein as defined in the handling environment $H$, analogously
to global types. $H$ thus also maps a handler signature $F$ to a handler body $\eta$
defining how to handle $F$. Annotation $\phi = (\kappa, F)$ is composed of two elements:
an identity $\kappa$ of a global try-handle, and an indication of the current handler
signature, which can be empty. $F = \emptyset$ means that the default try-block is executing,
whereas $F \neq \emptyset$ means that the handler body for $F$ is executing. Term $\underline{0}$ only
occurs in a try-handle during runtime. It denotes yielding for a notification
from a coordinator (introduced shortly).

Other statements are similar to those defined in [15,24]. Term $0$ represents
an idle action. By convention, we omit $0$ at the end of a statement. Action
$p!l(e).\eta$ represents a sending action that sends $p$ a label $l$ with content $e$, then
it continues as $\eta$. Branching $p?\{l_i(x_i).\eta_i\}_{i \in I}$ represents a receiving action from $p$
with several possible branches. When label $l_k$ is selected, the transmitted value
$v$ is saved in $x_k$, and $\eta_k\{v/x_k\}$ continues. For convenience, when there is only
one branch, the curly brackets are omitted, e.g., $c : p?l(x).P$ means there is only
one branch $l(x)$. $X(e)$ is for a statement variable with one parameter $e$, and
$\mathsf{def}\ D\ \mathsf{in}\ \eta$ is for recursion, where declaration $D$ defines the recursive body that
can be called in $\eta$. The conditional statement is standard.

The structure of processes ensures that failure handling is not interleaved 
between different sessions. However, we note that in standard MPSTs [15,24], 
session interleaving must anyway be prohibited for the basic progress property. 
Since our aim will be to show progress, we disallow session interleaving within 
process bodies. Our model does allow parallel sessions at the top-level, whose 
actions may be concurrently interleaved during execution. 


(Distributed) Systems. A (distributed) system in our programming framework
is a composition of an application, which contains more than one process, and
a coordinator (cf. Fig. 1). A system can be running within a private session $s$,
represented by $(\nu s)S$, or $S \mid S'$ for systems running in different sessions
independently and in parallel (i.e., no session interleaving). The job of the coordinator
is to ensure that even in the presence of failures there is consensus on whether
all participants in a given try-handle completed their local actions, or whether
failures need to be handled, and which ones. We use $\Psi = G : (F, d)$ to denote
a (robust) coordinator for the global type $G$, which stores in $(F, d)$ the failures
$F$ that occurred in the application, and in $d$ done notifications sent to the
coordinator. The coordinator is denoted by $\psi$ when viewed as a role.

A (distributed) application¹ is a process $P$, a parallel composition $N \mid N'$, or
a global queue carrying messages $s : h$. A global queue $s : h$ carries a sequence
of messages $m$, sent by participants in session $s$. A message is either a regular
message $\langle p, q, l(v)\rangle$ with label $l$ and content $v$ sent from $p$ to $q$, or a notification.
A notification may contain the role of a coordinator. There are done and
failure notifications, with two kinds of done notifications $dn$ used for coordination:
$\langle p, \psi\rangle^{\phi}$ notifies $\psi$ that $p$ has finished its local actions of the try-handle
$\phi$; $\langle \psi, p\rangle^{\phi}$ is sent from $\psi$ to notify $p$ that $\psi$ has received all done notifications
for the try-handle $\phi$ so that $p$ shall end its current try-handle and move to its
next task. For example, in Fig. 4 at time (4) the coordinator will inform $P_b$ and
$P_c$ via $\langle \psi, P_b\rangle^{(\kappa, \{P_a, P_d\})} \cdot \langle \psi, P_c\rangle^{(\kappa, \{P_a, P_d\})}$ that they can finish the try-handle
$(\kappa, \{P_a, P_d\})$. Note that the appearance of $\langle \psi, p\rangle^{\phi}$ implies that the coordinator
has been informed that all participants in $\phi$ have completed their local
actions. We define two kinds of failure notifications: $\langle\!\langle \psi, \mathsf{crash}\ F\rangle\!\rangle$ notifies $\psi$
that $F$ occurred, e.g., $\{q\}$ means $q$ has failed; $\langle\!\langle p, \mathsf{crash}\ F\rangle\!\rangle$ is sent from $\psi$ to
notify $p$ about the failure $F$ for possible handling. We write $\langle\!\langle \tilde{p}, \mathsf{crash}\ F\rangle\!\rangle$, where
$\tilde{p} = p_1, \ldots, p_n$, as short for $\langle\!\langle p_1, \mathsf{crash}\ F\rangle\!\rangle \cdot \ldots \cdot \langle\!\langle p_n, \mathsf{crash}\ F\rangle\!\rangle$; similarly for $\langle \psi, \tilde{p}\rangle^{\phi}$.


1 Other works use the term network which is the reason why we use N instead of, 
e.g., A. We call it application to avoid confusion with the physical network which 
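interconnects all processes as well as the coordinator.

$$\frac{a : G}{a[p_1](y_1).P_1 \mid \ldots \mid a[p_n](y_n).P_n \to (\nu s)(G : (\emptyset, \emptyset) \blacklozenge P_1\{s[p_1]/y_1\} \mid \ldots \mid P_n\{s[p_n]/y_n\} \mid s : \emptyset)}\ \text{(Link)}$$

$$\frac{e \Downarrow v}{s[p] : E[q!l(e).\eta] \mid s : h \to s[p] : E[\eta] \mid s : h \cdot \langle p, q, l(v)\rangle}\ \text{(Snd)}$$

$$\frac{k \in I}{s[p] : E[q?\{l_i(x_i).\eta_i\}_{i \in I}] \mid s : \langle q, p, l_k(v)\rangle \cdot h \to s[p] : E[\eta_k\{v/x_k\}] \mid s : h}\ \text{(Rcv)}$$

$$\frac{e \Downarrow v}{s[p] : E[\mathsf{def}\ X(x) = \eta\ \mathsf{in}\ X(e)] \to s[p] : E[\mathsf{def}\ X(x) = \eta\ \mathsf{in}\ \eta\{v/x\}]}\ \text{(Rec)}$$

$$\frac{N_1 \equiv N_3 \quad N_3 \to N_4 \quad N_4 \equiv N_2}{N_1 \to N_2}\ \text{(Str)} \qquad \frac{N_1 \to N_2}{N_1 \mid N \to N_2 \mid N}\ \text{(Par)}$$

$$\frac{N_1 \to N_2}{\Psi \blacklozenge N_1 \to \Psi \blacklozenge N_2}\ \text{(Sys)} \qquad \frac{S \to S'}{(\nu s)S \to (\nu s)S'}\ \text{(New)}$$

$$\frac{s[p] : \eta\ \text{non-robust}}{N \mid s : h \to N \setminus s[p] : \eta \mid s : remove(h, p) \cdot \langle\!\langle \psi, \mathsf{crash}\ \{p\}\rangle\!\rangle}\ \text{(Crash)}$$


Fig. 6. Operational semantics of distributed applications, for local actions.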


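Following the tradition of other MPST works, the global queue provides an
abstraction for multiple FIFO queues, each queue being between two endpoints
(cf. TCP) with no global ordering. Therefore $m_i \cdot m_j$ can be permuted to $m_j \cdot m_i$
in the global queue if the sender or the receiver differ. For example, the following
messages are permutable: $\langle p, q, l(v)\rangle \cdot \langle p, q', l'(v')\rangle$ if $q \neq q'$, and $\langle p, q, l(v)\rangle \cdot \langle \psi, p\rangle^{\phi}$,
and $\langle p, q, l(v)\rangle \cdot \langle\!\langle q, \mathsf{crash}\ F\rangle\!\rangle$. But $\langle \psi, p\rangle^{\phi} \cdot \langle\!\langle p, \mathsf{crash}\ F\rangle\!\rangle$ is not permutable, since both
have the same sender and receiver ($\psi$ is the sender of $\langle\!\langle p, \mathsf{crash}\ F\rangle\!\rangle$).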
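The permutation criterion can be stated as a one-line predicate. The sketch below is an illustration under assumptions: the `Msg` record and the `permutable` function are hypothetical names, every queue entry is flattened to a sender, a receiver, and an opaque payload, and failure notifications heading to a participant are given the coordinator role `psi` as sender, as the text states.

```python
from typing import NamedTuple

PSI = "psi"  # the coordinator viewed as a role


class Msg(NamedTuple):
    sender: str
    receiver: str
    payload: str  # label, done annotation, or crash set; irrelevant for ordering


def permutable(m1, m2):
    """Two adjacent queue entries may swap iff they travel on different
    point-to-point FIFO queues, i.e. the sender or the receiver differ."""
    return m1.sender != m2.sender or m1.receiver != m2.receiver
```

Modelling each (sender, receiver) pair as its own FIFO channel is what makes the single global queue an abstraction of TCP-like links: order is preserved per link and nowhere else.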


Basic Dynamic Semantics for Applications. Figure 6 shows the operational
semantics of applications. We use evaluation contexts as defined in Fig. 5.
Context $E$ is either a hole $[\,]$, a default context $t(E)h(H)^{\phi}.\eta$, or a recursion context
$\mathsf{def}\ D\ \mathsf{in}\ E$. We write $E[\eta]$ to denote the action statement obtained by filling the
hole in $E[\cdot]$ with $\eta$.

Rule (Link) says that (local) processes that agree on shared name $a$, obeying
some protocol (global type) and playing certain roles $p_i$, represented by $a[p_i](y_i).P_i$,
together will start a private session $s$; this will result in replacing every variable
$y_i$ in $P_i$ with $s[p_i]$ and, at the same time, creating a new global queue $s : \emptyset$, and appointing
a coordinator $G : (\emptyset, \emptyset)$, which is novel in our work.

Rule (Snd) in Fig. 6 reduces a sending action $q!l(e)$ by emitting a message
$\langle p, q, l(v)\rangle$ to the global queue $s : h$. Rule (Rcv) reduces a receiving action
if the message arriving at its end is sent from the expected sender with an
expected label. Rule (Rec) is for recursion. When the recursive body, defined
inside $\eta$, is called by $X(e)$ where $e$ is evaluated to $v$, it reduces to the statement
$\eta\{v/x\}$ which will again implement the recursive body. Rule (Str) says that
processes which are structurally congruent have the same reduction. Processes,
applications, and systems are considered modulo structural congruence, denoted
by $\equiv$, along with $\alpha$-renaming. Rules (Par) and (Str) together state that a parallel
composition has a reduction if its sub-application can reduce. Rule (Sys) states
that a system has a reduction if its application has a reduction, and (New)
says a reduction can proceed under a session. Rule (Crash) states that a process
on channel $s[p]$ can fail at any point in time. (Crash) also adds a notification
$\langle\!\langle \psi, \mathsf{crash}\ F\rangle\!\rangle$ which is sent to $\psi$ (the coordinator). This is an abstraction for
the failure detector described in Sect. 2 (5); the notification $\langle\!\langle \psi, \mathsf{crash}\ F\rangle\!\rangle$ is the
first such notification issued by a participant based on its local failure detector.
Adding the notification into the global queue instead of making the coordinator
immediately aware of it models that failures are only detected eventually. Note
that a failure is not annotated with a level because failures transcend all levels,
and asynchrony makes it impossible to identify “where” exactly they occurred.
As a failure is permanent it can affect multiple try-handles. The (Crash) rule
does not apply to participants which are robust, i.e., that conceptually cannot fail
(e.g., dfs in Fig. 2). Rule (Crash) removes channel $s[p]$ (the failed process) from
application $N$, and removes messages and notifications delivered from, or heading
to, the failed $p$ by function $remove(h, p)$. Function $remove(h, p)$ returns a new
queue after removing all regular messages and notifications that contain $p$, e.g.,
let $h = \langle p_2, p_1, l(v)\rangle \cdot \langle p_3, p_2, l'(v'')\rangle \cdot \langle p_3, p_4, l'(v')\rangle \cdot \langle p_2, \psi\rangle^{\phi} \cdot \langle\!\langle p_2, \mathsf{crash}\ \{p_3\}\rangle\!\rangle \cdot \langle \psi, p_2\rangle^{\phi}$;
then $remove(h, p_2) = \langle p_3, p_4, l'(v')\rangle$. Messages are removed to model
that in a real system send/receive does not constitute an atomic action.
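The effect of $remove(h, p)$ on the example queue can be replayed in a few lines. The following Python sketch is not the paper's definition: the tuple encoding of queue entries and the helper names `regular`, `done`, `crash`, and `endpoints` are assumptions made for illustration, and "contains $p$" is read as "$p$ is the sender or the receiver".

```python
PSI = "psi"  # the coordinator viewed as a role


def regular(sender, receiver, label):
    return ("msg", sender, receiver, label)


def done(sender, receiver, phi):
    return ("done", sender, receiver, phi)


def crash(receiver, failed):
    # A failure notification names only its receiver and the failure set.
    return ("crash", receiver, frozenset(failed))


def endpoints(m):
    """Roles a queue entry is delivered from or heading to."""
    return {m[1]} if m[0] == "crash" else {m[1], m[2]}


def remove(h, p):
    """Drop every entry delivered from, or heading to, the failed role p."""
    return [m for m in h if p not in endpoints(m)]


# The queue from the text; the level phi is kept as an opaque string.
h = [
    regular("p2", "p1", "l"),
    regular("p3", "p2", "l'"),
    regular("p3", "p4", "l'"),
    done("p2", PSI, "phi"),
    crash("p2", {"p3"}),
    done(PSI, "p2", "phi"),
]
survivors = remove(h, "p2")
```

Only the message from p3 to p4 survives, matching the result stated in the text.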


Handling at Processes. Failure handling, defined in Fig. 7, is based on the obser- 
vations that (i) a process that fails stays down, and (ii) multiple processes 
can fail. As a consequence a failure can trigger multiple failure handlers either 
because these handlers are in different (subsequent) try-handles or because of 
additional failures. Therefore a process needs to retain the information of who 
failed. For simplicity we do not model state at processes, but instead processes 
read but do not remove failure notifications from the global queue. We define 
$Fset(h, p)$ to return the union of failures for which there are notifications heading
to $p$, i.e., $\langle\!\langle p, \mathsf{crash}\ F\rangle\!\rangle$, issued by the coordinator in queue $h$ up to the first
done notification heading to $p$:


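Definition 2 (Union of Existing Failures $Fset(h, p)$)

$$Fset(\emptyset, p) = \emptyset \qquad Fset(h, p) = \begin{cases} F \cup Fset(h', p) & \text{if } h = \langle\!\langle p, \mathsf{crash}\ F\rangle\!\rangle \cdot h' \\ \emptyset & \text{if } h = \langle \psi, p\rangle^{\phi} \cdot h' \\ Fset(h', p) & \text{otherwise} \end{cases}$$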


In short, if the global queue is $\emptyset$, then naturally there are no failure notifications.
If the global queue contains a failure notification sent from the coordinator, say
$\langle\!\langle p, \mathsf{crash}\ F\rangle\!\rangle$, we collect the failure. If the global queue contains a done notification
$\langle \psi, p\rangle^{\phi}$ sent from the coordinator then all participants in $\phi$ have finished their
local actions, which implies that the try-handle $\phi$ can be completed.

$$\frac{F' = \bigcup\{A \mid A \in dom(H) \wedge F \subset A \subseteq Fset(h, p)\} \qquad F' : \eta' \in H}{s[p] : E[t(\eta)h(H)^{(\kappa, F)}.\eta''] \mid s : h \to s[p] : E[t(\eta')h(H)^{(\kappa, F')}.\eta''] \mid s : h}\ \text{(TryHdl)}$$

$$s[p] : E[t(0)h(H)^{\phi}.\eta] \mid s : h \to s[p] : E[t(\underline{0})h(H)^{\phi}.\eta] \mid s : h \cdot \langle p, \psi\rangle^{\phi}\ \text{(SndDone)}$$

$$\frac{\langle \psi, p\rangle^{\phi} \in h}{s[p] : E[t(\underline{0})h(H)^{\phi}.\eta] \mid s : h \to s[p] : E[\eta] \mid s : h \setminus \{\langle \psi, p\rangle^{\phi}\}}\ \text{(RcvDone)}$$

$$\frac{l \notin labels(E[\eta])}{s[p] : E[\eta] \mid s : \langle q, p, l(v)\rangle \cdot h \to s[p] : E[\eta] \mid s : h}\ \text{(Cln)}$$

$$\frac{\langle \psi, p\rangle^{\phi} \in h \qquad \phi \notin E[\eta]}{s[p] : E[\eta] \mid s : h \to s[p] : E[\eta] \mid s : h \setminus \{\langle \psi, p\rangle^{\phi}\}}\ \text{(ClnDone)}$$


Fig. 7. Operational semantics of distributed applications, for endpoint handling.


Our failure handling semantics, (TryHdl), allows a try-handle $\phi = (\kappa, F)$ to handle different
failures or sets of failures by allowing a try-handle to switch between different
handlers. $F$ thus denotes the current set of handled failures. For simplicity we
refer to this as the current(ly handled) failure set. This is a slight abuse of
terminology, done for brevity, as obviously failures are only detected with a
certain lag. The handling strategy for a process is to handle the currently
largest set of failed processes that this process has been informed of and is
able to handle. This largest set is calculated by $\bigcup\{A \mid A \in dom(H) \wedge F \subset A \subseteq Fset(h, p)\}$,
which selects all failure sets which are larger than the current one ($A \in dom(H) \wedge F \subset A$)
if they are also triggered by known failures ($A \subseteq Fset(h, p)$).
Condition $F' : \eta' \in H$ in (TryHdl) ensures that there exists a handler for $F'$. The
following example shows how (TryHdl) is applied to switch handlers.


Example 2. Take $h$ such that $Fset(h, p) = \{p_1\}$ and $H = \{p_1\} : \eta_1, \{p_2\} : \eta_2, \{p_1, p_2\} : \eta_{12}$
in process $P = s[p] : t(\eta_1)h(H)^{(\kappa, \{p_1\})}$, which indicates that
$P$ is handling failure $\{p_1\}$. Assume now one more failure occurs and results in a
new queue $h'$ such that $Fset(h', p) = \{p_1, p_2\}$. By (TryHdl), the process acting at
$s[p]$ is handling the failure set $\{p_1, p_2\}$ such that $P = s[p] : t(\eta_{12})h(H)^{(\kappa, \{p_1, p_2\})}$
(also notice the $\eta_{12}$ inside the try-block). A switch to only handling $\{p_2\}$ does
not make sense, since, e.g., $\eta_2$ can contain $p_1$. Figure 2 shows a case where the
handling strategy differs according to the number of failures.


In Sect. 3 we formally define well-formedness conditions, which guarantee that 
if there exist two handlers for two different handler signatures in a try-handle, 
then a handler exists for their union. The following example demonstrates why 
such a guarantee is needed. 


Example 3. Assume a slightly different $P$ compared to the previous examples
(no handler for the union of failures): $P = s[p] : E[t(\eta)h(H)^{(\kappa, \emptyset)}]$ with $H = \{p_1\} : \eta_1, \{p_2\} : \eta_2$.
Assume also that $Fset(h, p) = \{p_1, p_2\}$. Here (TryHdl) will
not apply since there is no failure handling for $\{p_1, p_2\}$ in $P$. If we allowed
a handler for either $\{p_1\}$ or $\{p_2\}$ to be triggered we would have no guarantee
that other participants involved in this try-handle will all select the same failure
set. Even with a deterministic selection, i.e., all participants in that try-handle
selecting the same handling activity, there needs to be a handler with handler
signature $\{p_1, p_2\}$ since it is possible that $p_1$ is involved in $\eta_2$. Therefore the
type system will ensure that there is a handler for $\{p_1, p_2\}$ either at this level or
at an outer level.
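Definition 2 and the premise of (TryHdl) can be exercised on Examples 2 and 3 with a short Python sketch. It is a sketch under assumptions, not the calculus itself: queue entries are encoded as tuples, handler bodies are omitted so that a handling environment is reduced to its set of signatures, and the names `fset` and `next_signature` are hypothetical.

```python
PSI = "psi"  # the coordinator viewed as a role


def crash(receiver, failed):
    return ("crash", receiver, frozenset(failed))


def done(sender, receiver, phi):
    return ("done", sender, receiver, phi)


def fset(h, p):
    """Definition 2: union the failure sets notified to p, scanning the queue
    up to the first done notification from the coordinator to p."""
    known = set()
    for m in h:
        if m[0] == "crash" and m[1] == p:
            known |= m[2]
        elif m[0] == "done" and m[1] == PSI and m[2] == p:
            break  # everything behind a done notification is suppressed
    return frozenset(known)


def next_signature(dom_h, current, known):
    """Premise of (TryHdl): F' is the union of all signatures A with
    current < A <= known; the rule applies only if F' is itself a signature.
    Returns F', or None when (TryHdl) is not applicable."""
    candidates = [a for a in dom_h if current < a <= known]
    target = frozenset().union(*candidates)
    return target if target in dom_h else None


P1, P2, P12 = frozenset({"p1"}), frozenset({"p2"}), frozenset({"p1", "p2"})

# Example 2: handling {p1}, then a second failure becomes known.
h2 = [crash("p", {"p1"}), crash("p", {"p2"})]
switch = next_signature({P1, P2, P12}, P1, fset(h2, "p"))

# Example 3: no handler for the union, so no switch is possible.
stuck = next_signature({P1, P2}, frozenset(), fset(h2, "p"))
```

Here `switch` is the signature {p1, p2} and `stuck` is `None`, which is the situation the well-formedness conditions rule out.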
(I) explains that a process finishing its default action ($P_b$) cannot leave its
current try-handle $(\kappa, \emptyset)$ immediately because other participants may fail ($P_a$
failed). Below, Eq. 1 also shows this issue from the perspective of semantics:

$$s[p] : t(0)h(F : q!l(10).q?l'(x))^{(\kappa, \emptyset)}.\eta' \mid s[q] : t(p?l(x').p!l'(x' + 10))h(H)^{(\kappa, F)}.\eta'' \mid s : \langle\!\langle q, \mathsf{crash}\ F\rangle\!\rangle \cdot \langle\!\langle p, \mathsf{crash}\ F\rangle\!\rangle \cdot h \qquad (1)$$

In Eq. 1 the process acting on $s[p]$ ended its try-handle (i.e., the action is $0$ in
the try-block), and if $s[p]$ finishes its try-handle the participant acting on $s[q]$,
which started handling $F$, would be stuck.

To solve the issue, we use (SndDone) and (RcvDone) for completing a local
try-handle with the help of a coordinator. The rule (SndDone) sends out a
done notification $\langle p, \psi\rangle^{\phi}$ if the current action in $\phi$ is $0$ and sets the action to $\underline{0}$,
indicating that a done notification from the coordinator is needed for ending the
try-handle.

Assume the process on channel $s[p]$ finished its local actions in the try-block (i.e.,
as in Eq. 1 above); then by (SndDone), we have

$$(1) \to s : \langle\!\langle q, \mathsf{crash}\ F\rangle\!\rangle \cdot \langle\!\langle p, \mathsf{crash}\ F\rangle\!\rangle \cdot h \cdot \langle p, \psi\rangle^{(\kappa, \emptyset)} \mid s[p] : t(\underline{0})h(F : q!l(10).q?l'(x))^{(\kappa, \emptyset)}.\eta' \mid s[q] : t(p?l(x').p!l'(x' + 10))h(H)^{(\kappa, F)}.\eta''$$

where notification $\langle p, \psi\rangle^{(\kappa, \emptyset)}$ is added to inform the coordinator. Now the process
on channel $s[p]$ can still handle failures defined in its handling environment. This
is similar to the case described in (II).

Rule (RcvDone) is the counterpart of (SndDone). Once a process receives a
done notification for $\phi$ from the coordinator it can finish the try-handle $\phi$ and
reduce to the continuation $\eta$. Consider Eq. 2 below, which is similar to Eq. 1 but
takes a case where the try-handle can be reduced with (RcvDone). In Eq. 2,
(SndDone) has already been applied:

$$s[p] : t(\underline{0})h(F : q!l(10).q?l'(x))^{(\kappa, \emptyset)}.\eta' \mid s[q] : t(\underline{0})h(F : p?l(x').p!l'(x' + 10))^{(\kappa, \emptyset)}.\eta'' \mid s : h \qquad (2)$$

With $h = \langle \psi, q\rangle^{(\kappa, \emptyset)} \cdot \langle \psi, p\rangle^{(\kappa, \emptyset)} \cdot \langle\!\langle q, \mathsf{crash}\ F\rangle\!\rangle \cdot \langle\!\langle p, \mathsf{crash}\ F\rangle\!\rangle$ both processes
can apply (RcvDone) and safely terminate the try-handle $(\kappa, \emptyset)$. Note that
$Fset(h, p) = Fset(h, q) = \emptyset$ (by Definition 2), i.e., rule (TryHdl) cannot be
applied since a done notification suppresses the failure notification. Thus Eq. 2
will reduce to:

$$(2) \to^{*} s[p] : \eta' \mid s[q] : \eta'' \mid s : \langle\!\langle q, \mathsf{crash}\ F\rangle\!\rangle \cdot \langle\!\langle p, \mathsf{crash}\ F\rangle\!\rangle$$

It is possible that $\eta'$ or $\eta''$ have handlers for $F$. Note that once a queue
contains $\langle \psi, p\rangle^{(\kappa, \emptyset)}$, all non-failed processes in the try-handle $(\kappa, \emptyset)$ have sent done
notifications to $\psi$ (i.e., applied rule (SndDone)). The coordinator, which will be
introduced shortly, ensures this.




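$$\frac{\tilde{p} = roles(G) \setminus F' \qquad F' = F \cup \{p\} \qquad \tilde{m} = \langle\!\langle \tilde{p}, \mathsf{crash}\ \{p\}\rangle\!\rangle}{G : (F, d) \blacklozenge N \mid s : \langle\!\langle \psi, \mathsf{crash}\ \{p\}\rangle\!\rangle \cdot h \to G : (F', d) \blacklozenge N \mid s : h \cdot \tilde{m}}\ \text{(F)}$$

$$\frac{d' = d \cdot \langle p, \psi\rangle^{\phi}}{G : (F, d) \blacklozenge s : \langle p, \psi\rangle^{\phi} \cdot h \to G : (F, d') \blacklozenge s : h}\ \text{(CollectDone)}$$

$$\frac{roles(d, \phi) \supseteq roles(G, \phi) \setminus F \qquad \forall F' \in hdl(G, \phi).(F' \nsubseteq F)}{G : (F, d) \blacklozenge s : h \to G : (F, remove(d, \phi)) \blacklozenge s : h \cdot \langle \psi, roles(G, \phi) \setminus F\rangle^{\phi}}\ \text{(IssueDone)}$$


Fig. 8. Operational semantics for the coordinator.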


Rule (Cln) removes a normal message from the queue if the label in the message
does not exist in the target process, which can happen when a failure handler
was triggered. The function $labels(\eta)$ returns all labels of receiving actions in $\eta$
which are able to receive messages now or possibly later. This removal based
on the syntactic process is safe because in a global type separate branch types
not defined in the same default block or handler body must have disjoint sets of
labels (cf. Sect. 3). Let $\phi \in P$ if try-handle $\phi$ appears inside $P$. Rule (ClnDone)
removes a done notification of $\phi$ from the queue if no try-handle $\phi$ exists, which
can happen in case of nesting when a handler of an outer try-handle is triggered.


Handling at Coordinator. Figure 8 defines the semantics of the coordinator. We
first give the auxiliary definition of $roles(G)$, which gives the set of all roles
appearing in $G$.

In rule (F), $F$ represents the failures that the coordinator is aware of. This rule
states that the coordinator collects and removes a failure notification $\langle\!\langle \psi, \mathsf{crash}\ \{p\}\rangle\!\rangle$
heading to it, retains this notification by $G : (F', d)$, $F' = F \cup \{p\}$, and issues
failure notifications to all non-failed participants.
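Rule (F) can be read as a single step on the coordinator state and the queue. The Python sketch below is illustrative and rests on assumptions: queue entries are tuples, the function name `rule_f` is hypothetical, the failure notification is taken from the head of the queue (ignoring permutation), and recipients are emitted in sorted order only to make the result deterministic.

```python
PSI = "psi"  # the coordinator viewed as a role


def rule_f(roles, failures, h):
    """One application of rule (F).

    roles    -- roles(G), the set of all roles of the global type
    failures -- F, the failures the coordinator already knows about
    h        -- the global queue as a list of tuples
    Returns (F', h') or None if the head of h is not a failure notification
    heading to the coordinator.
    """
    if not h or h[0][0] != "crash" or h[0][1] != PSI:
        return None
    (p,) = h[0][2]                       # the single newly failed role
    new_failures = failures | {p}        # F' = F u {p}
    recipients = sorted(roles - new_failures)
    notifications = [("crash", r, frozenset({p})) for r in recipients]
    return new_failures, h[1:] + notifications


state = rule_f(
    roles={"p1", "p2", "p3"},
    failures=frozenset(),
    h=[("crash", PSI, frozenset({"p2"})), ("msg", "p1", "p3", "l")],
)
```

After this step the coordinator knows {p2}, and the queue carries one failure notification each for p1 and p3 behind the pending regular message.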

Rules (CollectDone) and (IssueDone), in short, inform all participants in $\phi = (\kappa, F)$
to finish their try-handle $\phi$ if the coordinator has received sufficient done
notifications of $\phi$ and did not send out failure notifications that interrupt the task
$(\kappa, F)$ (e.g., see (III)). Rule (CollectDone) collects done notifications, i.e., $\langle p, \psi\rangle^{\phi}$,
from the queue and retains these notifications; they are used in (IssueDone). For
introducing (IssueDone), we first introduce $hdl(G, (\kappa, F))$ to return a set of
handler signatures which can be triggered with respect to the current handler:


Definition 3. $hdl(G, (\kappa, F)) = dom(H) \setminus \mathcal{P}(F)$ if $t(G_0)h(H)^{\kappa} \in G$, where $\mathcal{P}(F)$
represents the powerset of $F$.


Also, we abuse the function $roles$ to collect the non-coordinator roles of $\phi$ in
$d$, written $roles(d, \phi)$; similarly, we write $roles(G, \phi)$ where $\phi = (\kappa, F)$ to collect
the roles appearing in the handler body $F$ in the try-handle of $\kappa$ in $G$. Remember
that $d$ only contains done notifications sent by participants.

Rule (IssueDone) is applied for some $\phi$ when conditions
$\forall F' \in hdl(G, \phi).(F' \nsubseteq F)$ and $roles(d, \phi) \supseteq roles(G, \phi) \setminus F$ are both satisfied, where $F$
contains all failures the coordinator is aware of. Intuitively, these two conditions
ensure that (1) the coordinator only issues done notifications to the participants
in the try-handle $\phi$ if it did not send failure notifications which will trigger a
handler of the try-handle $\phi$; (2) the coordinator has received all done notifications
from all non-failed participants of $\phi$. We further explain both conditions
in the following examples, starting from condition $\forall F' \in hdl(G, \phi).(F' \nsubseteq F)$,
which ensures no handler in $\phi$ can be triggered based on the failure notifications
$F$ sent out by the coordinator.


Example 4. Assume a process playing role $p_i$ is $P_i = s[p_i] : t(\eta_i)h(H_i)^{\phi_i}$, where
$i \in \{1, 2, 3\}$ and $H_i = \{p_2\} : \eta_{i2}, \{p_3\} : \eta_{i3}, \{p_2, p_3\} : \eta_{i23}$, and the coordinator
is $G : (\{p_2, p_3\}, d)$ where $t(\ldots)h(H)^{\kappa} \in G$ and $dom(H) = dom(H_i)$ for any
$i \in \{1, 2, 3\}$ and $d = \langle p_1, \psi\rangle^{(\kappa, \{p_2\})} \cdot \langle p_1, \psi\rangle^{(\kappa, \{p_2, p_3\})} \cdot d'$. For any $\phi$ in $d$, the
coordinator checks if it has issued any failure notification that can possibly
trigger a new handler of $\phi$:


1. For $\phi = (\kappa, \{p_2\})$ the coordinator issued failure notifications that can
interrupt a handler since

$$hdl(G, (\kappa, \{p_2\})) = dom(H) \setminus \mathcal{P}(\{p_2\}) = \{\{p_3\}, \{p_2, p_3\}\}$$

and $\{p_2, p_3\} \subseteq \{p_2, p_3\}$. That means the failure notifications issued by the
coordinator, i.e., $\{p_2, p_3\}$, can trigger the handler with signature $\{p_2, p_3\}$.
Thus the coordinator will not issue done notifications for $\phi = (\kappa, \{p_2\})$. A
similar case is visualized in Fig. 4 at time (2).

2. For $\phi = (\kappa, \{p_2, p_3\})$ the coordinator did not issue failure notifications that
can interrupt a handler since

$$hdl(G, (\kappa, \{p_2, p_3\})) = dom(H) \setminus \mathcal{P}(\{p_2, p_3\}) = \emptyset$$

so that $\forall F' \in hdl(G, (\kappa, \{p_2, p_3\})).(F' \nsubseteq \{p_2, p_3\})$ is true. The coordinator
will issue done notifications for $\phi = (\kappa, \{p_2, p_3\})$.
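The two cases of Example 4 can be recomputed directly from Definition 3. The following Python sketch is illustrative: the handling environment is reduced to its set of signatures, and the function names `hdl` and `no_handler_pending` are hypothetical.

```python
def hdl(dom_h, current):
    """Definition 3: dom(H) minus the powerset of the current signature,
    i.e. the signatures that are not subsets of the current one."""
    return {a for a in dom_h if not a <= current}


def no_handler_pending(dom_h, current, known):
    """First premise of (IssueDone): no signature in hdl is already
    covered by the failures the coordinator has announced."""
    return all(not a <= known for a in hdl(dom_h, current))


P2, P3, P23 = frozenset({"p2"}), frozenset({"p3"}), frozenset({"p2", "p3"})
DOM_H = {P2, P3, P23}
KNOWN = P23  # failures the coordinator is aware of in Example 4

case_1 = no_handler_pending(DOM_H, P2, KNOWN)   # phi = (kappa, {p2})
case_2 = no_handler_pending(DOM_H, P23, KNOWN)  # phi = (kappa, {p2, p3})
```

`case_1` is `False` (a handler can still be triggered, so no done notifications are issued) and `case_2` is `True`, in line with the two items of the example.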


The other condition, $roles(d, \phi) \supseteq roles(G, \phi) \setminus F$, states that only when the
coordinator sees sufficient done notifications (in $d$) for $\phi$ does it issue done
notifications to all non-failed participants in $\phi$, i.e., $\langle \psi, roles(G, \phi) \setminus F\rangle^{\phi}$. Recall that
$roles(d, \phi)$ returns all roles which have sent a done notification for the handling of
$\phi$ and $roles(G, \phi)$ returns all roles involved in the handling of $\phi$. Intuitively one
might expect the condition to be $roles(d, \phi) = roles(G, \phi)$; the following example
shows why this would be wrong.


Example 5. Consider a process $P$ acting on channel $s[p]$ and $\{q\} \notin dom(H)$:
$$P = s[p] : t(\ldots t(\ldots)h(\{q\} : \eta, H')^{\phi'}.\eta')h(H)^{\phi}$$
Assume $P$ has already reduced to:
$$P = s[p] : t(\underline{0})h(H)^{\phi}$$

We show why $roles(d, \phi) \supseteq roles(G, \phi) \setminus F$ is necessary. We start with the simple
cases and then move to the more involved ones.

(a) Assume $q$ did not fail, the coordinator is $G : (\emptyset, d)$, and all roles in $\phi$ issued
a done notification. Then $roles(d, \phi) = roles(G, \phi)$ and $F = \emptyset$.

(b) Assume $q$ failed in the try-handle $\phi'$, the coordinator is $G : (\{q\}, d)$, and
all roles except $q$ in $\phi$ issued a done notification. Then $roles(d, \phi) \neq roles(G, \phi)$;
however, $roles(d, \phi) = roles(G, \phi) \setminus \{q\}$. Cases like this are the reason why
(IssueDone) only requires done notifications from non-failed roles.

(c) Assume $q$ failed after it has issued a done notification for $\phi$ (i.e., $q$ finished
try-handle $\phi'$) and the coordinator collected it (by (CollectDone)), so we
have $G : (\{q\}, d)$ and $q \in roles(d, \phi)$. Then $roles(d, \phi) \supset roles(G, \phi) \setminus \{q\}$,
i.e., (IssueDone) needs to consider done notifications from failed roles.


Thus rule (IssueDone) has the condition $roles(d, \phi) \supseteq roles(G, \phi) \setminus F$ because of
cases like (b) and (c).


Fig. 9. The grammar of local types.
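Cases (a) to (c) come down to a superset test on plain sets. The sketch below is a minimal illustration; the function name `enough_dones` and the concrete role names p, q, r are assumptions chosen for the example.

```python
def enough_dones(done_roles, handler_roles, failures):
    """Second premise of (IssueDone):
    roles(d, phi) is a superset of roles(G, phi) minus F."""
    return done_roles >= handler_roles - failures


ROLES_PHI = {"p", "q", "r"}  # roles(G, phi), assumed for illustration

# (a) nobody failed and every role reported done
case_a = enough_dones({"p", "q", "r"}, ROLES_PHI, set())
# (b) q failed before reporting done: equality with roles(G, phi) fails,
#     but the superset test over non-failed roles succeeds
case_b = enough_dones({"p", "r"}, ROLES_PHI, {"q"})
# (c) q reported done and then failed: roles(d, phi) is strictly larger
#     than roles(G, phi) minus F, so an equality test would fail here too
case_c = enough_dones({"p", "q", "r"}, ROLES_PHI, {"q"})
# a role that is alive but has not yet reported blocks the rule
blocked = enough_dones({"p"}, ROLES_PHI, {"q"})
```

All three cases evaluate to `True` and `blocked` to `False`, which is why the condition is stated with a superset over the non-failed roles rather than with an equality.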


The interplay between issuing of done notifications (IssueDone) and issuing
of failure notifications (F) is non-trivial. The following proposition clarifies that
the participants in the same try-handle $\phi$ will never get confused between handling
failures and completing the try-handle $\phi$.


Proposition 1. Given $s : h$ with $h = h' \cdot \langle \psi, p\rangle^{\phi} \cdot h''$ and $Fset(h, p) \neq \emptyset$, the
rule (TryHdl) is not applicable for the try-handle $\phi$ at the process playing role $p$.


5 Local Types 


Figure 9 defines local types for typing behaviors of endpoint processes with failure
handling. Type $p!$ is the primitive for a sending type, and $p?$ is the primitive for
a receiving type, derived from global type $p \to q\{l_i(S_i).G_i\}_{i \in I}$ by projection.
Others correspond straightforwardly to process terms. Note that type end only
appears in runtime type checking. Below we define $G{\upharpoonright}p$ to project a global type
$G$ on $p$, thus generating $p$'s local type.


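Definition 4 (Projection). Consider a well-formed top-level global type $[q]G$.
Then $G{\upharpoonright}p$ is defined as follows:

(1) $G{\upharpoonright}p$ where $G = t(G_0)h(F_1 : G_1, \ldots, F_n : G_n)^{\kappa}.G'$:
$$G{\upharpoonright}p = \begin{cases} t(G_0{\upharpoonright}p)h(F_1 : G_1{\upharpoonright}p, \ldots, F_n : G_n{\upharpoonright}p)^{(\kappa, \emptyset)}.G'{\upharpoonright}p & \text{if } p \in roles(G) \\ G'{\upharpoonright}p & \text{otherwise} \end{cases}$$

(2) $$p_1 \to p_2\{l_i(S_i).G_i\}_{i \in I}{\upharpoonright}p = \begin{cases} p_2!\{l_i(S_i).G_i{\upharpoonright}p\}_{i \in I} & \text{if } p = p_1 \\ p_1?\{l_i(S_i).G_i{\upharpoonright}p\}_{i \in I} & \text{if } p = p_2 \\ G_1{\upharpoonright}p & \text{if } \forall i, j \in I.\ G_i{\upharpoonright}p = G_j{\upharpoonright}p \end{cases}$$

(3) $(\mu t.G){\upharpoonright}p = \mu t.(G{\upharpoonright}p)$ if $\nexists\, t(G')h(H) \in G$ and $G{\upharpoonright}p \neq t'$ for any $t'$

(4) $t{\upharpoonright}p = t$ (5) $end{\upharpoonright}p = end$


Otherwise it is undefined.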


The main rule is (1): if $p$ appears somewhere in the target try-handle global
type then the endpoint type has a try-handle annotated with $\kappa$ and the default
logic (i.e., $F = \emptyset$). Note that even if $G_0{\upharpoonright}p = end$ the endpoint still gets such a
try-handle because it needs to be ready for (possible) failure handling; if $p$ does
not appear anywhere in the target try-handle global type, then the projection
skips to the continuation.

Rule (2) produces local types for interaction endpoints. If the endpoint is a
sender (i.e., $p = p_1$), then its local type abstracts that it will send something from
one of the possible internal choices defined in $\{l_i(S_i)\}_{i \in I}$ to $p_2$, then continue as
$G_k{\upharpoonright}p$, gained from the projection, if $k \in I$ is chosen. If the endpoint is a receiver
(i.e., $p = p_2$), then its local type abstracts that it will receive something from
one of the possible external choices defined in $\{l_i(S_i)\}_{i \in I}$ sent by $p_1$; the rest is
similar to the sender case. However, if $p$ is not in this interaction, then its local
type starts from the next interaction which $p$ is in; moreover, because $p$ does
not know what choice $p_1$ has made, every path $G_i{\upharpoonright}p$ led by branch $l_i$ shall
be the same for $p$ to ensure that interactions are consistent. For example, in
$G = p_1 \to p_2\{l_1(S_1).p_3 \to p_1\ l_3(S),\ l_2(S_2).p_3 \to p_1\ l_4(S)\}$, interaction $p_3 \to p_1$
continues after $p_1 \to p_2$ takes place. If $l_3 \neq l_4$, then $G$ is not projectable for $p_3$
because $p_3$ does not know which branch $p_1$ has chosen; if $p_1$ chose branch
$l_1$, but $p_3$ (blindly) sends out label $l_4$ to $p_1$, for $p_1$ it is a mistake (but it is
not a mistake for $p_3$) because $p_1$ is expecting to receive label $l_3$. To prevent
such inconsistencies, we adopt the projection algorithm proposed in [24]. Other
session type works [17,39] provide ways to weaken the classical restriction on
projection of branching which we use.

Rule (3) forbids a try-handle to appear in a recursive body, e.g.,
$\mu t.t(G)h(F : t)^{\kappa}.G'$ is not allowed, but $t(\mu t.G)h(H)^{\kappa}$ and $t(G)h(F : \mu t.G', H)^{\kappa}$ are
allowed. This is because $\kappa$ is used to avoid confusion of messages from different
try-handles. If a recursive body contains a try-handle, we have to dynamically
generate different levels to maintain interaction consistency, so static type checking
does not suffice. We are investigating alternative runtime checking mechanisms,
but this is beyond the scope of this paper. Other rules are straightforward.


Example 6. Recall the global type G from Fig. 2 in Sect. 1. Applying projection 
rules defined in Definition 4 to G on every role in G we obtain the following: 


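\[
\begin{array}{rcl}
T_{dfs} = G \upharpoonright dfs & = & t(\mu t.\, w_1!l_{d_1}(S).\, w_2!l_{d_2}(S).\, w_1?l_{r_1}(S').\, w_2?l_{r_2}(S').\, t)\,h(H_{dfs})^{(\kappa,\emptyset)} \\
H_{dfs} & = & \{w_1\} : \mu t'.\, w_2!l'_{d_2}(S).\, w_2?l'_{r_2}(S').\, t', \\
 & & \{w_2\} : \mu t''.\, w_1!l'_{d_1}(S).\, w_1?l'_{r_1}(S').\, t'',\ \{w_1, w_2\} : \mathsf{end} \\
T_{w_1} = G \upharpoonright w_1 & = & t(\mu t.\, dfs?l_{d_1}(S).\, dfs!l_{r_1}(S').\, t)\,h(H_{w_1})^{(\kappa,\emptyset)} \\
H_{w_1} & = & \{w_1\} : \mathsf{end},\ \{w_2\} : \mu t''.\, dfs?l'_{d_1}(S).\, dfs!l'_{r_1}(S').\, t'',\ \{w_1, w_2\} : \mathsf{end} \\
T_{w_2} = G \upharpoonright w_2 & = & t(\mu t.\, dfs?l_{d_2}(S).\, dfs!l_{r_2}(S').\, t)\,h(H_{w_2})^{(\kappa,\emptyset)} \\
H_{w_2} & = & \{w_1\} : \mu t'.\, dfs?l'_{d_2}(S).\, dfs!l'_{r_2}(S').\, t',\ \{w_2\} : \mathsf{end},\ \{w_1, w_2\} : \mathsf{end}
\end{array}
\]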
Fig. 10. Typing rules for processes (rules [T-ini], [T-snd], [T-rcv], [T-0],
[T-yd], [T-if], [T-var], [T-def], [T-th]).


6 Type System 


Next we introduce our type system for typing processes. Figures 10 and 11 present
typing rules for endpoint processes, and typing judgments for applications and
systems, respectively.

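We define shared environments $\Gamma$ to keep information on variables and the
coordinator, and session environments $\Delta$ to keep information on endpoint
types:

\[
\begin{array}{ll}
\Gamma ::= \emptyset \mid \Gamma, X : S\ T \mid \Gamma, x : S \mid \Gamma, a : G \mid \Gamma, \Psi &
\Delta ::= \emptyset \mid \Delta, c : T \mid \Delta, s : h \\
m ::= \langle p, q, l(S) \rangle \mid \langle p, \mathsf{crash}\ F \rangle \mid \langle p, q \rangle^{\phi} &
h ::= \emptyset \mid h \cdot m
\end{array}
\]

$\Gamma$ maps process variables $X$ and content variables $x$ to their types,
shared names $a$ to global types $G$, and a coordinator $\Psi = G : (F, d)$ to
failures and done notifications it has observed. $\Delta$ maps session channels
$c$ to local types and session queues to queue types. We write
$\Gamma, \Gamma' = \Gamma \cup \Gamma'$ when $dom(\Gamma) \cap dom(\Gamma') = \emptyset$;
same for $\Delta, \Delta'$. Queue types $h$ are composed of message types $m$.
Their permutation is defined analogously to the permutation for messages. The
typing judgment for local processes $\Gamma \vdash P \triangleright \Delta$ states that
process $P$ is well-typed by $\Delta$ under $\Gamma$.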

Since we do not define sequential composition for processes, our type system
implicitly forbids session interleaving by [T-ini]. This is different from
other session type works [15,24], where session interleaving is prohibited for the
progress property; here the restriction is inherent to the type system.

Fig. 11. Typing rules for applications and systems (rules [T-$\emptyset$], [T-m],
[T-D], [T-F], [T-pa], [T-s], [T-sys]).

Figure 10 lists our typing rules for endpoint processes. Rule [T-ini] says
that if a process's set of actions is well-typed by $G \upharpoonright p$ on some $c$,
this process can play role $p$ in $a$, which claims to have interactions obeying
behaviors defined in $G$. $\langle G \rangle$ means that $G$ is closed, i.e., devoid
of type variables. This rule forbids $a[p].b[q].P$ because a process can only use
one session channel. Rule [T-snd] states that an action for sending is
well-typed to a sending type if the label and the type of the content are
expected; [T-rcv] states that an action for branching (i.e., for receiving) is
well-typed to a branching type if all labels and the types of contents are as
expected. Their follow-up actions shall also be well-typed. Rule [T-0] types an
idle process. Predicate end-only $\Delta$ is defined as stating whether all
endpoints in $\Delta$ have type end:


Definition 5 (End-only $\Delta$). We say $\Delta$ is end-only if and only if
$\forall s[p] \in dom(\Delta),\ \Delta(s[p]) = \mathsf{end}$.
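
As a small illustration of Definition 5, the predicate can be sketched over a
dictionary-based session environment. The encoding is an assumption made only
for this example: endpoints $s[p]$ are keys of the form `(session, role)`,
session queues are keyed by the plain session name, and local types are strings.

```python
def is_end_only(delta):
    """Return True iff every endpoint s[p] in dom(delta) has type "end".

    delta maps endpoint keys (session, role) to local types and session
    names (plain strings) to queue types; only endpoint keys are inspected.
    """
    return all(
        local_type == "end"
        for key, local_type in delta.items()
        if isinstance(key, tuple)
    )
```

Queue entries are ignored on purpose, since the definition only quantifies over
endpoints.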


Rule [T-yd] types yielding actions, which only appear at runtime. Rule [T-if]
is standard in the sense that the process is well-typed by $\Delta$ if $e$ has
boolean type and its sub-processes (i.e., $\eta_1$ and $\eta_2$) are well-typed by
$\Delta$. Rules [T-var, T-def] are based on a recent summary of MPSTs [14]. Note
that [T-def] forbids the type $\mu t.t$. Rule [T-th] states that a try-handle is
well-typed if it is annotated with the expected level $\phi$, its default
statement is well-typed, the process handlers $\mathsf{H}$ and the type handlers
$H$ have the same handler signatures, and all handling actions are well-typed.

Figure 11 shows typing rules for applications and systems. Rule [T-$\emptyset$]
types an empty queue. Rules [T-m, T-D, T-F] simply type messages based on their
shapes. Rule [T-pa] says two applications composed in parallel are well-typed
if they do not share any session channel. Rule [T-s] says a part of a system $S$
can start a private session, say $s$, if $S$ is well-typed according to a
$\Gamma \vdash \Delta_s$ that is coherent (defined shortly). The system $(\nu s)S$
with a part becoming private in $s$ is well-typed to $\Delta \setminus \Delta_s$,
that is, $\Delta$ after removing $\Delta_s$.


Definition 6 (A Session Environment Having $s$ Only: $\Delta_s$).
$\Delta_s = \{ s[p] : T \mid s[p] \in dom(\Delta) \} \cup \{ s : h \mid s \in dom(\Delta) \}$
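
The restriction of Definition 6 and the removal used by rule [T-s] can be
sketched as follows. This assumes the same hypothetical dictionary encoding of
session environments as a plain Python `dict`, with endpoint keys
`(session, role)` and queue keys given by the session name; the function names
are illustrative.

```python
def session_part(delta, s):
    """Entries of delta that belong to session s: its endpoints and its queue."""
    return {
        key: value
        for key, value in delta.items()
        if key == s or (isinstance(key, tuple) and key[0] == s)
    }


def without_session(delta, s):
    """delta after removing every entry that belongs to session s."""
    private = session_part(delta, s)
    return {key: value for key, value in delta.items() if key not in private}
```

For an environment holding two sessions, `session_part` keeps only the entries
of the chosen session and `without_session` returns the remainder, so the two
results partition the original environment.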


Rule [T-sys] says that a system $\Psi \blacklozenge N$ is well-typed if application
$N$ is well-typed and there exists a coordinator $\Psi$ for handling this
application. We say $\Gamma \vdash \Delta$ is coherent under $\Gamma$ if the local
types of all endpoints are dual to each other after their local types are
updated because of messages or notifications in $s : h$.

Coherence. We say that a session environment is coherent if, at any time, given
a session with its latest messages and notifications, every endpoint participating
in it is able to find someone to interact with (i.e., its dual party exists) right
now or afterwards.


Example 7. Continuing with Example 6, the session environment
$\Gamma \vdash \Delta$ is coherent even if $w_2$ will not receive any message from
$dfs$ at this point. The only possible action to take in $\Delta$ is that $dfs$
sends out a message to $w_1$. When this action fires, $\Delta$ is reduced to
$\Delta'$ under a coordinator. (The reduction relation
$\Gamma \vdash \Delta \to \Gamma' \vdash \Delta'$, where $\Gamma = \Gamma_0, \Psi$ and
$\Gamma' = \Gamma_0, \Psi'$, is defined based on the rules of operational
semantics of applications in Sect. 4, Figs. 6 and 7.) In $\Delta'$, which
abstracts the environment when $dfs$ sends a message to $w_1$, $w_2$ will be
able to receive this message.

\[
\begin{array}{rcl}
\Delta & = & s[dfs] : T_{dfs},\ s[w_1] : T_{w_1},\ s[w_2] : T_{w_2},\ s : \emptyset \\
\Delta' & = & s[dfs] : t(w_2!l_{d_2}(S).\, w_1?l_{r_1}(S').\, w_2?l_{r_2}(S').\, T)\,h(H_{dfs})^{(\kappa,\emptyset)}, \\
 & & s[w_1] : T_{w_1},\ s[w_2] : T_{w_2},\ s : \langle dfs, w_1, l_{d_1}(S) \rangle \\
\text{where } T & = & \mu t.\, w_1!l_{d_1}(S).\, w_2!l_{d_2}(S).\, w_1?l_{r_1}(S').\, w_2?l_{r_2}(S').\, t
\end{array}
\]


We write $s[p] : T \bowtie s[q] : T'$ to state that actions of the two types are dual:

Definition 7 (Duality). We define $s[p] : T \bowtie s[q] : T'$ as follows:


s[p] : end œx s[q] : end s[p] : end x s[q] : end s[p] : end > s[q] : end 
s[p] : T œx s[q] : T” 
sp] : ut.T œ sq] : ut. T' 
Vi € I. s|p] : T; x sļq] : T; 

s[p] : q! {li(Si)-Ti}ier r sla] : p? {1:(9i).T; Jier 
slp] : Ti slg]: T2 s[p]: Ti œx s[q]: T} dom(Hı) = dom(H2) 
VF € dom(H1). s|p] : Hi(F) & s[q] : Ho(F) 
s[p] : t(Ti)h(H1)?.T, ra sfa] : t(T2)h(H2)?.Ty 


s[p] : end œx s[q] : end sfp] : tx s[q] : t 
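
A minimal sketch of the duality check may help to read Definition 7. It covers
only the cases for end, type variables, recursion, and send/receive branching,
and leaves out try-handle types; this restriction, the class names, and the
symmetric treatment of receive/send pairs are assumptions of the sketch rather
than part of the definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class End:
    """The terminated type end."""


@dataclass(frozen=True)
class Var:
    name: str  # type variable t


@dataclass(frozen=True)
class Mu:
    var: str
    body: object  # recursive type mu t.T


@dataclass(frozen=True)
class Out:
    to: str
    branches: tuple  # tuple of (label, sort, continuation)


@dataclass(frozen=True)
class In:
    frm: str
    branches: tuple  # tuple of (label, sort, continuation)


def dual(p, tp, q, tq):
    """Check whether s[p] : tp and s[q] : tq are dual (no try-handles)."""
    if isinstance(tp, End) and isinstance(tq, End):
        return True
    if isinstance(tp, Var) and isinstance(tq, Var):
        return tp.name == tq.name
    if isinstance(tp, Mu) and isinstance(tq, Mu):
        return tp.var == tq.var and dual(p, tp.body, q, tq.body)
    if isinstance(tp, Out) and isinstance(tq, In):
        if tp.to != q or tq.frm != p:
            return False
        sends = {label: (sort, cont) for label, sort, cont in tp.branches}
        recvs = {label: (sort, cont) for label, sort, cont in tq.branches}
        if sends.keys() != recvs.keys():
            return False
        return all(
            sends[l][0] == recvs[l][0] and dual(p, sends[l][1], q, recvs[l][1])
            for l in sends
        )
    if isinstance(tp, In) and isinstance(tq, Out):
        return dual(q, tq, p, tp)
    return False
```

For instance, `Mu("t", Out("q", (("l", "S", Var("t")),)))` at role `p` is dual
to `Mu("t", In("p", (("l", "S", Var("t")),)))` at role `q`, whereas changing
the label on one side makes the check fail.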


Operation $T \upharpoonright p$ is to filter $T$ to get the partial type which only
contains actions of $p$. For example,
$p_1!l'(S').p_2!l(S) \upharpoonright p_2 = p_2!l(S)$ and
$p_1!\{T_1, T_2\} \upharpoonright p_2 = p_2?l(S)$ where $T_1 = l_1(S_1).p_2?l(S)$ and
$T_2 = l_2(S_2).p_2?l(S)$. Next we define $(h)_{p \to q}$ to filter $h$ to
generate (1) the normal message types sent from $p$ heading to $q$, and (2) the
notifications heading to $q$. For example,
$(\langle p, q, l(S) \rangle \cdot \langle q, \mathsf{crash}\ F \rangle \cdot \langle \psi, q \rangle^{\phi} \cdot \langle p, \mathsf{crash}\ F \rangle)_{p \to q} = p?l(S) \cdot \langle F \rangle \cdot \langle \psi \rangle^{\phi}$.
The message types are abbreviated to contain only necessary information.
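
The queue filter can be sketched over a list-based queue type. The tuple
encoding of message types is hypothetical, and the sketch reads the example
above as: a message from $p$ to $q$, a failure notification addressed to $q$, a
done notification from the coordinator to $q$, and a failure notification
addressed to $p$.

```python
def filter_queue(h, p, q):
    """Keep messages sent from p to q and notifications addressed to q.

    Message types are encoded as tuples:
      ("msg", sender, receiver, label, sort)
      ("crash", addressee, failed_set)
      ("done", sender, addressee, level)
    The result uses abbreviated entries, mirroring the abbreviated types.
    """
    result = []
    for m in h:
        kind = m[0]
        if kind == "msg":
            _, sender, receiver, label, sort = m
            if sender == p and receiver == q:
                result.append(("recv", sender, label, sort))
        elif kind == "crash":
            _, addressee, failed = m
            if addressee == q:
                result.append(("fail", failed))
        elif kind == "done":
            _, sender, addressee, level = m
            if addressee == q:
                result.append(("done", sender, level))
    return result
```

Applied to the four-element queue of the example, the filter keeps the first
three entries in order and drops the failure notification addressed to $p$.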

We define $T - ht$ to mean the effect of $ht$ on $T$. Its concept is similar to
the session remainder defined in [35], which returns new local types of
participants after participants consume messages from the global queue. Since
failure notifications will not be consumed in our system, and we only have to
observe the change of a participant's type after receiving or being triggered by
some message types in $ht$, we say that $T - ht$ represents the effect of $ht$
on $T$. The behavior follows our operational semantics of applications and
systems defined in Figs. 6, 7, and 8. For example,
\[
t(q?\{l_i(S_i).T_i\}_{i \in I})h(H)^{\phi}.T' - q?l_k(S_k) \cdot ht = t(T_k)h(H)^{\phi}.T' - ht
\]
where $k \in I$.

Now we define what it means for $\Delta$ to be coherent under $\Gamma$:


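Definition 8 (Coherence). $\Gamma \vdash \Delta$ coherent if the following
conditions hold:


1. If $s : h \in \Delta$, then $\exists G : (F, d) \in \Gamma$ and
   $\{p \mid s[p] \in dom(\Delta)\} \subseteq roles(G)$ and $G$ is well-formed and
   $\forall p \in roles(G)$, $G \upharpoonright p$ is defined.
2. $\forall s[p] : T, s[q] : T' \in \Delta$ we have
   $s[p] : T \upharpoonright q - (h)_{q \to p} \bowtie s[q] : T' \upharpoonright p - (h)_{p \to q}$.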


In condition 1, we require a coordinator for every session so that when a failure
occurs, the coordinator can announce failure notifications to ask participants to
handle the failure. Condition 2 requires that, for any two endpoints, say $s[p]$
and $s[q]$, in $\Delta$, the relation
$s[p] : T \upharpoonright q - (h)_{q \to p} \bowtie s[q] : T' \upharpoonright p - (h)_{p \to q}$
must hold. This condition asserts that interactions of non-failed endpoints are
dual to each other after the effect of $h$; failed endpoints are removed from
$\Delta$, so for them the condition is satisfied immediately.


7 Properties 


We show that our type system ensures properties of subject congruence, sub- 
ject reduction, and progress. All auxiliary definitions and proofs are in the long 
version [43]. 

The property of subject congruence states that if S (a system containing an 
application and a coordinator) is well-typed by some session environment, then 
a S’ that is structurally congruent to it is also well-typed by the same session 
environment: 


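Theorem 1 (Subject Congruence). $\Gamma \vdash S \triangleright \Delta$ and
$S \equiv S'$ imply $\Gamma \vdash S' \triangleright \Delta$.

Subject reduction states that a well-typed $S$ (coherent session environment
respectively) is always well-typed (coherent respectively) after reduction:

Theorem 2 (Subject Reduction)

- $\Gamma \vdash S \triangleright \Delta$ with $\Gamma \vdash \Delta$ coherent and
  $S \to^{*} S'$ imply that $\exists \Delta', \Gamma'$ such that
  $\Gamma' \vdash S' \triangleright \Delta'$ and
  $\Gamma \vdash \Delta \to^{*} \Gamma' \vdash \Delta'$ or $\Delta \equiv \Delta'$, and
  $\Gamma' \vdash \Delta'$ coherent.
- $\Gamma \vdash S \triangleright \emptyset$ and $S \to^{*} S'$ imply that
  $\Gamma' \vdash S' \triangleright \emptyset$ for some $\Gamma'$.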


We allow sessions to run in parallel at the top level, e.g.,
$S = (\nu s_1)(\Psi_1 \blacklozenge N_1) \mid \ldots \mid (\nu s_n)(\Psi_n \blacklozenge N_n)$.
Assume we have $S$ with $a[p].P \in S$. If we cannot apply rule (Link), $S$
cannot reduce. To prevent this kind of situation, we require $S$ to be
initializable such that, $\forall a[p].P \in S$, (Link) is applicable.

The following property states that S never gets stuck (property of progress): 


Theorem 3 (Progress). If $\Gamma \vdash S \triangleright \emptyset$ and $S$ is
initializable, then either $S \to^{*} S'$ and $S'$ is initializable or
$S' = \Psi \blacklozenge s : h \mid \ldots \mid \Psi' \blacklozenge s' : h'$ and
$h, \ldots, h'$ only contain failure notifications sent by coordinators and
messages heading to failed participants.

After all processes in $S$ terminate, failure notifications sent by coordinators
are left; thus the final system can be of the form
$\Psi \blacklozenge s : h \mid \ldots \mid \Psi' \blacklozenge s' : h'$, where
$h, \ldots, h'$ have failure notifications sent by coordinators and thus
reduction rules (CollectDone), (IssueDone), and (F) will not be applied.


Minimality. The following proposition points out that, when all roles defined in
a global type, say $G$, are robust, the application obeying $G$ will never
interact with a coordinator (i.e., interactions of the application are equivalent
to those without a coordinator). This is an important property, as it states that
our model does not incur coordination overhead when all participants are robust,
or in failure-agnostic contexts as considered in previous MPST works.


Proposition 2. Assume $\forall p \in roles(G) = \{p_1, \ldots, p_n\}$, $p$ is
robust and $P_i = s[p_i] : \eta_i$ for $i \in \{1..n\}$ and
$S = (\nu s)(\Psi \blacklozenge P_1 \mid \ldots \mid P_n \mid s : h)$ where $P_i$,
$i \in \{1..n\}$, contains no try-handle. Then we have
$\Gamma \vdash S \triangleright \emptyset$ and whenever $S \to^{*} S'$ we have
$\Psi \in S'$, $\Psi = G : (\emptyset, \emptyset)$.


Proof. Immediately by typing rules [T-ini, T-s, T-sys], Definition 4
(Projection), and the operational semantics defined in Figs. 6, 7, and 8.


8 Related Work 


Several session type works study exception handling [7,9,16,30]. However, to the 
best of our knowledge this is the first theoretical work to develop a formalism 
and typing discipline for the coordinator-based model of crash failure handling 
in practical asynchronous distributed systems. 

Structured interactional exceptions [7] study exception handling for binary 
sessions. The work extends session types with a try-catch construct and a throw 
instruction, allowing participants to raise runtime exceptions. Global escape [6] 
extends previous works on exception handling in binary session types to MPSTs. 
It supports nesting and sequencing of try-catch blocks with restrictions.
Reduction rules for exception handling are of the form
$\Sigma \vdash P \to \Sigma' \vdash P'$, where $\Sigma$ is the exception
environment. This central environment at the core of the semantics
is updated synchronously and atomically. Furthermore, the reduction of a try-
catch block to its continuation is done in a synchronous reduction step involving 
all participants in a block. Lastly this work can only handle exceptions, i.e., 
explicitly raised application-level failures. These do not affect communication 
channels [6], unlike participant crashes. 

Similarly, our previous work [13] only deals with exceptions. An interaction
$p \to q : S \lor F$ defines that $p$ can send a message of type $S$ to $q$. If
$F$ is not empty, then instead of sending a message $p$ can throw $F$. If a
failure is thrown, only participants that have causal dependencies to that
failure are involved in the failure handling. No concurrent failures are
allowed; therefore all interactions which can raise failures are executed in a
lock-step fashion. As a consequence, the model cannot be used to deal with
crash failures.
Adameit et al. [1] propose session types for link failures, which extend session
types with an optional block which surrounds a process and contains default 
values. The default values are used if a link failure occurs. In contrast to our 
work, the communication model is overall synchronous whereas our model is 
asynchronous; the optional block returns default values in case of a failure but 
it is still the task of the developer to do something useful with it. 

Demangeon et al. study interrupts in MPSTs [16]. This work introduces an
interruptible block $\{|G|\}^{c}\langle l \text{ by } r \rangle; G'$ identified by $c$;
here the protocol $G$ can be interrupted by a message $l$ from $r$ and is
continued by $G'$ after either a normal or an interrupted completion of $G$.
Interrupts are more a control flow instruction like exceptions than an actual
failure handling construct, and the semantics cannot model participant crashes.

Neykova and Yoshida [36] show that MPSTs can be used to calculate safe 
global states for a safe recovery in Erlang’s let it crash model [2]. That work 
is well suited for recovery of lightweight processes in an actor setting. However, 
while it allows for elaborate failure handling by connecting (endpoint) processes 
with runtime monitors, the model does not address the fault tolerance of runtime 
monitors themselves. As monitors can be interacting in complex manners repli- 
cation does not seem straightforwardly applicable, at least not without poten- 
tially hampering performance (just as with straightforward replication of entire 
applications). 

Failure handling is studied in several process calculi and communication- 
centered programming languages without typing discipline. The conversation 
calculus [42] models exception behavior in abstract service-based systems with 
message-passing based communication. The work does not use channel types but 
studies the behavioral theory of bisimilarity. Error recovery is also studied in a 
concurrent object setting [45]; interacting objects are grouped into coordinated 
atomic actions (CAs) which enable safe error recovery. CAs can however not 
be nested. PSYNC [18] is a domain specific language based on the heard-of 
model of distributed computing [12]. Programs written in PSYNC are structured 
into rounds which are executed in a lock step manner. PSYNC comes with 
a state-based verification engine which enables checking of safety and liveness 
properties; for that programmers have to define non-trivial inductive invariants 
and ranking functions. In contrast to the coordinator model, the heard-of model 
is not widely deployed in practice. Verdi [44] is a framework for implementing 
and verifying distributed systems in Coq. It provides the possibility to verify 
the system against different network models. Verdi enables the verification of 
properties in an idealized fault model and then transfers the guarantees to more 
realistic fault models by applying transformation functions. Verdi supports safety 
properties but no liveness properties. 


9 Final Remarks 


Implementation. Based on our presented calculus we developed a domain-specific
language and corresponding runtime system in Scala, using ZooKeeper as the
coordinator. Specifically, our implementation provides mechanisms for (1)
interacting with ZooKeeper as coordinator, (2) done and failure notification
delivery and routing, (3) practical failure detection and dealing with false
suspicions, and (4) automatically inferring try-handle levels.


Conclusions. This work introduces a formal model of verified crash failure han- 
dling featuring a lightweight coordinator as common in many real-life systems. 
The model carefully exposes potential problems that may arise in distributed 
applications due to partial failures, such as inconsistent endpoint behaviors and 
orphan messages. Our typing discipline addresses these challenges by building on 
the mechanisms of MPSTs, e.g., global type well-formedness for sound failure 
handling specifications, modeling asynchronous permutations between regular 
messages and failure notifications in sessions, and the type-directed mechanisms 
for determining correct and orphaned messages in the event of failure. We adapt 
coherence of session typing environments (i.e., endpoint consistency) to con- 
sider failed roles and orphan messages, and show that our type system statically 
ensures subject reduction and progress in the presence of failures. 


Future Work. We plan to expand our implementation and develop further appli- 
cations. We believe dynamic role participation and role parameterization would 
be valuable for failure handling. Also, we are investigating options to enable 
addressing the coordinator as part of the protocol so that pertinent runtime 
information can be persisted by the coordinator. We plan to add support to our 
language and calculus for solving various explicit agreement tasks (e.g., consen- 
sus, atomic commit) via the coordinator. 
Abstract. This work exploits the logical foundation of session types to
determine what kind of type discipline for the π-calculus can exactly
capture, and is captured by, λ-calculus behaviours. Leveraging the proof
theoretic content of the soundness and completeness of sequent calculus
and natural deduction presentations of linear logic, we develop the first
mutually inverse and fully abstract processes-as-functions and functions-
as-processes encodings between a polymorphic session π-calculus and a
linear formulation of System F. We are then able to derive results of
the session calculus from the theory of the λ-calculus: (1) we obtain
a characterisation of inductive and coinductive session types via their
algebraic representations in System F; and (2) we extend our results to
account for value and process passing, entailing strong normalisation.


1 Introduction 


Dating back to Milner’s seminal work [29], encodings of λ-calculus into π-calculus
are seen as essential benchmarks to examine expressiveness of various extensions
of the π-calculus. Milner’s original motivation was to demonstrate the power of
link mobility by decomposing higher-order computations into pure name passing.
Another goal was to analyse functional behaviours in a broad computational
universe of concurrency and non-determinism. While operationally correct
encodings of many higher-order constructs exist, it is challenging to obtain
encodings that are precise wrt behavioural equivalence: the semantic distance
between the λ-calculus and the π-calculus typically requires either restricting
process behaviours [45] (e.g. via typed equivalences [5]) or enriching the
λ-calculus with constants that allow for a suitable characterisation of the term
equivalence induced by the behavioural equivalence on processes [43].

Encodings in π-calculi also gave rise to new typing disciplines: Session types
[20,22], a typing system that is able to ensure deadlock-freedom for
communication protocols between two or more parties [23], were originally
motivated “from process encodings of various data structures in an asynchronous
version of the π-calculus” [21]. Recently, a propositions-as-types
correspondence between linear logic and session types [8,9,54] has produced
several new developments and logically-motivated techniques [7,26,49,54] to
augment both the theory and practice of session-based message-passing
concurrency. Notably, parametric session polymorphism [7] (in the sense of
Reynolds [41]) has been proposed and a corresponding abstraction theorem has
been shown.

© The Author(s) 2018

Our work expands upon the proof theoretic consequences of this
propositions-as-types correspondence to address the problem of how to exactly
match the behaviours induced by session π-calculus encodings of the λ-calculus
with those of the λ-calculus. We develop mutually inverse and fully abstract
encodings (up to typed observational congruences) between a polymorphic
session-typed π-calculus and the polymorphic λ-calculus. The encodings arise
from the proof theoretic content of the equivalence between sequent calculus
(i.e. the session calculus) and natural deduction (i.e. the λ-calculus) for
second-order intuitionistic linear logic, greatly generalising [49]. While
fully abstract encodings between λ-calculi and π-calculi have been proposed
(e.g. [5,43]), our work is the first to consider a two-way, both mutually
inverse and fully abstract embedding between the two calculi by crucially
exploiting the linear logic-based session discipline. This also sheds some
definitive light on the nature of concurrency in the (logical) session calculi,
which exhibit “don’t care” forms of non-determinism (e.g. processes may race on
stateless replicated servers) rather than “don’t know” non-determinism (which
requires less harmonious logical features [2]).

In the spirit of Gentzen [14], we use our encodings as a tool to study
non-trivial properties of the session calculus, deriving them from results in
the λ-calculus: We show the existence of inductive and coinductive sessions in
the polymorphic session calculus by considering the representation of initial
F-algebras and final F-coalgebras [28] in the polymorphic λ-calculus [1,19] (in
a linear setting [6]). By appealing to full abstraction, we are able to derive
processes that satisfy the necessary algebraic properties and thus form
adequate uniform representations of inductive and coinductive session types.
The derived algebraic properties enable us to reason about standard data
structure examples, providing a logical justification to typed variations of
the representations in [30].

We systematically extend our results to a session calculus with λ-term and
process passing (the latter being the core calculus of [50], inspired by Benton’s
LNL [4]). By showing that our encodings naturally adapt to this setting, we
prove that it is possible to encode higher-order process passing in the
first-order session calculus fully abstractly, providing a typed and
proof-theoretically justified re-envisioning of Sangiorgi’s encodings of
higher-order π-calculus [46]. In addition, the encoding instantly provides a
strong normalisation property of the higher-order session calculus.

Contributions and the outline of our paper are as follows: 


§ 3.1 develops a functions-as-processes encoding of a linear formulation
of System F, Linear-F, using a logically motivated polymorphic session
π-calculus, Polyπ, and shows that the encoding is operationally sound and
complete.

§ 3.2 develops a processes-as-functions encoding of Polyπ into Linear-F,
arising from the completeness of the sequent calculus wrt natural deduction,
also operationally sound and complete.

§ 3.3 studies the relationship between the two encodings, establishing they
are mutually inverse and fully abstract wrt typed congruence, the first
two-way embedding satisfying both properties.

§ 4 develops a faithful representation of inductive and coinductive session
types in Polyπ via the encoding of initial and final (co)algebras in the
polymorphic λ-calculus. We demonstrate a use of these algebraic properties via
examples.

§ 4.2 and 4.3 study term-passing and process-passing session calculi,
extending our encodings to provide embeddings into the first-order session
calculus. We show full abstraction and mutual inversion results, and derive
strong normalisation of the higher-order session calculus from the encoding.


In order to introduce our encodings, we first overview Polyπ, its typing system
and behavioural equivalence (§ 2). We discuss related work and conclude with
future work (§ 5). Detailed proofs can be found in [52].


2 Polymorphic Session π-Calculus


This section summarises the polymorphic session π-calculus [7], dubbed Polyπ,
arising as a process assignment to second-order linear logic [15], its typing system
and behavioural equivalences.


2.1 Processes and Typing 


Syntax. Given an infinite set $\Lambda$ of names $x, y, z, u, v$, the grammar of
processes $P, Q, R$ and session types $A, B, C$ is defined by:

\[
\begin{array}{rcl}
P, Q, R & ::= & x\langle y \rangle.P \mid x(y).P \mid P \mid Q \mid (\nu y)P \mid [x \leftrightarrow y] \mid \mathbf{0} \\
 & \mid & x\langle A \rangle.P \mid x(Y).P \mid x.\mathsf{inl}; P \mid x.\mathsf{inr}; P \mid x.\mathsf{case}(P, Q) \mid {!x(y).P} \\
A, B & ::= & \mathbf{1} \mid A \multimap B \mid A \otimes B \mid A \mathbin{\&} B \mid A \oplus B \mid {!A} \mid \forall X.A \mid \exists X.A \mid X
\end{array}
\]

$x\langle y \rangle.P$ denotes the output of channel $y$ on $x$ with continuation
process $P$; $x(y).P$ denotes an input along $x$, bound to $y$ in $P$;
$P \mid Q$ denotes parallel composition; $(\nu y)P$ denotes the restriction of
name $y$ to the scope of $P$; $\mathbf{0}$ denotes the inactive process;
$[x \leftrightarrow y]$ denotes the linking of the two channels $x$ and $y$
(implemented as renaming); $x\langle A \rangle.P$ and $x(Y).P$ denote the sending
and receiving of a type $A$ along $x$ bound to $Y$ in $P$ of the receiver
process; $x.\mathsf{inl}; P$ and $x.\mathsf{inr}; P$ denote the emission of a
selection between the left or right branch of a receiver $x.\mathsf{case}(P, Q)$
process; $!x(y).P$ denotes an input-guarded replication, that spawns replicas
upon receiving an input along $x$. We often abbreviate $(\nu y)x\langle y \rangle.P$
to $\overline{x}\langle y \rangle.P$ and omit trailing $\mathbf{0}$ processes. By
convention, we range over linear channels with $x, y, z$ and shared channels
with $u, v, w$.
Fig. 1. Labelled transition system (rules (out), (in), (outT), (inT), (lout),
(lin), (id), (rep), (open), (close), (par), (com), (res)).


The syntax of session types is that of (intuitionistic) linear logic
propositions which are assigned to channels according to their usages in
processes: $\mathbf{1}$ denotes the type of a channel along which no further
behaviour occurs; $A \multimap B$ denotes a session that waits to receive a
channel of type $A$ and will then proceed as a session of type $B$; dually,
$A \otimes B$ denotes a session that sends a channel of type $A$ and continues
as $B$; $A \mathbin{\&} B$ denotes a session that offers a choice between
proceeding as behaviours $A$ or $B$; $A \oplus B$ denotes a session that
internally chooses to continue as either $A$ or $B$, signalling appropriately
to the communicating partner; $!A$ denotes a session offering an unbounded (but
finite) number of behaviours of type $A$; $\forall X.A$ denotes a polymorphic
session that receives a type $B$ and behaves uniformly as $A\{B/X\}$; dually,
$\exists X.A$ denotes an existentially typed session, which emits a type $B$
and behaves as $A\{B/X\}$.
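
The type instantiation $A\{B/X\}$ used by the polymorphic types can be sketched
as a substitution over a small syntax tree. The class names and the string tags
for connectives are hypothetical; the sketch refuses substitutions that would
capture a free variable of $B$ instead of renaming bound variables, which is a
simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class One:
    """The type 1."""


@dataclass(frozen=True)
class TVar:
    name: str  # type variable X


@dataclass(frozen=True)
class Bin:
    op: str  # one of "lolli", "tensor", "with", "plus"
    left: object
    right: object


@dataclass(frozen=True)
class Bang:
    body: object  # !A


@dataclass(frozen=True)
class Quant:
    kind: str  # "forall" or "exists"
    var: str
    body: object


def free_vars(a):
    """Free type variables of a session type."""
    if isinstance(a, One):
        return set()
    if isinstance(a, TVar):
        return {a.name}
    if isinstance(a, Bin):
        return free_vars(a.left) | free_vars(a.right)
    if isinstance(a, Bang):
        return free_vars(a.body)
    if isinstance(a, Quant):
        return free_vars(a.body) - {a.var}
    raise TypeError(f"unknown type: {a!r}")


def substitute(a, x, b):
    """Compute A{B/X}: replace free occurrences of x in a by b."""
    if isinstance(a, One):
        return a
    if isinstance(a, TVar):
        return b if a.name == x else a
    if isinstance(a, Bin):
        return Bin(a.op, substitute(a.left, x, b), substitute(a.right, x, b))
    if isinstance(a, Bang):
        return Bang(substitute(a.body, x, b))
    if isinstance(a, Quant):
        if a.var == x:
            return a  # x is shadowed by the binder
        if a.var in free_vars(b):
            raise ValueError("substitution would capture a variable; rename first")
        return Quant(a.kind, a.var, substitute(a.body, x, b))
    raise TypeError(f"unknown type: {a!r}")
```

For example, instantiating the body of $\forall X. X \multimap X$ at
$\mathbf{1}$ corresponds to
`substitute(Bin("lolli", TVar("X"), TVar("X")), "X", One())`.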


Operational Semantics. The operational semantics of our calculus is presented
as a standard labelled transition system (Fig. 1) in the style of the early system
for the π-calculus [46].

In the remainder of this work we write $\equiv$ for a standard π-calculus
structural congruence extended with the clause
$[x \leftrightarrow y] \equiv [y \leftrightarrow x]$. In order to streamline
the presentation of observational equivalence [7,36], we write $\equiv_{!}$ for
structural congruence extended with the so-called sharpened replication axioms
[46], which capture basic equivalences of replicated processes (and are present
in the proof dynamics of the exponential of linear logic). A transition
$P \xrightarrow{\alpha} Q$ denotes that $P$ may evolve to $Q$ by performing the
action represented by label $\alpha$. An action $\alpha$ ($\overline{\alpha}$)
requires a matching $\overline{\alpha}$ ($\alpha$) in the environment to enable
progress. Labels include: the silent internal action $\tau$, output and bound
output actions ($\overline{x\langle y \rangle}$ and
$\overline{(\nu z)x\langle z \rangle}$); input action $x(y)$; the binary choice
actions ($x.\mathsf{inl}$, $\overline{x.\mathsf{inl}}$, $x.\mathsf{inr}$, and
$\overline{x.\mathsf{inr}}$); and output and input actions of types
($\overline{x\langle A \rangle}$ and $x(A)$).

The labelled transition relation is defined by the rules in Fig. 1, subject to
the side conditions: in rule (res), we require $y \notin fn(\alpha)$; in rule
(par), we require $bn(\alpha) \cap fn(R) = \emptyset$; in rule (close), we require
$y \notin fn(Q)$. We omit the symmetric versions of (par), (com), (lout), (lin),
(close) and closure under α-conversion. We write $\rho_1\rho_2$ for the
composition of relations $\rho_1, \rho_2$. We write $\to$ to stand for
$\xrightarrow{\tau}\equiv$.
Fig. 2. Typing rules (abridged; see [52] for all rules). The abridged rules
are ($\multimap$R), ($\otimes$R), ($\forall$R), ($\forall$L), ($\exists$R),
($\exists$L), (id), and (cut).


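Weak transitions are defined as usual: we write $\Longrightarrow$ for the
reflexive, transitive closure of $\xrightarrow{\tau}$ and $\to^{+}$ for the
transitive closure of $\xrightarrow{\tau}$. Given $\alpha \neq \tau$, notation
$\stackrel{\alpha}{\Longrightarrow}$ stands for
$\Longrightarrow \xrightarrow{\alpha} \Longrightarrow$ and
$\stackrel{\tau}{\Longrightarrow}$ stands for $\Longrightarrow$.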
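
On a finite labelled transition system, the weak transitions just defined can
be computed directly. The sketch below assumes a hypothetical encoding in which
the system is a dictionary from state names to lists of `(label, target)` pairs
and the silent action is the string `"tau"`; the three-state example is made up
for illustration.

```python
def tau_closure(lts, p):
    """States reachable from p by zero or more tau steps."""
    seen = {p}
    stack = [p]
    while stack:
        state = stack.pop()
        for label, target in lts.get(state, []):
            if label == "tau" and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def weak_step(lts, p, label):
    """States reachable from p by a weak transition with the given label."""
    if label == "tau":
        return tau_closure(lts, p)
    result = set()
    for state in tau_closure(lts, p):
        for step_label, target in lts.get(state, []):
            if step_label == label:
                result |= tau_closure(lts, target)
    return result


# Hypothetical system: P0 --tau--> P1 --x(y)--> P2 --tau--> P3
LTS = {
    "P0": [("tau", "P1")],
    "P1": [("x(y)", "P2")],
    "P2": [("tau", "P3")],
}
```

Here `weak_step(LTS, "P0", "x(y)")` collects both `P2` and `P3`, since silent
steps are absorbed before and after the visible action.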


Typing System. The typing rules of Polyπ are given in Fig. 2, following [7].
The rules define the judgment $\Omega; \Gamma; \Delta \vdash P :: z{:}A$, denoting
that process $P$ offers a session of type $A$ along channel $z$, using the
linear sessions in $\Delta$, (potentially) using the unrestricted or shared
sessions in $\Gamma$, with polymorphic type variables maintained in $\Omega$. We
use a well-formedness judgment $\Omega \vdash A\ \mathsf{type}$ which states that
$A$ is well-formed wrt the type variable environment $\Omega$ (i.e.
$fv(A) \subseteq \Omega$). We often write $T$ for the right-hand side typing
$z{:}A$, $\cdot$ for the empty context and $\Delta, \Delta'$ for the union of
contexts $\Delta$ and $\Delta'$, only defined when $\Delta$ and $\Delta'$ are
disjoint. We write $\cdot \vdash P :: T$ for $\cdot; \cdot; \cdot \vdash P :: T$.

As in [8,9,36,54], the typing discipline enforces that channel outputs always
have as object a fresh name, in the style of the internal mobility π-calculus [44].
We clarify a few of the key rules: Rule $\forall$R defines the meaning of
(impredicative) universal quantification over session types, stating that a
session of type $\forall X.A$ inputs a type and then behaves uniformly as $A$;
dually, to use such a session (rule $\forall$L), a process must output a type
$B$ which then warrants the use of the session as type $A\{B/X\}$. Rule
$\multimap$R captures session input, where a session of type $A \multimap B$
expects to receive a session of type $A$ which will then be used to produce a
session of type $B$. Dually, session output (rule $\otimes$R) is achieved by
producing a fresh session of type $A$ (that uses a disjoint set of sessions to
those of the continuation) and outputting the fresh session along $z$, which is
then a session of type $B$. Linear composition is captured by rule cut which
enables a process that offers a session $x{:}A$ (using linear sessions in
$\Delta_1$) to be composed with a process that uses that session (amongst
others in $\Delta_2$) to offer $z{:}C$. As shown in [7], typing entails Subject
Reduction, Global Progress, and Termination.


Observational Equivalences. We briefly summarise the typed congruence and
logical equivalence with polymorphism, giving rise to a suitable notion of
relational parametricity in the sense of Reynolds [41], defined as a contextual logical
relation on typed processes [7]. The logical relation is reminiscent of a typed
bisimulation. However, extra care is needed to ensure well-foundedness due to
impredicative type instantiation. As a consequence, the logical relation allows us
to reason about process equivalences where type variables are not instantiated
with the same, but rather related types.


Typed Barbed Congruence ($\cong$). We use the typed contextual congruence
from [7], which preserves observable actions, called barbs. Formally, barbed
congruence, noted $\cong$, is the largest equivalence on well-typed processes that is
$\tau$-closed, barb preserving, and contextually closed under typed contexts; see [7,52]
for the full definition.


Logical Equivalence ($\approx_{\mathsf{L}}$). The definition of logical equivalence is no more than
a typed contextual bisimulation with the following intuitive reading: given two 
open processes P and Q (i.e. processes with non-empty left-hand side typings), 
we define their equivalence by inductively closing out the context, composing 
with equivalent processes offering appropriately typed sessions. When processes 
are closed, we have a single distinguished session channel along which we can 
perform observations, and proceed inductively on the structure of the offered 
session type. We can then show that such an equivalence satisfies the necessary 
fundamental properties (Theorem 2.3). 

The logical relation is defined using the candidates technique of Girard [16]. 
In this setting, an equivalence candidate is a relation on typed processes satisfy- 
ing basic closure conditions: an equivalence candidate must be compatible with 
barbed congruence and closed under forward and converse reduction. 


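Definition 2.1 (Equivalence Candidate). An equivalence candidate $\mathcal{R}$ at
z:A and z:B, noted $\mathcal{R} :: z{:}A \Leftrightarrow B$, is a binary relation on processes such that,
for every $(P, Q) \in \mathcal{R} :: z{:}A \Leftrightarrow B$ both $\cdot \vdash P :: z{:}A$ and $\cdot \vdash Q :: z{:}B$ hold, together
with the following (we often write $(P, Q) \in \mathcal{R} :: z{:}A \Leftrightarrow B$ as $P\ \mathcal{R}\ Q :: z{:}A \Leftrightarrow B$):

1. If $(P, Q) \in \mathcal{R} :: z{:}A \Leftrightarrow B$, $\cdot \vdash P \cong P' :: z{:}A$, and $\cdot \vdash Q \cong Q' :: z{:}B$ then
   $(P', Q') \in \mathcal{R} :: z{:}A \Leftrightarrow B$.

2. If $(P, Q) \in \mathcal{R} :: z{:}A \Leftrightarrow B$ then, for all $P_0$ such that $\cdot \vdash P_0 :: z{:}A$ and $P_0 \Longrightarrow P$,
   we have $(P_0, Q) \in \mathcal{R} :: z{:}A \Leftrightarrow B$. Symmetrically for Q.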


To define the logical relation we rely on some auxiliary notation, pertaining
to the treatment of type variables arising due to impredicative polymorphism.
We write $\omega : \Omega$ to denote a mapping $\omega$ that assigns a closed type to the type
variables in $\Omega$. We write $\omega(X)$ for the type mapped by $\omega$ to variable X. Given
two mappings $\omega : \Omega$ and $\omega' : \Omega$, we define an equivalence candidate assignment $\eta$
between $\omega$ and $\omega'$ as a mapping of equivalence candidate $\eta(X) :: {-}{:}\omega(X) \Leftrightarrow \omega'(X)$
to the type variables in $\Omega$, where the particular choice of a distinguished
right-hand side channel is delayed (i.e. to be instantiated later on). We write $\eta(X)(z)$
for the instantiation of the (delayed) candidate with the name z. We write
$\eta : \omega \Leftrightarrow \omega'$ to denote that $\eta$ is a candidate assignment between $\omega$ and $\omega'$; and $\hat{\omega}(P)$
to denote the application of mapping $\omega$ to P.

We define a sequent-indexed family of process relations, that is, a set of pairs
of processes (P, Q), written $\Gamma; \Delta \vdash P \approx_{\mathsf{L}} Q :: T[\eta : \omega \Leftrightarrow \omega']$, satisfying some
conditions, typed under $\Omega; \Gamma; \Delta \vdash T$, with $\omega : \Omega$, $\omega' : \Omega$ and $\eta : \omega \Leftrightarrow \omega'$. Logical
equivalence is defined inductively on the size of the typing contexts and then on
the structure of the right-hand side type. We show only select cases (see [52] for
the full definition).


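Definition 2.2 (Logical Equivalence). (Base Case) Given a type A and
mappings $\omega, \omega', \eta$, we define logical equivalence, noted $P \approx_{\mathsf{L}} Q :: z{:}A[\eta : \omega \Leftrightarrow \omega']$,
as the smallest symmetric binary relation containing all pairs of processes (P, Q)
such that (i) $\cdot \vdash \hat{\omega}(P) :: z{:}\hat{\omega}(A)$; (ii) $\cdot \vdash \hat{\omega}'(Q) :: z{:}\hat{\omega}'(A)$; and (iii) satisfies the
conditions given below:

- $P \approx_{\mathsf{L}} Q :: z{:}X[\eta : \omega \Leftrightarrow \omega']$ iff $(P, Q) \in \eta(X)(z)$

- $P \approx_{\mathsf{L}} Q :: z{:}A \multimap B[\eta : \omega \Leftrightarrow \omega']$ iff
  $\forall P', y.\ (P \xrightarrow{z(y)} P') \Rightarrow \exists Q'.\ Q \overset{z(y)}{\Longrightarrow} Q'$ s.t.
  $\forall R_1, R_2.\ R_1 \approx_{\mathsf{L}} R_2 :: y{:}A[\eta : \omega \Leftrightarrow \omega']$
  $(\nu y)(P' \mid R_1) \approx_{\mathsf{L}} (\nu y)(Q' \mid R_2) :: z{:}B[\eta : \omega \Leftrightarrow \omega']$

- $P \approx_{\mathsf{L}} Q :: z{:}A \otimes B[\eta : \omega \Leftrightarrow \omega']$ iff
  $\forall P', y.\ (P \xrightarrow{\overline{(\nu y)z\langle y\rangle}} P') \Rightarrow \exists Q'.\ Q \overset{\overline{(\nu y)z\langle y\rangle}}{\Longrightarrow} Q'$ s.t.
  $\exists P_1, P_2, Q_1, Q_2.\ P' \equiv_{!} P_1 \mid P_2 \wedge Q' \equiv_{!} Q_1 \mid Q_2 \wedge P_1 \approx_{\mathsf{L}} Q_1 :: y{:}A[\eta : \omega \Leftrightarrow \omega'] \wedge P_2 \approx_{\mathsf{L}} Q_2 :: z{:}B[\eta : \omega \Leftrightarrow \omega']$

- $P \approx_{\mathsf{L}} Q :: z{:}\forall X.A[\eta : \omega \Leftrightarrow \omega']$ iff
  $\forall B_1, B_2, P', \mathcal{R} :: {-}{:}B_1 \Leftrightarrow B_2.\ (P \xrightarrow{z(B_1)} P')$ implies
  $\exists Q'.\ Q \overset{z(B_2)}{\Longrightarrow} Q'$, $P' \approx_{\mathsf{L}} Q' :: z{:}A[\eta[X \mapsto \mathcal{R}] : \omega[X \mapsto B_1] \Leftrightarrow \omega'[X \mapsto B_2]]$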


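(Inductive Case). Let $\Gamma, \Delta$ be non empty. Given $\Omega; \Gamma; \Delta \vdash P :: T$ and
$\Omega; \Gamma; \Delta \vdash Q :: T$, the binary relation on processes $\Gamma; \Delta \vdash P \approx_{\mathsf{L}} Q :: T[\eta : \omega \Leftrightarrow \omega']$ (with
$\omega, \omega' : \Omega$ and $\eta : \omega \Leftrightarrow \omega'$) is inductively defined as:

$\Gamma; \Delta, y{:}A \vdash P \approx_{\mathsf{L}} Q :: T[\eta : \omega \Leftrightarrow \omega']$ iff $\forall R_1, R_2.$ s.t. $R_1 \approx_{\mathsf{L}} R_2 :: y{:}A[\eta : \omega \Leftrightarrow \omega']$,
$\Gamma; \Delta \vdash (\nu y)(\hat{\omega}(P) \mid \hat{\omega}(R_1)) \approx_{\mathsf{L}} (\nu y)(\hat{\omega}'(Q) \mid \hat{\omega}'(R_2)) :: T[\eta : \omega \Leftrightarrow \omega']$

$\Gamma, u{:}A; \Delta \vdash P \approx_{\mathsf{L}} Q :: T[\eta : \omega \Leftrightarrow \omega']$ iff $\forall R_1, R_2.$ s.t. $R_1 \approx_{\mathsf{L}} R_2 :: y{:}A[\eta : \omega \Leftrightarrow \omega']$,
$\Gamma; \Delta \vdash (\nu u)(\hat{\omega}(P) \mid {!u(y)}.\hat{\omega}(R_1)) \approx_{\mathsf{L}} (\nu u)(\hat{\omega}'(Q) \mid {!u(y)}.\hat{\omega}'(R_2)) :: T[\eta : \omega \Leftrightarrow \omega']$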


For the sake of readability we often omit the $\eta : \omega \Leftrightarrow \omega'$ portion of $\approx_{\mathsf{L}}$, which
is henceforth implicitly universally quantified. Thus, we write $\Omega; \Gamma; \Delta \vdash P \approx_{\mathsf{L}}
Q :: z{:}A$ (or $P \approx_{\mathsf{L}} Q$) iff the two given processes are logically equivalent for all
consistent instantiations of their type variables.

It is instructive to inspect the clause for type input ($\forall X.A$): the two processes
must be able to match inputs of any pair of related types (i.e. types related by
a candidate), such that the continuations are related at the open type A with
the appropriate type variable instantiations, following Girard [16]. The power of
this style of logical relation arises from a combination of the extensional flavour
of the equivalence and the fact that polymorphic equivalences do not require the
same type to be instantiated in both processes, but rather that the types are
related (via a suitable equivalence candidate relation).

Theorem 2.3 (Properties of Logical Equivalence [7])

Parametricity: If $\Omega; \Gamma; \Delta \vdash P :: z{:}A$ then, for all $\omega, \omega' : \Omega$ and $\eta : \omega \Leftrightarrow \omega'$,
we have $\Gamma; \Delta \vdash \hat{\omega}(P) \approx_{\mathsf{L}} \hat{\omega}'(P) :: z{:}A[\eta : \omega \Leftrightarrow \omega']$.

Soundness: If $\Omega; \Gamma; \Delta \vdash P \approx_{\mathsf{L}} Q :: z{:}A$ then $C[P] \cong C[Q] :: z{:}A$, for any closing
$C[-]$.

Completeness: If $\Omega; \Gamma; \Delta \vdash P \cong Q :: z{:}A$ then $\Omega; \Gamma; \Delta \vdash P \approx_{\mathsf{L}} Q :: z{:}A$.


3 To Linear-F and Back 


We now develop our mutually inverse and fully abstract encodings between Poly$\pi$
and a linear polymorphic $\lambda$-calculus [55] that we dub Linear-F. We first introduce
the syntax and typing of the linear $\lambda$-calculus and then proceed to detail our
encodings and their properties (we omit typing ascriptions from the existential
polymorphism constructs for readability).


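Definition 3.1 (Linear-F). The syntax of terms M, N and types A, B of
Linear-F is given below.

\[
\begin{array}{rcl}
M, N & ::= & \lambda x{:}A.M \mid M\,N \mid \langle M \otimes N\rangle \mid \mathsf{let}\ x \otimes y = M\ \mathsf{in}\ N \mid {!M} \mid \mathsf{let}\ {!u} = M\ \mathsf{in}\ N \mid \Lambda X.M \\
     & \mid & M[A] \mid \mathsf{pack}\ A\ \mathsf{with}\ M \mid \mathsf{let}\ (X, y) = M\ \mathsf{in}\ N \mid \mathsf{let}\ \mathbf{1} = M\ \mathsf{in}\ N \mid \langle\rangle \mid \mathsf{T} \mid \mathsf{F} \\
A, B & ::= & A \multimap B \mid A \otimes B \mid {!A} \mid \forall X.A \mid \exists X.A \mid X \mid \mathbf{1} \mid \mathbf{2}
\end{array}
\]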


The syntax of types is that of the multiplicative and exponential fragments of
second-order intuitionistic linear logic: $\lambda x{:}A.M$ denotes linear $\lambda$-abstractions;
$M\,N$ denotes the application; $\langle M \otimes N\rangle$ denotes the multiplicative pairing of M
and N, as reflected in its elimination form $\mathsf{let}\ x \otimes y = M\ \mathsf{in}\ N$ which
simultaneously deconstructs the pair M, binding its first and second projection to x and
y in N, respectively; $!M$ denotes a term M that does not use any linear
variables and so may be used an arbitrary number of times; $\mathsf{let}\ {!u} = M\ \mathsf{in}\ N$ binds
the underlying exponential term of M as u in N; $\Lambda X.M$ is the type abstraction
former; $M[A]$ stands for type application; $\mathsf{pack}\ A\ \mathsf{with}\ M$ is the existential type
introduction form, where M is a term where the existentially typed variable
is instantiated with A; $\mathsf{let}\ (X, y) = M\ \mathsf{in}\ N$ unpacks an existential package M,
binding the representation type to X and the underlying term to y in N; the
multiplicative unit $\mathbf{1}$ has as introduction form the nullary pair $\langle\rangle$ and is
eliminated by the construct $\mathsf{let}\ \mathbf{1} = M\ \mathsf{in}\ N$, where M is a term of type $\mathbf{1}$. Booleans
(type $\mathbf{2}$ with values T and F) are the basic observable.

The typing judgment in Linear-F is given as $\Omega; \Gamma; \Delta \vdash M : A$, following the
DILL formulation of linear logic [3], stating that term M has type A in a
linear context $\Delta$ (i.e. bindings for linear variables x:B), intuitionistic context $\Gamma$
(i.e. binding for intuitionistic variables u:B) and type variable context $\Omega$. The
typing rules are standard [7]. The operational semantics of the calculus are the
expected call-by-name semantics with commuting conversions [27]. We write $\Downarrow$
for the evaluation relation. We write $\cong$ for the largest typed congruence that is
consistent with the observables of type $\mathbf{2}$ (i.e. a so-called Morris-style equivalence
as in [5]).

3.1 Encoding Linear-F into Session $\pi$-Calculus


We define a translation from Linear-F to Poly$\pi$ generalising the one from
[49], accounting for polymorphism and multiplicative pairs. We translate
typing derivations of $\lambda$-terms to those of $\pi$-calculus terms (we omit the full typing
derivation for the sake of readability).

Proof theoretically, the $\lambda$-calculus corresponds to a proof term assignment
for natural deduction presentations of logic, whereas the session $\pi$-calculus from
§ 2 corresponds to a proof term assignment for sequent calculus. Thus, we obtain
a translation from $\lambda$-calculus to the session $\pi$-calculus by considering the proof
theoretic content of the constructive proof of soundness of the sequent
calculus wrt natural deduction. Following Gentzen [14], the translation from natural
deduction to sequent calculus maps introduction rules to the corresponding right
rules and elimination rules to a combination of the corresponding left rule, cut
and/or identity.

Since typing in the session calculus identifies a distinguished channel along
which a process offers a session, the translation of $\lambda$-terms is parameterised by a
“result” channel along which the behaviour of the $\lambda$-term is implemented. Given
a $\lambda$-term M, the process $\llbracket M\rrbracket_z$ encodes the behaviour of M along the session
channel z. We enforce that the type $\mathbf{2}$ of booleans and its two constructors are
consistently translated to their polymorphic Church encodings before applying
the translation to Poly$\pi$. Thus, type $\mathbf{2}$ is first translated to $\forall X.{!X} \multimap {!X} \multimap X$,
the value T to $\Lambda X.\lambda u{:}{!X}.\lambda v{:}{!X}.\mathsf{let}\ {!x} = u\ \mathsf{in}\ \mathsf{let}\ {!y} = v\ \mathsf{in}\ x$ and the value F to
$\Lambda X.\lambda u{:}{!X}.\lambda v{:}{!X}.\mathsf{let}\ {!x} = u\ \mathsf{in}\ \mathsf{let}\ {!y} = v\ \mathsf{in}\ y$. Such representations of the booleans
are adequate up to parametricity [6] and suitable for our purposes of relating
the session calculus (which has no primitive notion of value or result type) with
the $\lambda$-calculus precisely due to the tight correspondence between the two calculi.
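
To make the Church encoding of the booleans concrete, the following is a minimal Python sketch. It is only an illustration: Python is untyped and non-linear, so the type abstraction over X and the exponential "!" are left implicit, and the helper names (church_true, church_false, to_bool, from_bool, church_not) are hypothetical and do not come from the paper.

```python
# Church booleans: a boolean is a function that selects one of two arguments.
# Assumption of this sketch: the type abstraction over X and the "!" modality
# of the Linear-F encoding are not represented, since Python is untyped.

def church_true(u):
    """Counterpart of T: select the first argument."""
    return lambda v: u

def church_false(u):
    """Counterpart of F: select the second argument."""
    return lambda v: v

def to_bool(b):
    """Observe a Church boolean by instantiating X with Python's bool."""
    return b(True)(False)

def from_bool(p):
    """Map a Python bool to its Church encoding."""
    return church_true if p else church_false

def church_not(b):
    """Negation, obtained by swapping the two branches."""
    return lambda u: lambda v: b(v)(u)
```

For instance, to_bool(church_not(church_true)) evaluates to False: the only way to observe an encoded boolean is to apply it to two candidate results, which is the sense in which the encoding replaces a primitive result type.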


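Definition 3.2 (From Linear-F to Poly$\pi$). $\llbracket\Omega\rrbracket; \llbracket\Gamma\rrbracket; \llbracket\Delta\rrbracket \vdash \llbracket M\rrbracket_z :: z{:}A$
denotes the translation of contexts, types and terms from Linear-F to the
polymorphic session calculus. The translations on contexts and types are the identity
function. Booleans and their values are first translated to their Church encodings
as specified above. The translation on $\lambda$-terms is given below:

\[
\begin{array}{ll}
\llbracket x\rrbracket_z \triangleq [x \leftrightarrow z] &
\llbracket M\,N\rrbracket_z \triangleq (\nu x)(\llbracket M\rrbracket_x \mid (\nu y)x\langle y\rangle.(\llbracket N\rrbracket_y \mid [x \leftrightarrow z])) \\
\llbracket u\rrbracket_z \triangleq (\nu x)u\langle x\rangle.[x \leftrightarrow z] &
\llbracket \mathsf{let}\ {!u} = M\ \mathsf{in}\ N\rrbracket_z \triangleq (\nu x)(\llbracket M\rrbracket_x \mid \llbracket N\rrbracket_z\{x/u\}) \\
\llbracket \lambda x{:}A.M\rrbracket_z \triangleq z(x).\llbracket M\rrbracket_z &
\llbracket \langle M \otimes N\rangle\rrbracket_z \triangleq (\nu y)z\langle y\rangle.(\llbracket M\rrbracket_y \mid \llbracket N\rrbracket_z) \\
\llbracket {!M}\rrbracket_z \triangleq {!z(x)}.\llbracket M\rrbracket_x &
\llbracket \mathsf{let}\ x \otimes y = M\ \mathsf{in}\ N\rrbracket_z \triangleq (\nu y)(\llbracket M\rrbracket_y \mid y(x).\llbracket N\rrbracket_z) \\
\llbracket \Lambda X.M\rrbracket_z \triangleq z(X).\llbracket M\rrbracket_z &
\llbracket M[A]\rrbracket_z \triangleq (\nu x)(\llbracket M\rrbracket_x \mid x\langle A\rangle.[x \leftrightarrow z]) \\
\llbracket \mathsf{pack}\ A\ \mathsf{with}\ M\rrbracket_z \triangleq z\langle A\rangle.\llbracket M\rrbracket_z &
\llbracket \mathsf{let}\ (X, y) = M\ \mathsf{in}\ N\rrbracket_z \triangleq (\nu y)(\llbracket M\rrbracket_y \mid y(X).\llbracket N\rrbracket_z) \\
\llbracket \langle\rangle\rrbracket_z \triangleq \mathbf{0} &
\llbracket \mathsf{let}\ \mathbf{1} = M\ \mathsf{in}\ N\rrbracket_z \triangleq (\nu x)(\llbracket M\rrbracket_x \mid \llbracket N\rrbracket_z)
\end{array}
\]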


To translate a (linear) $\lambda$-abstraction $\lambda x{:}A.M$, which corresponds to the proof
term for the introduction rule for $\multimap$, we map it to the corresponding $\multimap$R rule,
thus obtaining a process $z(x).\llbracket M\rrbracket_z$ that inputs along the result channel z a
channel x which will be used in $\llbracket M\rrbracket_z$ to access the function argument. To encode
the application $M\,N$, we compose (i.e. cut) $\llbracket M\rrbracket_x$, where x is a fresh name, with
a process that provides the (encoded) function argument by outputting along x
a channel y which offers the behaviour of $\llbracket N\rrbracket_y$. After the output is performed,
the type of x is now that of the function’s codomain and thus we conclude by
forwarding (i.e. the id rule) between x and the result channel z.

The encoding for polymorphism follows a similar pattern: To encode the
abstraction $\Lambda X.M$, we receive along the result channel a type that is bound
to X and proceed inductively. To encode type application $M[A]$ we encode the
abstraction M in parallel with a process that sends A to it, and forwards
accordingly. Finally, the encoding of the existential package $\mathsf{pack}\ A\ \mathsf{with}\ M$ maps to an
output of the type A followed by the behaviour $\llbracket M\rrbracket_z$, with the encoding of the
elimination form $\mathsf{let}\ (X, y) = M\ \mathsf{in}\ N$ composing the translation of the term of
existential type M with a process performing the appropriate type input and
proceeding as $\llbracket N\rrbracket_z$.
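
The clauses of Definition 3.2 can be read as a recursive function over term syntax. The sketch below is one possible rendering in Python, under several assumptions that are not part of the paper: terms are represented as tagged tuples, processes are printed as ASCII strings (`x<y>.P` for output, `x(y).P` for input, `[x<->z]` for forwarding, `(nu x)` for restriction), fresh channels are named c0, c1, ..., and the exponential fragment (!M, let !u, shared variables) is omitted for brevity.

```python
import itertools

def make_fresh():
    """Return a supplier of fresh channel names c0, c1, ... (hypothetical naming;
    assumes source terms do not already use names of this shape)."""
    counter = itertools.count()
    return lambda: f"c{next(counter)}"

def encode(term, z, fresh):
    """Translate a tagged-tuple Linear-F term into an ASCII process offering on z."""
    tag = term[0]
    if tag == "var":                       # x  ~>  forwarder between x and z
        return f"[{term[1]}<->{z}]"
    if tag == "lam":                       # lambda x. M  ~>  input x on z
        _, x, body = term
        return f"{z}({x}).{encode(body, z, fresh)}"
    if tag == "app":                       # M N  ~>  cut M with an output of N's channel
        _, fun, arg = term
        x, y = fresh(), fresh()
        return (f"(nu {x})({encode(fun, x, fresh)} | "
                f"(nu {y}){x}<{y}>.({encode(arg, y, fresh)} | [{x}<->{z}]))")
    if tag == "pair":                      # <M (x) N>  ~>  output a fresh channel for M
        _, left, right = term
        y = fresh()
        return f"(nu {y}){z}<{y}>.({encode(left, y, fresh)} | {encode(right, z, fresh)})"
    if tag == "letpair":                   # let x (x) y = M in N
        _, x, y, bound, body = term
        return f"(nu {y})({encode(bound, y, fresh)} | {y}({x}).{encode(body, z, fresh)})"
    if tag == "tlam":                      # type abstraction  ~>  type input on z
        _, tvar, body = term
        return f"{z}({tvar}).{encode(body, z, fresh)}"
    if tag == "tapp":                      # M[A]  ~>  cut M with a type output, then forward
        _, fun, ty = term
        x = fresh()
        return f"(nu {x})({encode(fun, x, fresh)} | {x}<{ty}>.[{x}<->{z}])"
    if tag == "pack":                      # pack A with M  ~>  type output on z
        _, ty, body = term
        return f"{z}<{ty}>.{encode(body, z, fresh)}"
    if tag == "unpack":                    # let (X, y) = M in N
        _, tvar, y, bound, body = term
        return f"(nu {y})({encode(bound, y, fresh)} | {y}({tvar}).{encode(body, z, fresh)})"
    if tag == "unit":                      # <>  ~>  inactive process
        return "0"
    if tag == "letunit":                   # let 1 = M in N
        _, bound, body = term
        x = fresh()
        return f"(nu {x})({encode(bound, x, fresh)} | {encode(body, z, fresh)})"
    raise ValueError(f"unsupported term: {tag}")
```

As a usage example, the polymorphic pairing function can be written as `("tlam", "X", ("tlam", "Y", ("lam", "x", ("lam", "y", ("pair", ("var", "x"), ("var", "y"))))))`; calling `encode` on it with result channel `"z"` yields a process that performs two type inputs and two name inputs on z before outputting a fresh channel forwarded to x.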


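Example 3.3 (Encoding of Linear-F). Consider the following $\lambda$-term
corresponding to a polymorphic pairing function (recall that we write $\overline{z}\langle w\rangle.P$ for
$(\nu w)z\langle w\rangle.P$):

\[
M \triangleq \Lambda X.\Lambda Y.\lambda x{:}X.\lambda y{:}Y.\langle x \otimes y\rangle \quad\text{and}\quad N \triangleq ((M[A][B]\,M_1)\,M_2)
\]

Then we have, with $\tilde{x} = x_1 x_2 x_3 x_4$:

\[
\begin{array}{rl}
\llbracket N\rrbracket_z \equiv & (\nu\tilde{x})(\llbracket M\rrbracket_{x_1} \mid x_1\langle A\rangle.[x_1 \leftrightarrow x_2] \mid x_2\langle B\rangle.[x_2 \leftrightarrow x_3] \mid {} \\
 & \overline{x_3}\langle x\rangle.(\llbracket M_1\rrbracket_x \mid [x_3 \leftrightarrow x_4]) \mid \overline{x_4}\langle y\rangle.(\llbracket M_2\rrbracket_y \mid [x_4 \leftrightarrow z])) \\
\equiv & (\nu\tilde{x})(x_1(X).x_1(Y).x_1(x).x_1(y).\overline{x_1}\langle w\rangle.([x \leftrightarrow w] \mid [y \leftrightarrow x_1]) \mid x_1\langle A\rangle.[x_1 \leftrightarrow x_2] \mid {} \\
 & x_2\langle B\rangle.[x_2 \leftrightarrow x_3] \mid \overline{x_3}\langle x\rangle.(\llbracket M_1\rrbracket_x \mid [x_3 \leftrightarrow x_4]) \mid \overline{x_4}\langle y\rangle.(\llbracket M_2\rrbracket_y \mid [x_4 \leftrightarrow z]))
\end{array}
\]

We can observe that $N \to^{+} (((\lambda x{:}A.\lambda y{:}B.\langle x \otimes y\rangle)\,M_1)\,M_2) \to^{+} \langle M_1 \otimes M_2\rangle$. At
the process level, each reduction corresponding to the redex of type application
is simulated by two reductions, obtaining:

\[
\begin{array}{rl}
\llbracket N\rrbracket_z \to^{+} & (\nu x_3, x_4)(x_3(x).x_3(y).\overline{x_3}\langle w\rangle.([x \leftrightarrow w] \mid [y \leftrightarrow x_3]) \mid {} \\
 & \overline{x_3}\langle x\rangle.(\llbracket M_1\rrbracket_x \mid [x_3 \leftrightarrow x_4]) \mid \overline{x_4}\langle y\rangle.(\llbracket M_2\rrbracket_y \mid [x_4 \leftrightarrow z])) = P
\end{array}
\]

The reductions corresponding to the $\beta$-redexes clarify the way in which the
encoding represents substitution of terms for variables via fine-grained name
passing. Consider $\llbracket\langle M_1 \otimes M_2\rangle\rrbracket_z \triangleq \overline{z}\langle w\rangle.(\llbracket M_1\rrbracket_w \mid \llbracket M_2\rrbracket_z)$ and

\[
P \to^{+} (\nu x, y)(\llbracket M_1\rrbracket_x \mid \llbracket M_2\rrbracket_y \mid \overline{z}\langle w\rangle.([x \leftrightarrow w] \mid [y \leftrightarrow z]))
\]

The encoding of the pairing of $M_1$ and $M_2$ outputs a fresh name w which will
denote the behaviour of (the encoding of) $M_1$, and then the behaviour of the
encoding of $M_2$ is offered on z. The reduct of P outputs a fresh name w which
is then identified with x and thus denotes the behaviour of $\llbracket M_1\rrbracket_w$. The channel
z is identified with y and thus denotes the behaviour of $\llbracket M_2\rrbracket_z$, making the two
processes listed above equivalent. This informal reasoning exposes the insights
that justify the operational correspondence of the encoding. Proof-theoretically,
these equivalences simply map to commuting conversions which push the
processes $\llbracket M_1\rrbracket_x$ and $\llbracket M_2\rrbracket_y$ under the output on z.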


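Theorem 3.4 (Operational Correspondence)

- If $\Omega; \Gamma; \Delta \vdash M : A$ and $M \to N$ then $\llbracket M\rrbracket_z \Longrightarrow P$ such that $\llbracket N\rrbracket_z \approx_{\mathsf{L}} P$
- If $\llbracket M\rrbracket_z \to P$ then $M \to^{+} N$ and $\llbracket N\rrbracket_z \approx_{\mathsf{L}} P$

3.2 Encoding Session $\pi$-calculus to Linear-F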


Just as the proof theoretic content of the soundness of sequent calculus wrt
natural deduction induces a translation from $\lambda$-terms to session-typed processes, the
completeness of the sequent calculus wrt natural deduction induces a translation
from the session calculus to the $\lambda$-calculus. This mapping identifies sequent
calculus right rules with the introduction rules of natural deduction and left rules
with elimination rules combined with (type-preserving) substitution. Crucially,
the mapping is defined on typing derivations, enabling us to consistently identify
when a process uses a session (i.e. left rules) or, dually, when a process offers a
session (i.e. right rules).


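\[
\left\llparenthesis\ (\multimap\mathsf{R})\ \dfrac{\Delta, x{:}A \vdash P :: z{:}B}{\Delta \vdash z(x).P :: z{:}A \multimap B}\ \right\rrparenthesis
\ \triangleq\
(\multimap\mathsf{I})\ \dfrac{\Delta, x{:}A \vdash \llparenthesis P\rrparenthesis_{\Delta, x:A \vdash z:B} : B}{\Delta \vdash \lambda x{:}A.\llparenthesis P\rrparenthesis_{\Delta, x:A \vdash z:B} : A \multimap B}
\]
\[
\left\llparenthesis\ (\multimap\mathsf{L})\ \dfrac{\Delta_1 \vdash P :: y{:}A \quad \Delta_2, x{:}B \vdash Q :: z{:}C}{\Delta_1, \Delta_2, x{:}A \multimap B \vdash (\nu y)x\langle y\rangle.(P \mid Q) :: z{:}C}\ \right\rrparenthesis
\ \triangleq
\]
\[
(\mathsf{SUBST})\ \dfrac{
  (\multimap\mathsf{E})\ \dfrac{x{:}A \multimap B \vdash x : A \multimap B \quad \Delta_1 \vdash \llparenthesis P\rrparenthesis_{\Delta_1 \vdash y:A} : A}{\Delta_1, x{:}A \multimap B \vdash x\ \llparenthesis P\rrparenthesis_{\Delta_1 \vdash y:A} : B}
  \quad
  \Delta_2, x{:}B \vdash \llparenthesis Q\rrparenthesis_{\Delta_2, x:B \vdash z:C} : C
}{\Delta_1, \Delta_2, x{:}A \multimap B \vdash \llparenthesis Q\rrparenthesis_{\Delta_2, x:B \vdash z:C}\{(x\ \llparenthesis P\rrparenthesis_{\Delta_1 \vdash y:A})/x\} : C}
\]

Fig. 3. Translation on typing derivations (excerpt; see [52])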


Definition 3.5 (From Poly$\pi$ to Linear-F). We write $\llparenthesis\Omega\rrparenthesis; \llparenthesis\Gamma\rrparenthesis; \llparenthesis\Delta\rrparenthesis \vdash \llparenthesis P\rrparenthesis : A$
for the translation from typing derivations in Poly$\pi$ to derivations in Linear-F.
The translations on types and contexts are the identity function. The translation
on processes is given below, where the leftmost column indicates the typing rule
at the root of the derivation (see Fig. 3 for an excerpt of the translation on
typing derivations, where we write $\llparenthesis P\rrparenthesis_{\Omega; \Gamma; \Delta \vdash z:A}$ to denote the translation of
$\Omega; \Gamma; \Delta \vdash P :: z{:}A$. We omit $\Omega$ and $\Gamma$ when unchanged).


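\[
\begin{array}{llll}
(\mathbf{1}\mathsf{R}) & \llparenthesis\mathbf{0}\rrparenthesis \triangleq \langle\rangle &
(\multimap\mathsf{L}) & \llparenthesis(\nu y)x\langle y\rangle.(P \mid Q)\rrparenthesis \triangleq \llparenthesis Q\rrparenthesis\{(x\ \llparenthesis P\rrparenthesis)/x\} \\
(\mathsf{id}) & \llparenthesis[x \leftrightarrow y]\rrparenthesis \triangleq x &
(\multimap\mathsf{R}) & \llparenthesis z(x).P\rrparenthesis \triangleq \lambda x{:}A.\llparenthesis P\rrparenthesis \\
(\mathbf{1}\mathsf{L}) & \llparenthesis P\rrparenthesis \triangleq \mathsf{let}\ \mathbf{1} = x\ \mathsf{in}\ \llparenthesis P\rrparenthesis &
(\otimes\mathsf{R}) & \llparenthesis(\nu x)z\langle x\rangle.(P \mid Q)\rrparenthesis \triangleq \langle\llparenthesis P\rrparenthesis \otimes \llparenthesis Q\rrparenthesis\rangle \\
({!}\mathsf{R}) & \llparenthesis{!z(x)}.P\rrparenthesis \triangleq {!\llparenthesis P\rrparenthesis} &
(\otimes\mathsf{L}) & \llparenthesis x(y).P\rrparenthesis \triangleq \mathsf{let}\ x \otimes y = x\ \mathsf{in}\ \llparenthesis P\rrparenthesis \\
({!}\mathsf{L}) & \llparenthesis P\{u/x\}\rrparenthesis \triangleq \mathsf{let}\ {!u} = x\ \mathsf{in}\ \llparenthesis P\rrparenthesis &
(\mathsf{copy}) & \llparenthesis(\nu x)u\langle x\rangle.P\rrparenthesis \triangleq \llparenthesis P\rrparenthesis\{u/x\} \\
(\forall\mathsf{R}) & \llparenthesis z(X).P\rrparenthesis \triangleq \Lambda X.\llparenthesis P\rrparenthesis &
(\forall\mathsf{L}) & \llparenthesis x\langle B\rangle.P\rrparenthesis \triangleq \llparenthesis P\rrparenthesis\{(x[B])/x\} \\
(\exists\mathsf{R}) & \llparenthesis z\langle B\rangle.P\rrparenthesis \triangleq \mathsf{pack}\ B\ \mathsf{with}\ \llparenthesis P\rrparenthesis &
(\exists\mathsf{L}) & \llparenthesis x(Y).P\rrparenthesis \triangleq \mathsf{let}\ (Y, x) = x\ \mathsf{in}\ \llparenthesis P\rrparenthesis \\
(\mathsf{cut}) & \llparenthesis(\nu x)(P \mid Q)\rrparenthesis \triangleq \llparenthesis Q\rrparenthesis\{\llparenthesis P\rrparenthesis/x\} &
(\mathsf{cut}^{!}) & \llparenthesis(\nu u)({!u(x)}.P \mid Q)\rrparenthesis \triangleq \llparenthesis Q\rrparenthesis\{\llparenthesis P\rrparenthesis/u\}
\end{array}
\]

For instance, the encoding of a process $z(x).P :: z{:}A \multimap B$, typed by rule $\multimap$R,
results in the corresponding $\multimap$I introduction rule in the $\lambda$-calculus and thus is
$\lambda x{:}A.\llparenthesis P\rrparenthesis$. To encode the process $(\nu y)x\langle y\rangle.(P \mid Q)$, typed by rule $\multimap$L, we make
use of substitution: Given that the sub-process Q is typed as $\Omega; \Gamma; \Delta', x{:}B \vdash
Q :: z{:}C$, the encoding of the full process is given by $\llparenthesis Q\rrparenthesis\{(x\ \llparenthesis P\rrparenthesis)/x\}$. The term
$x\ \llparenthesis P\rrparenthesis$ consists of the application of x (of function type) to the argument $\llparenthesis P\rrparenthesis$,
thus ensuring that the term resulting from the substitution is of the appropriate
type. We note that, for instance, the encoding of rule $\otimes$L does not need to
appeal to substitution: the $\lambda$-calculus let style rules can be mapped directly.
Similarly, rule $\forall$R is mapped to type abstraction, whereas rule $\forall$L which types
a process of the form $x\langle B\rangle.P$ maps to a substitution of the type application
$x[B]$ for x in $\llparenthesis P\rrparenthesis$. The encoding of existential polymorphism is simpler due to
the let-style elimination. We also highlight the encoding of the cut rule which
embodies parallel composition of two processes sharing a linear name, which
clarifies the use/offer duality of the intuitionistic calculus: the process that
offers P is encoded and substituted into the encoded user Q.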


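Theorem 3.6. If $\Omega; \Gamma; \Delta \vdash P :: z{:}A$ then $\llparenthesis\Omega\rrparenthesis; \llparenthesis\Gamma\rrparenthesis; \llparenthesis\Delta\rrparenthesis \vdash \llparenthesis P\rrparenthesis : A$.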


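Example 3.7 (Encoding of Poly$\pi$). Consider the following processes

\[
P \triangleq z(X).z(Y).z(x).z(y).\overline{z}\langle w\rangle.([x \leftrightarrow w] \mid [y \leftrightarrow z])
\qquad
Q \triangleq z\langle\mathbf{1}\rangle.z\langle\mathbf{1}\rangle.\overline{z}\langle x\rangle.\overline{z}\langle y\rangle.z(w).[w \leftrightarrow r]
\]

with $\vdash P :: z{:}\forall X.\forall Y.X \multimap Y \multimap X \otimes Y$ and $z{:}\forall X.\forall Y.X \multimap Y \multimap X \otimes Y \vdash Q :: r{:}\mathbf{1}$.
Then:

\[
\begin{array}{l}
\llparenthesis P\rrparenthesis = \Lambda X.\Lambda Y.\lambda x{:}X.\lambda y{:}Y.\langle x \otimes y\rangle
\qquad
\llparenthesis Q\rrparenthesis = \mathsf{let}\ x \otimes y = z[\mathbf{1}][\mathbf{1}]\ \langle\rangle\ \langle\rangle\ \mathsf{in}\ \mathsf{let}\ \mathbf{1} = y\ \mathsf{in}\ x \\
\llparenthesis(\nu z)(P \mid Q)\rrparenthesis = \mathsf{let}\ x \otimes y = (\Lambda X.\Lambda Y.\lambda x{:}X.\lambda y{:}Y.\langle x \otimes y\rangle)[\mathbf{1}][\mathbf{1}]\ \langle\rangle\ \langle\rangle\ \mathsf{in}\ \mathsf{let}\ \mathbf{1} = y\ \mathsf{in}\ x
\end{array}
\]

By the behaviour of $(\nu z)(P \mid Q)$, which consists of a sequence of cuts, and its
encoding, we have that $\llparenthesis(\nu z)(P \mid Q)\rrparenthesis \to^{+} \langle\rangle$ and $(\nu z)(P \mid Q) \to^{+} \mathbf{0} = \llbracket\langle\rangle\rrbracket_r$.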
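
The reduction of the encoded term to the unit value can be spelled out step by step. The derivation below is a sketch that assumes the call-by-name strategy mentioned for Linear-F; the grouping of steps and the annotations are ours, not the paper's.

```latex
\begin{align*}
 & \mathsf{let}\ x \otimes y = (\Lambda X.\Lambda Y.\lambda x{:}X.\lambda y{:}Y.\langle x \otimes y\rangle)[\mathbf{1}][\mathbf{1}]\ \langle\rangle\ \langle\rangle\ \mathsf{in}\ \mathsf{let}\ \mathbf{1} = y\ \mathsf{in}\ x \\
 \to^{+}\ & \mathsf{let}\ x \otimes y = (\lambda x{:}\mathbf{1}.\lambda y{:}\mathbf{1}.\langle x \otimes y\rangle)\ \langle\rangle\ \langle\rangle\ \mathsf{in}\ \mathsf{let}\ \mathbf{1} = y\ \mathsf{in}\ x
   && \text{(two type applications)} \\
 \to^{+}\ & \mathsf{let}\ x \otimes y = \langle\langle\rangle \otimes \langle\rangle\rangle\ \mathsf{in}\ \mathsf{let}\ \mathbf{1} = y\ \mathsf{in}\ x
   && \text{(two $\beta$-steps)} \\
 \to\ & \mathsf{let}\ \mathbf{1} = \langle\rangle\ \mathsf{in}\ \langle\rangle
   && \text{(pair elimination)} \\
 \to\ & \langle\rangle
   && \text{(unit elimination)}
\end{align*}
```

Each group of steps mirrors one group of synchronisations in the process: the two type outputs, the two session outputs, the input of the pair component, and the final forwarding.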


In general, the translation of Definition 3.5 can introduce some distance
between the immediate operational behaviour of a process and its
corresponding $\lambda$-term, insofar as the translations of cuts (and left rules to non let-form
elimination rules) make use of substitutions that can take place deep within the
resulting term. Consider the process at the root of the following typing
judgment $\Delta_1, \Delta_2, \Delta_3 \vdash (\nu x)(x(y).P_1 \mid (\nu y)x\langle y\rangle.(P_2 \mid w(z).\mathbf{0})) :: w{:}\mathbf{1} \multimap \mathbf{1}$, derivable
through a cut on session x between instances of $\multimap$R and $\multimap$L, where the
continuation process $w(z).\mathbf{0}$ offers a session $w{:}\mathbf{1} \multimap \mathbf{1}$ (and so must use rule $\mathbf{1}$L on x). We
have that: $(\nu x)(x(y).P_1 \mid (\nu y)x\langle y\rangle.(P_2 \mid w(z).\mathbf{0})) \to (\nu x, y)(P_1 \mid P_2 \mid w(z).\mathbf{0})$.
However, the translation of the process above results in the term
$\lambda z{:}\mathbf{1}.\mathsf{let}\ \mathbf{1} = ((\lambda y{:}A.\llparenthesis P_1\rrparenthesis)\ \llparenthesis P_2\rrparenthesis)\ \mathsf{in}\ \mathsf{let}\ \mathbf{1} = z\ \mathsf{in}\ \langle\rangle$, where the redex that corresponds to the
process reduction is present but hidden under the binder for z (corresponding to
the input along w). Thus, to establish operational completeness we consider full
$\beta$-reduction, denoted by $\to_{\beta}$, i.e. enabling $\beta$-reductions under binders.


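Theorem 3.8 (Operational Completeness). Let $\Omega; \Gamma; \Delta \vdash P :: z{:}A$. If $P \to
Q$ then $\llparenthesis P\rrparenthesis \to_{\beta}^{*} \llparenthesis Q\rrparenthesis$.

In order to study the soundness direction it is instructive to consider typed
process $x{:}\mathbf{1} \multimap \mathbf{1} \vdash \overline{x}\langle y\rangle.(\nu z)(z(w).\mathbf{0} \mid \overline{z}\langle w\rangle.\mathbf{0}) :: v{:}\mathbf{1}$ and its translation:

\[
\begin{array}{rl}
\llparenthesis\overline{x}\langle y\rangle.(\nu z)(z(w).\mathbf{0} \mid \overline{z}\langle w\rangle.\mathbf{0})\rrparenthesis = & \llparenthesis(\nu z)(z(w).\mathbf{0} \mid \overline{z}\langle w\rangle.\mathbf{0})\rrparenthesis\{(x\ \langle\rangle)/x\} \\
= & \mathsf{let}\ \mathbf{1} = (\lambda w{:}\mathbf{1}.\mathsf{let}\ \mathbf{1} = w\ \mathsf{in}\ \langle\rangle)\ \langle\rangle\ \mathsf{in}\ \mathsf{let}\ \mathbf{1} = x\ \langle\rangle\ \mathsf{in}\ \langle\rangle
\end{array}
\]

The process above cannot reduce due to the output prefix on x, which cannot
synchronise with a corresponding input action since there is no provider for x
(i.e. the channel is in the left-hand side context). However, its encoding can
exhibit the $\beta$-redex corresponding to the synchronisation along z, hidden by the
prefix on x. The corresponding reductions hidden under prefixes in the encoding
can be soundly exposed in the session calculus by appealing to the commuting
conversions of linear logic (e.g. in the process above, the instance of rule $\multimap$L
corresponding to the output on x can be commuted with the cut on z).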

As shown in [36], commuting conversions are sound wrt observational
equivalence, and thus we formulate operational soundness through a notion of extended
process reduction, which extends process reduction with the reductions that are
induced by commuting conversions. Such a relation was also used for similar
purposes in [5] and in [26], in a classical linear logic setting. For conciseness, we
define extended reduction as a relation on typed processes modulo $\equiv$.


Definition 3.9 (Extended Reduction [5]). We define $\mapsto$ as the type
preserving relations on typed processes modulo $\equiv$ generated by:

1. $C[(\nu y)x\langle y\rangle.P] \mid x(y).Q \mapsto C[(\nu y)(P \mid Q)]$;
2. $C[(\nu y)x\langle y\rangle.P] \mid {!x(y)}.Q \mapsto C[(\nu y)(P \mid Q)] \mid {!x(y)}.Q$; and
3. $(\nu x)({!x(y)}.Q) \mapsto \mathbf{0}$

where C is a (typed) process context which does not capture the bound name y.


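Theorem 3.10 (Operational Soundness). Let $\Omega; \Gamma; \Delta \vdash P :: z{:}A$ and
$\llparenthesis P\rrparenthesis \to M$. Then there exists Q such that $P \mapsto^{*} Q$ and $\llparenthesis Q\rrparenthesis =_{\alpha} M$.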


3.3 Inversion and Full Abstraction 


Having established the operational preciseness of the encodings to-and-from
Poly$\pi$ and Linear-F, we establish our main results for the encodings. Specifically,
we show that the encodings are mutually inverse up-to behavioural equivalence 
(with fullness as its corollary), which then enables us to establish full abstraction 
for both encodings. 


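Theorem 3.11 (Inverse). If $\Omega; \Gamma; \Delta \vdash M : A$ then $\Omega; \Gamma; \Delta \vdash \llparenthesis\llbracket M\rrbracket_z\rrparenthesis \cong M :
A$. Also, if $\Omega; \Gamma; \Delta \vdash P :: z{:}A$ then $\Omega; \Gamma; \Delta \vdash \llbracket\llparenthesis P\rrparenthesis\rrbracket_z \approx_{\mathsf{L}} P :: z{:}A$.

Corollary 3.12 (Fullness). Let $\Omega; \Gamma; \Delta \vdash P :: z{:}A$. $\exists M$ s.t. $\Omega; \Gamma; \Delta \vdash M : A$
and $\Omega; \Gamma; \Delta \vdash \llbracket M\rrbracket_z \approx_{\mathsf{L}} P :: z{:}A$. Also, let $\Omega; \Gamma; \Delta \vdash M : A$. $\exists P$ s.t. $\Omega; \Gamma; \Delta \vdash
P :: z{:}A$ and $\Omega; \Gamma; \Delta \vdash \llparenthesis P\rrparenthesis \cong M : A$.

We now state our full abstraction results. Given two Linear-F terms of the
same type, equivalence in the image of the $\llbracket - \rrbracket_z$ translation can be used as a proof
technique for contextual equivalence in Linear-F. This is called the soundness
direction of full abstraction in the literature [18] and proved by showing the
relation generated by $\llbracket M\rrbracket_z \approx_{\mathsf{L}} \llbracket N\rrbracket_z$ forms $\cong$; we then establish the completeness
direction by contradiction, using fullness.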


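Theorem 3.13 (Full Abstraction). $\Omega; \Gamma; \Delta \vdash M \cong N : A$ iff $\Omega; \Gamma; \Delta \vdash
\llbracket M\rrbracket_z \approx_{\mathsf{L}} \llbracket N\rrbracket_z :: z{:}A$.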


We can straightforwardly combine the above full abstraction with
Theorem 3.11 to obtain full abstraction of the $\llparenthesis - \rrparenthesis$ translation.


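Theorem 3.14 (Full Abstraction). $\Omega; \Gamma; \Delta \vdash P \approx_{\mathsf{L}} Q :: z{:}A$ iff $\Omega; \Gamma; \Delta \vdash
\llparenthesis P\rrparenthesis \cong \llparenthesis Q\rrparenthesis : A$.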


4 Applications of the Encodings 


In this section we develop applications of the encodings of the previous sections. 
Taking advantage of full abstraction and mutual inversion, we apply non-trivial 
properties from the theory of the $\lambda$-calculus to our session-typed process setting.

In § 4.1 we study inductive and coinductive sessions, arising through encod- 
ings of initial F-algebras and final F-coalgebras in the polymorphic A-calculus. 

In § 4.2 we study encodings for an extension of the core session calculus 
with term passing, where terms are derived from a simply-typed $\lambda$-calculus.
Using the development of § 4.2 as a stepping stone, we generalise the encodings 
to a higher-order session calculus (§ 4.3), where processes can send, receive and 
execute other processes. We show full abstraction and mutual inversion theorems 
for the encodings from higher-order to first-order. As a consequence, we can 
straightforwardly derive a strong normalisation property for the higher-order 
process-passing calculus. 


4.1 Inductive and Coinductive Session Types 


The study of polymorphism in the $\lambda$-calculus [1,6,19,40] has shown that
parametric polymorphism is expressive enough to encode both inductive and
coinductive types in a precise way, through a faithful representation of initial and final
(co)algebras [28], without extending the language of terms nor the semantics of
the calculus, giving a logical justification to the Church encodings of inductive
datatypes such as lists and natural numbers. The polymorphic session calculus
can express fairly intricate communication behaviours, including generic
protocols through both existential and universal polymorphism (i.e. protocols that are
parametric in their sub-protocols). Using our fully abstract encodings between
the two calculi, we show that session polymorphism is expressive enough to
encode inductive and coinductive sessions, “importing” the results for the
$\lambda$-calculus, which may then be instantiated to provide a session-typed formulation
of the encodings of data structures in the $\pi$-calculus of [30].

Inductive and Coinductive Types in System F. Exploring an algebraic
interpretation of polymorphism where types are interpreted as functors, it can be
shown that given a type F with a free variable X that occurs only positively (i.e.
occurrences of X are on the left-hand side of an even number of function arrows),
the polymorphic type $\forall X.((F(X) \to X) \to X)$ forms an initial F-algebra [1,42]
(we write F(X) to denote that X occurs in F). This enables the representation of
inductively defined structures using an algebraic or categorical justification. For
instance, the natural numbers can be seen as the initial F-algebra of $F(X) = \mathbf{1} +
X$ (where $\mathbf{1}$ is the unit type and + is the coproduct), and are thus already present
in System F, in a precise sense, as the type $\forall X.((\mathbf{1} + X) \to X) \to X$ (noting
that both $\mathbf{1}$ and + can also be encoded in System F). A similar story can be
told for coinductively defined structures, which correspond to final F-coalgebras
and are representable with the polymorphic type $\exists X.(X \to F(X)) \times X$, where
$\times$ is a product type. In the remainder of this section we assume the positivity
requirement on F mentioned above.

While the complete formal development of the representation of inductive
and coinductive types in System F would lead us too far astray, we summarise
here the key concepts as they apply to the $\lambda$-calculus (the interested reader can
refer to [19] for the full categorical details).


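[Figure: two commuting squares. (a) Arrows $F(\mathsf{fold}[A](f)) : F(T_i) \to F(A)$,
$\mathsf{in} : F(T_i) \to T_i$, $f : F(A) \to A$ and $\mathsf{fold}[A](f) : T_i \to A$.
(b) Arrows $\mathsf{unfold}[A](f) : A \to T_f$, $f : A \to F(A)$, $\mathsf{out} : T_f \to F(T_f)$ and
$F(\mathsf{unfold}[A](f)) : F(A) \to F(T_f)$.]

Fig. 4. Diagrams for initial F-algebras and final F-coalgebras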


To show that the polymorphic type $T_i \triangleq \forall X.((F(X) \to X) \to X)$ is an
initial F-algebra, one exhibits a pair of $\lambda$-terms, often dubbed fold and in, such
that the diagram in Fig. 4(a) commutes (for any A, where F(f), where f is a
$\lambda$-term, denotes the functorial action of F applied to f), and, crucially, that fold
is unique. When these conditions hold, we are justified in saying that $T_i$ is a least
fixed point of F. Through a fairly simple calculation, it is easy to see that:

\[
\begin{array}{rcl}
\mathsf{fold} & \triangleq & \Lambda X.\lambda x{:}F(X) \to X.\lambda t{:}T_i.t[X](x) \\
\mathsf{in} & \triangleq & \lambda x{:}F(T_i).\Lambda X.\lambda y{:}F(X) \to X.y\ (F(\mathsf{fold}[X](y))(x))
\end{array}
\]


satisfy the necessary equalities. To show uniqueness one appeals to parametricity, 
which allows us to prove that any function of the appropriate type is equivalent 
to fold. This property is often dubbed initiality or universality. 
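
The fold and in terms can be exercised on a small instance. The following Python sketch instantiates F(X) = 1 + X and represents an inhabitant of the polymorphic type as a function awaiting an F-algebra. The representation of 1 + X (None for the left injection, a tagged pair for the right one) and all helper names are assumptions made for illustration; Python cannot express the typing or the uniqueness argument, only the computational behaviour.

```python
# F(X) = 1 + X is represented as: None (left injection) or ("succ", x).

def fmap(g):
    """Functorial action of F(X) = 1 + X on a function g."""
    return lambda fx: None if fx is None else ("succ", g(fx[1]))

def fold(alg):
    """fold at a given algebra: an element t of T_i is applied to that algebra."""
    return lambda t: t(alg)

def in_(fx):
    """in : F(T_i) -> T_i. The result waits for an algebra, folds the
    sub-structure with it, and applies the algebra once more on top."""
    return lambda alg: alg(fmap(fold(alg))(fx))

# Natural numbers as the carrier of the initial algebra of F(X) = 1 + X.
zero = in_(None)

def succ(n):
    return in_(("succ", n))

def count_algebra(fx):
    """The algebra [0, +1] : 1 + int -> int."""
    return 0 if fx is None else fx[1] + 1

def to_int(n):
    """Interpret an encoded natural by folding with count_algebra."""
    return fold(count_algebra)(n)
```

The commuting square of the initial algebra diagram shows up as the identity `fold(alg)(in_(fx)) == alg(fmap(fold(alg))(fx))`, which holds here by unfolding the definition of `in_`.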

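The construction of final F-coalgebras and their justification as greatest fixed
points is dual. Assuming products in the calculus and taking $T_f \triangleq \exists X.(X \to
F(X)) \times X$, we produce the $\lambda$-terms

\[
\begin{array}{rcl}
\mathsf{unfold} & \triangleq & \Lambda X.\lambda f{:}X \to F(X).\lambda x{:}X.\mathsf{pack}\ X\ \mathsf{with}\ (f, x) \\
\mathsf{out} & \triangleq & \lambda t{:}T_f.\mathsf{let}\ (X, (f, x)) = t\ \mathsf{in}\ F(\mathsf{unfold}[X](f))\ (f(x))
\end{array}
\]

such that the diagram in Fig. 4(b) commutes and unfold is unique (again, up to
parametricity). While the argument above applies to System F, a similar
development can be made in Linear-F [6] by considering $T_i \triangleq \forall X.{!(F(X) \multimap X)} \multimap X$
and $T_f \triangleq \exists X.{!(X \multimap F(X))} \otimes X$. Reusing the same names for the sake of
conciseness, the associated linear $\lambda$-terms are:

\[
\begin{array}{rcl}
\mathsf{fold} & \triangleq & \Lambda X.\lambda u{:}{!(F(X) \multimap X)}.\lambda y{:}T_i.(y[X]\ u) : \forall X.{!(F(X) \multimap X)} \multimap T_i \multimap X \\
\mathsf{in} & \triangleq & \lambda x{:}F(T_i).\Lambda X.\lambda y{:}{!(F(X) \multimap X)}.\mathsf{let}\ {!u} = y\ \mathsf{in}\ u\ (F(\mathsf{fold}[X]({!u}))(x)) : F(T_i) \multimap T_i \\
\mathsf{unfold} & \triangleq & \Lambda X.\lambda u{:}{!(X \multimap F(X))}.\lambda x{:}X.\mathsf{pack}\ X\ \mathsf{with}\ \langle u \otimes x\rangle : \forall X.{!(X \multimap F(X))} \multimap X \multimap T_f \\
\mathsf{out} & \triangleq & \lambda t{:}T_f.\mathsf{let}\ (X, (u, x)) = t\ \mathsf{in}\ \mathsf{let}\ {!f} = u\ \mathsf{in}\ F(\mathsf{unfold}[X]({!f}))\ (f(x)) : T_f \multimap F(T_f)
\end{array}
\]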
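
Dually, unfold and out can be run on a small coalgebra. The Python sketch below takes F(X) = N × X, with N modelled by Python integers, and represents the existential package as a plain pair of a coalgebra and a seed. These representation choices and the helper names (fmap, take, nats) are assumptions for illustration; the hiding of the representation type, which the existential quantifier provides, is not enforced in Python.

```python
# F(X) = N x X is represented as a Python pair (head, next_state).

def fmap(g):
    """Functorial action of F(X) = N x X: apply g to the second component."""
    return lambda fx: (fx[0], g(fx[1]))

def unfold(coalg):
    """unfold at a given coalgebra: package a seed together with the coalgebra."""
    return lambda seed: (coalg, seed)

def out(t):
    """out : T_f -> F(T_f): unpack, take one step, and re-package the new state."""
    coalg, seed = t
    return fmap(unfold(coalg))(coalg(seed))

def take(n, stream):
    """Observe the first n heads of an encoded stream."""
    items = []
    for _ in range(n):
        head, stream = out(stream)
        items.append(head)
    return items

# The stream 0, 1, 2, ... generated from the seed 0.
nats = unfold(lambda n: (n, n + 1))(0)
```

With these definitions `take(5, nats)` produces `[0, 1, 2, 3, 4]`, and the commuting square of the final coalgebra diagram corresponds to `out(unfold(c)(s)) == fmap(unfold(c))(c(s))` for any coalgebra `c` and seed `s`.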


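Inductive and Coinductive Sessions for Free. As a consequence of full
abstraction we may appeal to the $\llbracket - \rrbracket_z$ encoding to derive representations of fold
and unfold that satisfy the necessary algebraic properties. The derived processes
are (recall that we write $\overline{x}\langle y\rangle.P$ for $(\nu y)x\langle y\rangle.P$):

\[
\begin{array}{rcl}
\llbracket\mathsf{fold}\rrbracket_z & \equiv & z(X).z(u).z(y).(\nu w)((\nu x)([y \leftrightarrow x] \mid x\langle X\rangle.[x \leftrightarrow w]) \mid \overline{w}\langle v\rangle.([u \leftrightarrow v] \mid [w \leftrightarrow z])) \\
\llbracket\mathsf{unfold}\rrbracket_z & \equiv & z(X).z(u).z(x).z\langle X\rangle.\overline{z}\langle y\rangle.([u \leftrightarrow y] \mid [x \leftrightarrow z])
\end{array}
\]

We can then show universality of the two constructions. We write $P_{x,y}$ to
single out that x and y are free in P and $P_{z,w}$ to denote the result of employing
capture-avoiding substitution on P, substituting x and y by z and w. Let:

\[
\begin{array}{rcl}
\mathsf{foldP}(A)_{y_1,y_2} & \triangleq & (\nu x)(\llbracket\mathsf{fold}\rrbracket_x \mid x\langle A\rangle.\overline{x}\langle v\rangle.(\overline{u}\langle y\rangle.[y \leftrightarrow v] \mid \overline{x}\langle z\rangle.([z \leftrightarrow y_1] \mid [x \leftrightarrow y_2]))) \\
\mathsf{unfoldP}(A)_{y_1,y_2} & \triangleq & (\nu x)(\llbracket\mathsf{unfold}\rrbracket_x \mid x\langle A\rangle.\overline{x}\langle v\rangle.(\overline{u}\langle y\rangle.[y \leftrightarrow v] \mid \overline{x}\langle z\rangle.([z \leftrightarrow y_1] \mid [x \leftrightarrow y_2])))
\end{array}
\]

where $\mathsf{foldP}(A)_{y_1,y_2}$ corresponds to the application of fold to an F-algebra A
with the associated morphism $F(A) \multimap A$ available on the shared channel u,
consuming an ambient session $y_1{:}T_i$ and offering $y_2{:}A$. Similarly, $\mathsf{unfoldP}(A)_{y_1,y_2}$
corresponds to the application of unfold to an F-coalgebra A with the associated
morphism $A \multimap F(A)$ available on the shared channel u, consuming an ambient
session $y_1{:}A$ and offering $y_2{:}T_f$.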


Theorem 4.1 (Universality of foldP). $\forall Q$ such that $X; u{:}F(X) \multimap X; y_1{:}T_i \vdash
Q :: y_2{:}X$ we have $X; u{:}F(X) \multimap X; y_1{:}T_i \vdash Q \approx_{\mathsf{L}} \mathsf{foldP}(X)_{y_1,y_2} :: y_2{:}X$


Theorem 4.2 (Universality of unfoldP). $\forall Q$ and F-coalgebra A s.t. $\cdot; \cdot; y_1{:}A \vdash
Q :: y_2{:}T_f$ we have that $\cdot; u{:}A \multimap F(A); y_1{:}A \vdash Q \approx_{\mathsf{L}} \mathsf{unfoldP}(A)_{y_1,y_2} :: y_2{:}T_f$.


Example 4.3 (Natural Numbers). We show how to represent the natural numbers
as an inductive session type using $F(X) = \mathbf{1} \oplus X$, making use of in:

\[
\mathsf{zero}_x \triangleq (\nu z)(z.\mathsf{inl};\mathbf{0} \mid \llbracket \mathsf{in}(z) \rrbracket_x) \qquad \mathsf{succ}_{y,x} \triangleq (\nu s)(s.\mathsf{inr};[y \leftrightarrow s] \mid \llbracket \mathsf{in}(s) \rrbracket_x)
\]

with $\mathsf{Nat} \triangleq \forall X.\,!((\mathbf{1} \oplus X) \multimap X) \multimap X$ where $\vdash \mathsf{zero}_x :: x{:}\mathsf{Nat}$ and $y{:}\mathsf{Nat} \vdash
\mathsf{succ}_{y,x} :: x{:}\mathsf{Nat}$ encode the representation of 0 and successor, respectively. The
natural 1 would thus be represented by $\mathsf{one}_x \triangleq (\nu y)(\mathsf{zero}_y \mid \mathsf{succ}_{y,x})$. The
behaviour of type Nat can be seen as that of a sequence of internal choices of
arbitrary (but finite) length. We can then observe that the foldP process acts as
a recursor. For instance consider:


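\[
\mathsf{stepDec}_d \triangleq d(n).n.\mathsf{case}(\mathsf{zero}_d, [n \leftrightarrow d]) \qquad \mathsf{dec}_{x,z} \triangleq (\nu u)(!u(d).\mathsf{stepDec}_d \mid \mathsf{foldP}(\mathsf{Nat})_{x,z})
\]

with $\vdash \mathsf{stepDec}_d :: d{:}(\mathbf{1} \oplus \mathsf{Nat}) \multimap \mathsf{Nat}$ and $x{:}\mathsf{Nat} \vdash \mathsf{dec}_{x,z} :: z{:}\mathsf{Nat}$, where dec
decrements a given natural number session on channel x. We have that:

\[
(\nu x)(\mathsf{one}_x \mid \mathsf{dec}_{x,z}) \equiv (\nu x, y, u)(\mathsf{zero}_y \mid \mathsf{succ}_{y,x} \mid !u(d).\mathsf{stepDec}_d \mid \mathsf{foldP}(\mathsf{Nat})_{x,z}) \approx_{\mathtt{L}} \mathsf{zero}_z
\]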


We note that the resulting encoding is reminiscent of the encoding of lists of 
[30] (where zero is the empty list and succ the cons cell). The main differences 
in the encodings arise due to our primitive notions of labels and forwarding, as 
well as due to the generic nature of in and fold. 
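The functional counterpart of Example 4.3 can be replayed directly. The sketch below is a Python illustration under stated assumptions: sessions are replaced by plain functions, the sum 1 ⊕ X is modelled by the tagged tuples `("inl",)` and `("inr", x)`, and the helper `to_int` is a hypothetical observer that is not part of the original development.

```python
# Sketch only: Nat = ∀X.!((1 ⊕ X) ⊸ X) ⊸ X read as "a function awaiting an algebra".

INL = ("inl",)                       # left injection of 1 ⊕ X

def inr(x):
    return ("inr", x)                # right injection of 1 ⊕ X

def zero(alg):
    # zero selects inl and hands it to the algebra.
    return alg(INL)

def succ(n):
    # succ selects inr after folding the predecessor with the same algebra.
    return lambda alg: alg(inr(n(alg)))

one = succ(zero)

def fold(alg, n):
    return n(alg)

def step_dec(v):
    # Mirrors stepDec: case(zero, forward the received session).
    return zero if v[0] == "inl" else v[1]

def dec(n):
    # Mirrors dec: fold with stepDec as the algebra on Nat.
    return fold(step_dec, n)

def to_int(n):
    # Hypothetical observer: fold with the standard algebra on integers.
    return fold(lambda v: 0 if v[0] == "inl" else v[1] + 1, n)

result = to_int(dec(one))            # the functional analogue of one | dec ≈ zero
```

Under this functional reading the step function returns the already-folded predecessor, so the sketch only checks the instances for zero and one, which is what the equation above states.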


Example 4.4 (Streams). We build on Example 4.3 by representing streams of 
natural numbers as a coinductive session type. We encode infinite streams of 
naturals with $F(X) = \mathsf{Nat} \otimes X$. Thus: $\mathsf{NatStream} \triangleq \exists X.\,!(X \multimap (\mathsf{Nat} \otimes X)) \otimes X$.
The behaviour of a session of type NatStream amounts to an infinite sequence of 
outputs of channels of type Nat. Such an encoding enables us to construct the 
stream of all naturals nats (and the stream of all non-zero naturals oneNats): 


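\[
\begin{array}{l}
\mathsf{genHdNext}_z \triangleq z(n).\overline{z}\langle y\rangle.(\overline{n}\langle n'\rangle.[n' \leftrightarrow y] \mid !z(w).\overline{n}\langle n'\rangle.\mathsf{succ}_{n',w}) \\
\mathsf{nats}_y \triangleq (\nu x, u)(\mathsf{zero}_x \mid !u(z).\mathsf{genHdNext}_z \mid \mathsf{unfoldP}(!\mathsf{Nat})_{x,y}) \\
\mathsf{oneNats}_y \triangleq (\nu x, u)(\mathsf{one}_x \mid !u(z).\mathsf{genHdNext}_z \mid \mathsf{unfoldP}(!\mathsf{Nat})_{x,y})
\end{array}
\]

with $\mathsf{genHdNext}_z :: z{:}!\mathsf{Nat} \multimap \mathsf{Nat} \otimes !\mathsf{Nat}$ and both $\mathsf{nats}_y$ and $\mathsf{oneNats}_y ::
y{:}\mathsf{NatStream}$. $\mathsf{genHdNext}_z$ consists of a helper that generates the current head of
a stream and the next element. As expected, the following process implements
a session that “unrolls” the stream once, providing the head of the stream and
then behaving as the rest of the stream (recall that $\mathsf{out} : T_f \multimap F(T_f)$).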


\[
(\nu x)(\mathsf{nats}_x \mid \llbracket \mathsf{out}(x) \rrbracket_y) :: y{:}\mathsf{Nat} \otimes \mathsf{NatStream}
\]


We note a peculiarity of the interaction of linearity with the stream encoding: 
a process that begins to deconstruct a stream has no way of “bottoming out” and 
stopping. One cannot, for instance, extract the first element of a stream of nat- 
urals and stop unrolling the stream in a well-typed way. We can, however, easily 
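encode a “terminating” stream of all natural numbers via $F(X) = (\mathsf{Nat} \otimes !X)$ by
replacing $\mathsf{genHdNext}_z$ with the generator given as:

\[
\mathsf{genHdNextTer}_z \triangleq z(n).\overline{z}\langle y\rangle.(\overline{n}\langle n'\rangle.[n' \leftrightarrow y] \mid !z(w).!w(w').\overline{n}\langle n'\rangle.\mathsf{succ}_{n',w'})
\]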


It is then easy to see that a usage of $\llbracket \mathsf{out}(x) \rrbracket_y$ results in a session of type
$\mathsf{Nat} \otimes !\mathsf{NatStream}$, enabling us to discard the stream as needed. One can replay
this argument with the operator $F(X) = (!\mathsf{Nat} \otimes X)$ to enable discarding of
stream elements. Assuming such modifications, we can then show:

\[
(\nu y)\big((\nu x)(\mathsf{nats}_x \mid \llbracket \mathsf{out}(x) \rrbracket_y) \mid y(n).[y \leftrightarrow z]\big) \approx_{\mathtt{L}} \mathsf{oneNats}_z :: z{:}\mathsf{NatStream}
\]
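The stream example admits a similarly small functional illustration. The following Python sketch is an approximation under explicit assumptions: natural number sessions are replaced by Python integers, the existential package is a pair of a step function and a seed, linearity is not enforced (so discarding the rest of a stream is unproblematic here, unlike in the typed setting), and `take` is a hypothetical helper used only to observe finite prefixes.

```python
# Sketch only: NatStream = ∃X.!(X ⊸ (Nat ⊗ X)) ⊗ X modelled as (step, seed).

def gen_hd_next(n):
    # Plays the role of genHdNext: current head and the seed of the next element.
    return (n, n + 1)

def unfold(step, seed):
    # pack X with (step ⊗ seed)
    return (step, seed)

def out(stream):
    # One unrolling: the head, paired with the rest of the stream.
    step, seed = stream
    head, next_seed = step(seed)
    return head, unfold(step, next_seed)

nats = unfold(gen_hd_next, 0)        # stream of all naturals
one_nats = unfold(gen_hd_next, 1)    # stream of all non-zero naturals

def take(k, stream):
    # Hypothetical observer: the first k elements of a stream.
    items = []
    for _ in range(k):
        head, stream = out(stream)
        items.append(head)
    return items

head, rest = out(nats)
```

Dropping `head` and continuing with `rest` corresponds to the composition with y(n).[y ↔ z] above; on any finite prefix `rest` agrees with `one_nats`.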


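4.2 Communicating Values - Sessπλ


We now consider a session calculus extended with a data layer obtained from a
λ-calculus (whose terms are ranged over by M, N and types by τ, σ). We dub
this calculus Sessπλ.


\[
\begin{array}{ll}
P, Q ::= \cdots \mid x\langle M\rangle.P \mid x(y).P & \qquad A, B ::= \cdots \mid \tau \wedge A \mid \tau \supset A \\
M, N ::= \lambda x{:}\tau.M \mid M\,N \mid x & \qquad \tau, \sigma ::= \cdots \mid \tau \to \sigma
\end{array}
\]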


Without loss of generality, we consider the data layer to be simply-typed, with
a call-by-name semantics, satisfying the usual type safety properties. The typing
judgment for this calculus is $\Psi \vdash M : \tau$. We omit session polymorphism
for the sake of conciseness, restricting processes to communication of data
and (session) channels. The typing judgment for processes is thus modified to
$\Psi; \Gamma; \Delta \vdash P :: z{:}A$, where $\Psi$ is an intuitionistic context that accounts for variables
in the data layer. The rules for the relevant process constructs are (all
other rules simply propagate the $\Psi$ context from conclusion to premises):


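\[
(\wedge\mathsf{R})\ \frac{\Psi \vdash M : \tau \qquad \Psi; \Gamma; \Delta \vdash P :: z{:}A}{\Psi; \Gamma; \Delta \vdash z\langle M\rangle.P :: z{:}\tau \wedge A}
\qquad
(\wedge\mathsf{L})\ \frac{\Psi, y{:}\tau; \Gamma; \Delta, x{:}A \vdash Q :: z{:}C}{\Psi; \Gamma; \Delta, x{:}\tau \wedge A \vdash x(y).Q :: z{:}C}
\]

\[
(\supset\!\mathsf{R})\ \frac{\Psi, x{:}\tau; \Gamma; \Delta \vdash P :: z{:}A}{\Psi; \Gamma; \Delta \vdash z(x).P :: z{:}\tau \supset A}
\qquad
(\supset\!\mathsf{L})\ \frac{\Psi \vdash M : \tau \qquad \Psi; \Gamma; \Delta, x{:}A \vdash Q :: z{:}C}{\Psi; \Gamma; \Delta, x{:}\tau \supset A \vdash x\langle M\rangle.Q :: z{:}C}
\]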


With the reduction rule given by:¹ $x\langle M\rangle.P \mid x(y).Q \to P \mid Q\{M/y\}$. With a
simple extension to our encodings we may eliminate the data layer by encoding 
the data objects as processes, showing that from an expressiveness point of 
view, data communication is orthogonal to the framework. We note that the 
data language we are considering is not linear, and the usage discipline of data 
in processes is itself also not linear. 


To First-Order Processes. We now introduce our encoding for Sessπλ, defined
inductively on session types, processes, types and λ-terms (we omit the purely
inductive cases on session types and processes for conciseness). As before, the 
encoding on processes is defined on typing derivations, where we indicate the 
typing rule at the root of the typing derivation. 


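\[
\llbracket \tau \wedge A \rrbracket \triangleq\ !\llbracket \tau \rrbracket \otimes \llbracket A \rrbracket \qquad
\llbracket \tau \supset A \rrbracket \triangleq\ !\llbracket \tau \rrbracket \multimap \llbracket A \rrbracket \qquad
\llbracket \tau \to \sigma \rrbracket \triangleq\ !\llbracket \tau \rrbracket \multimap \llbracket \sigma \rrbracket
\]

\[
\begin{array}{ll}
(\wedge\mathsf{R})\ \llbracket z\langle M\rangle.P \rrbracket \triangleq \overline{z}\langle x\rangle.(!x(y).\llbracket M \rrbracket_y \mid \llbracket P \rrbracket) & \qquad (\wedge\mathsf{L})\ \llbracket x(y).P \rrbracket \triangleq x(y).\llbracket P \rrbracket \\
(\supset\!\mathsf{R})\ \llbracket z(x).P \rrbracket \triangleq z(x).\llbracket P \rrbracket & \qquad (\supset\!\mathsf{L})\ \llbracket x\langle M\rangle.P \rrbracket \triangleq \overline{x}\langle y\rangle.(!y(w).\llbracket M \rrbracket_w \mid \llbracket P \rrbracket)
\end{array}
\]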


¹ For simplicity, in this section, we define the process semantics through a reduction
relation. 


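\[
\begin{array}{l}
\llbracket x \rrbracket_z \triangleq \overline{x}\langle y\rangle.[y \leftrightarrow z] \qquad \llbracket \lambda x{:}\tau.M \rrbracket_z \triangleq z(x).\llbracket M \rrbracket_z \\
\llbracket M\,N \rrbracket_z \triangleq (\nu y)(\llbracket M \rrbracket_y \mid \overline{y}\langle x\rangle.(!x(w).\llbracket N \rrbracket_w \mid [y \leftrightarrow z]))
\end{array}
\]
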
The encoding addresses the non-linear usage of data elements in processes by
encoding the types $\tau \wedge A$ and $\tau \supset A$ as $!\llbracket \tau \rrbracket \otimes \llbracket A \rrbracket$ and $!\llbracket \tau \rrbracket \multimap \llbracket A \rrbracket$, respectively.
Thus, sending and receiving of data is codified as the sending and receiving of
channels of type !, which therefore can be used non-linearly. Moreover, since
data terms are themselves non-linear, the $\tau \to \sigma$ type is encoded as $!\llbracket \tau \rrbracket \multimap \llbracket \sigma \rrbracket$,
following Girard’s embedding of intuitionistic logic in linear logic [15].
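The three type clauses just described can be read as a small recursive function on syntax trees. The sketch below, in Python, is only illustrative: the constructor names (`Arrow`, `And`, `Implies`, `Bang`, `Tensor`, `Lolli`) are hypothetical, and base types and the unit session are assumed to be mapped to themselves, a choice the text above does not spell out.

```python
from dataclasses import dataclass

# Source syntax (Sessπλ types). All class names are illustrative assumptions.
@dataclass(frozen=True)
class Base:            # a base data type (assumed to be encoded as itself)
    name: str

@dataclass(frozen=True)
class One:             # the terminated session 1
    pass

@dataclass(frozen=True)
class Arrow:           # τ → σ
    dom: object
    cod: object

@dataclass(frozen=True)
class And:             # τ ∧ A : send a data value, continue as A
    data: object
    cont: object

@dataclass(frozen=True)
class Implies:         # τ ⊃ A : receive a data value, continue as A
    data: object
    cont: object

# Target syntax (linear session types).
@dataclass(frozen=True)
class Bang:            # !A
    body: object

@dataclass(frozen=True)
class Tensor:          # A ⊗ B
    left: object
    right: object

@dataclass(frozen=True)
class Lolli:           # A ⊸ B
    left: object
    right: object

def encode(t):
    """Map τ ∧ A, τ ⊃ A and τ → σ to their exponential-guarded linear forms."""
    if isinstance(t, (Base, One)):
        return t
    if isinstance(t, And):
        return Tensor(Bang(encode(t.data)), encode(t.cont))
    if isinstance(t, Implies):
        return Lolli(Bang(encode(t.data)), encode(t.cont))
    if isinstance(t, Arrow):
        return Lolli(Bang(encode(t.dom)), encode(t.cod))
    raise TypeError(f"unsupported type: {t!r}")

example = encode(And(Arrow(Base("a"), Base("b")), One()))
```

In the example, a session that sends a function of type a → b and terminates is mapped to !(!a ⊸ b) ⊗ 1, so every data component ends up under an exponential.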

At the level of processes, offering a session of type $\tau \wedge A$ (i.e. a process of
the form $z\langle M\rangle.P$) is encoded according to the translation of the type: we first
send a fresh name x which will be used to access the encoding of the term M.
Since M can be used an arbitrary number of times by the receiver, we guard
the encoding of M with a replicated input, proceeding with the encoding of P
accordingly. Using a session of type $\tau \supset A$ follows the same principle. The input
cases (and the rest of the process constructs) are completely homomorphic.

The encoding of λ-terms follows Girard’s decomposition of the intuitionistic
function space [49]. The λ-abstraction is translated as input. Since variables in
a λ-abstraction may be used non-linearly, the case for variables and application
is slightly more intricate: to encode the application M N we compose M in
parallel with a process that will send the “reference” to the function argument
N which will be encoded using replication, in order to handle the potential
for 0 or more usages of variables in a function body. Respectively, a variable
is encoded by performing an output to trigger the replication and forwarding
accordingly. Without loss of generality, we assume variable names and their
corresponding replicated counterparts match, which can be achieved through
α-conversion before applying the translation. We exemplify our encoding as follows:


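\[
\begin{array}{l}
\llbracket z(x).z\langle x\rangle.z\langle(\lambda y{:}\sigma.x)\rangle.\mathbf{0} \rrbracket = z(x).\overline{z}\langle w\rangle.(!w(u).\llbracket x \rrbracket_u \mid \overline{z}\langle v\rangle.(!v(i).\llbracket \lambda y{:}\sigma.x \rrbracket_i \mid \mathbf{0})) \\
\quad = z(x).\overline{z}\langle w\rangle.(!w(u).\overline{x}\langle y\rangle.[y \leftrightarrow u] \mid \overline{z}\langle v\rangle.(!v(i).i(y).\overline{x}\langle t\rangle.[t \leftrightarrow i] \mid \mathbf{0}))
\end{array}
\]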
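The three clauses of the term encoding are simple enough to be executed. The following Python sketch renders the image of a λ-term as a string; it is an illustration under assumptions rather than the authors' implementation: the ASCII notation (`x<y>` for bound output, `[y<->z]` for forwarding, `(nu y)` for restriction) and the fresh-name scheme `v0, v1, ...` are ours, type annotations are dropped, and source variables are assumed not to clash with the generated names.

```python
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

def fresh_names():
    # Hypothetical fresh-name supply: v0, v1, v2, ...
    return (f"v{i}" for i in itertools.count())

def encode(term, z, fresh):
    """Render the encoding of `term` offered along channel `z`."""
    if isinstance(term, Var):
        # A variable triggers the replicated argument and forwards the copy to z.
        y = next(fresh)
        return f"{term.name}<{y}>.[{y}<->{z}]"
    if isinstance(term, Lam):
        # An abstraction is an input of the (replicated) argument reference.
        return f"{z}({term.param}).{encode(term.body, z, fresh)}"
    if isinstance(term, App):
        # The function runs on y; the argument is offered under replication on x.
        y, x, w = next(fresh), next(fresh), next(fresh)
        fun = encode(term.fun, y, fresh)
        arg = encode(term.arg, w, fresh)
        return f"(nu {y})({fun} | {y}<{x}>.(!{x}({w}).{arg} | [{y}<->{z}]))"
    raise TypeError(f"unsupported term: {term!r}")

constant = encode(Lam("y", Var("x")), "i", fresh_names())
application = encode(App(Var("f"), Var("a")), "z", fresh_names())
```

The value of `constant` matches, up to the choice of the fresh name, the sub-process i(y).x̄⟨t⟩.[t ↔ i] appearing in the example above.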


Properties of the Encoding. We discuss the correctness of our encoding. We 
can straightforwardly establish that the encoding preserves typing. 

To show that our encoding is operationally sound and complete, we capture 
the interaction between substitution on λ-terms and the encoding into processes
through logical equivalence. Consider the following reduction of a process: 


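\[
\begin{array}{l}
(\nu z)(z(x).z\langle x\rangle.z\langle(\lambda y{:}\sigma.x)\rangle.\mathbf{0} \mid z\langle \lambda w{:}\tau_0.w\rangle.P) \\
\quad \to (\nu z)(z\langle \lambda w{:}\tau_0.w\rangle.z\langle(\lambda y{:}\sigma.\lambda w{:}\tau_0.w)\rangle.\mathbf{0} \mid P) \qquad (1)
\end{array}
\]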


Given that substitution in the target session π-calculus amounts to renaming,
whereas in the λ-calculus we replace a variable for a term, the relationship
between the encoding of a substitution M{N/x} and the encodings of M and 
N corresponds to the composition of the encoding of M with that of N, but 
where the encoding of N is guarded by a replication, codifying a form of explicit 
non-linear substitution. 
Lemma 4.5 (Compositionality). Let $\Psi, x{:}\tau \vdash M : \sigma$ and $\Psi \vdash N : \tau$. We have
that $\llbracket M\{N/x\} \rrbracket_z \approx_{\mathtt{L}} (\nu x)(\llbracket M \rrbracket_z \mid !x(y).\llbracket N \rrbracket_y)$.


Revisiting the process to the left of the arrow in Eq. 1 we have: 


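\[
\begin{array}{l}
\llbracket (\nu z)(z(x).z\langle x\rangle.z\langle(\lambda y{:}\sigma.x)\rangle.\mathbf{0} \mid z\langle \lambda w{:}\tau_0.w\rangle.P) \rrbracket \\
\quad = (\nu z)(\llbracket z(x).z\langle x\rangle.z\langle(\lambda y{:}\sigma.x)\rangle.\mathbf{0} \rrbracket \mid \overline{z}\langle x\rangle.(!x(b).\llbracket \lambda w{:}\tau_0.w \rrbracket_b \mid \llbracket P \rrbracket)) \\
\quad \to (\nu z, x)(\overline{z}\langle w\rangle.(!w(u).\overline{x}\langle y\rangle.[y \leftrightarrow u] \mid \overline{z}\langle v\rangle.(!v(i).\llbracket \lambda y{:}\sigma.x \rrbracket_i \mid \mathbf{0})) \mid !x(b).\llbracket \lambda w{:}\tau_0.w \rrbracket_b \mid \llbracket P \rrbracket)
\end{array}
\]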


whereas the process to the right of the arrow is encoded as: 


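\[
\begin{array}{l}
\llbracket (\nu z)(z\langle \lambda w{:}\tau_0.w\rangle.z\langle(\lambda y{:}\sigma.\lambda w{:}\tau_0.w)\rangle.\mathbf{0} \mid P) \rrbracket \\
\quad = (\nu z)(\overline{z}\langle w\rangle.(!w(u).\llbracket \lambda w{:}\tau_0.w \rrbracket_u \mid \overline{z}\langle v\rangle.(!v(i).\llbracket \lambda y{:}\sigma.\lambda w{:}\tau_0.w \rrbracket_i \mid \mathbf{0})) \mid \llbracket P \rrbracket)
\end{array}
\]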


While the reduction of the encoded process and the encoding of the reduct 
differ syntactically, they are observationally equivalent — the latter inlines the 
replicated process behaviour that is accessible in the former on x. Having char- 
acterised substitution, we establish operational correspondence for the encoding. 


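Theorem 4.6 (Operational Correspondence)
1. If $\Psi \vdash M : \tau$ and $\llbracket M \rrbracket_z \to Q$ then $M \to^{*} N$ such that $\llbracket N \rrbracket_z \approx_{\mathtt{L}} Q$
2. If $\Psi; \Gamma; \Delta \vdash P :: z{:}A$ and $\llbracket P \rrbracket \to Q$ then $P \to^{*} P'$ such that $\llbracket P' \rrbracket \approx_{\mathtt{L}} Q$
3. If $\Psi \vdash M : \tau$ and $M \to N$ then $\llbracket M \rrbracket_z \Longrightarrow P$ such that $P \approx_{\mathtt{L}} \llbracket N \rrbracket_z$
4. If $\Psi; \Gamma; \Delta \vdash P :: z{:}A$ and $P \to Q$ then $\llbracket P \rrbracket \to^{*} R$ with $R \approx_{\mathtt{L}} \llbracket Q \rrbracket$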


The process equivalence in Theorem 4.6 above need not be extended to 
account for data (although it would be relatively simple to do so), since the 
processes in the image of the encoding are fully erased of any data elements. 


Back to λ-Terms. We extend our encoding of processes to λ-terms to Sessπλ.
Our extended translation maps processes to linear λ-terms, with the session type
$\tau \wedge A$ interpreted as a pair type where the first component is replicated. Dually,
$\tau \supset A$ is interpreted as a function type where the domain type is replicated. The
remaining session constructs are translated as in § 3.2.


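\[
\llparenthesis \tau \wedge A \rrparenthesis \triangleq\ !\llparenthesis \tau \rrparenthesis \otimes \llparenthesis A \rrparenthesis \qquad
\llparenthesis \tau \supset A \rrparenthesis \triangleq\ !\llparenthesis \tau \rrparenthesis \multimap \llparenthesis A \rrparenthesis \qquad
\llparenthesis \tau \to \sigma \rrparenthesis \triangleq\ !\llparenthesis \tau \rrparenthesis \multimap \llparenthesis \sigma \rrparenthesis
\]

\[
\begin{array}{ll}
(\wedge\mathsf{L})\ \llparenthesis x(y).P \rrparenthesis \triangleq \mathsf{let}\ y \otimes x = x\ \mathsf{in}\ \mathsf{let}\ !y = y\ \mathsf{in}\ \llparenthesis P \rrparenthesis & \qquad (\wedge\mathsf{R})\ \llparenthesis z\langle M\rangle.P \rrparenthesis \triangleq\ !\llparenthesis M \rrparenthesis \otimes \llparenthesis P \rrparenthesis \\
(\supset\!\mathsf{R})\ \llparenthesis z(x).P \rrparenthesis \triangleq \lambda x{:}!\llparenthesis \tau \rrparenthesis.\mathsf{let}\ !x = x\ \mathsf{in}\ \llparenthesis P \rrparenthesis & \qquad (\supset\!\mathsf{L})\ \llparenthesis x\langle M\rangle.P \rrparenthesis \triangleq \llparenthesis P \rrparenthesis\{(x\ !\llparenthesis M \rrparenthesis)/x\}
\end{array}
\]

\[
\llparenthesis \lambda x{:}\tau.M \rrparenthesis \triangleq \lambda x{:}!\llparenthesis \tau \rrparenthesis.\mathsf{let}\ !x = x\ \mathsf{in}\ \llparenthesis M \rrparenthesis \qquad
\llparenthesis M\,N \rrparenthesis \triangleq \llparenthesis M \rrparenthesis\ !\llparenthesis N \rrparenthesis \qquad
\llparenthesis x \rrparenthesis \triangleq x
\]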


The treatment of non-linear components of processes is identical to our previous
encoding: non-linear functions $\tau \to \sigma$ are translated to linear functions of
type $!\tau \multimap \sigma$; a process offering a session of type $\tau \wedge A$ (i.e. a process of the form
$z\langle M\rangle.P$, typed by rule $\wedge\mathsf{R}$) is translated to a pair where the first component is
the encoding of M prefixed with ! so that it may be used non-linearly, and the
second is the encoding of P. Non-linear variables are handled at the respective
binding sites: a process using a session of type $\tau \wedge A$ is encoded using the elimination
form for the pair and the elimination form for the exponential; similarly,
a process offering a session of type $\tau \supset A$ is encoded as a λ-abstraction where
the bound variable is of type $!\llparenthesis \tau \rrparenthesis$. Thus, we use the elimination form for the
exponential, ensuring that the typing is correct. We illustrate our encoding:


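\[
\begin{array}{l}
\llparenthesis z(x).z\langle x\rangle.z\langle(\lambda y{:}\sigma.x)\rangle.\mathbf{0} \rrparenthesis = \lambda x{:}!\llparenthesis \tau \rrparenthesis.\mathsf{let}\ !x = x\ \mathsf{in}\ (!x \otimes (!\llparenthesis \lambda y{:}\sigma.x \rrparenthesis \otimes \langle\rangle)) \\
\quad = \lambda x{:}!\llparenthesis \tau \rrparenthesis.\mathsf{let}\ !x = x\ \mathsf{in}\ (!x \otimes (!(\lambda y{:}!\llparenthesis \sigma \rrparenthesis.\mathsf{let}\ !y = y\ \mathsf{in}\ x) \otimes \langle\rangle))
\end{array}
\]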


Properties of the Encoding. Unsurprisingly, given the logical correspondence
between natural deduction and sequent calculus presentations of logic,
our encoding satisfies both type soundness and operational correspondence (cf.
Theorems 3.6, 3.8 and 3.10). The full development can be found in [52].


Relating the Two Encodings. We prove the two encodings are mutually
inverse and preserve the full abstraction properties (we write $=_{\beta}$ and $=_{\beta\eta}$ for β-
and βη-equivalence, respectively).


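Theorem 4.7 (Inverse). If $\Psi; \Gamma; \Delta \vdash P :: z{:}A$ then $\llbracket \llparenthesis P \rrparenthesis \rrbracket_z \approx_{\mathtt{L}} \llbracket P \rrbracket$. Also, if
$\Psi \vdash M : \tau$ then $\llparenthesis \llbracket M \rrbracket_z \rrparenthesis =_{\beta} \llparenthesis M \rrparenthesis$.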


The equivalences above are formulated between the composition of the encod- 
ings applied to P (resp. M) and the process (resp. A-term) after applying the 
translation embedding the non-linear components into their linear counterparts. 
This formulation matches more closely that of § 3.3, which applies to linear cal- 
culi for which the target languages of this section are a strict subset (and avoids 
the formalisation of process equivalence with terms). We also note that in this 
setting, observational equivalence and βη-equivalence coincide [3,31]. Moreover,
the extensional flavour of $\approx_{\mathtt{L}}$ includes η-like principles at the process level.


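Theorem 4.8. Let $\cdot \vdash M : \tau$ and $\cdot \vdash N : \tau$. $\llparenthesis M \rrparenthesis =_{\beta\eta} \llparenthesis N \rrparenthesis$ iff $\llbracket M \rrbracket_z \approx_{\mathtt{L}} \llbracket N \rrbracket_z$.
Also, let $\cdot \vdash P :: z{:}A$ and $\cdot \vdash Q :: z{:}A$. We have that $\llbracket P \rrbracket \approx_{\mathtt{L}} \llbracket Q \rrbracket$ iff $\llparenthesis P \rrparenthesis =_{\beta\eta} \llparenthesis Q \rrparenthesis$.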


We establish full abstraction for the encoding of λ-terms into processes
(Theorem 4.8) in two steps: The completeness direction (i.e. from left-to-right) follows
from operational completeness and strong normalisation of the λ-calculus. The
soundness direction uses operational soundness. The proof of Theorem 4.8 uses
the same strategy as Theorem 3.14, appealing to the inverse theorems.


4.3 Higher-Order Session Processes - Sessπλ⁺


We extend the value-passing framework of the previous section, accounting for
process-passing (i.e. the higher-order setting) in a session-typed setting. As shown in
[50], we achieve this by adding to the data layer a contextual monad that encapsulates
(open) session-typed processes as data values, with a corresponding elimination
form in the process layer. We dub this calculus Sessπλ⁺.


\[
\begin{array}{ll}
P, Q ::= \cdots \mid x \leftarrow M \leftarrow \overline{y}; Q & \qquad M, N ::= \cdots \mid \{x \leftarrow P \leftarrow \overline{y_i{:}A_i}\} \\
\tau, \sigma ::= \cdots \mid \{\overline{x_j{:}A_j} \vdash z{:}A\} &
\end{array}
\]

The type $\{\overline{x_j{:}A_j} \vdash z{:}A\}$ is the type of a term which encapsulates an open process
that uses the linear channels $\overline{x_j{:}A_j}$ and offers A along channel z. This formulation
has the added benefit of formalising the integration of session-typed processes
in a functional language and forms the basis for the concurrent programming
language SILL [37,50]. The typing rules for the new constructs are (for simplicity
we assume no shared channels in process monads):


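\[
(\{\}\mathsf{I})\ \frac{\Psi; \cdot; \overline{x_i{:}A_i} \vdash P :: z{:}A}{\Psi \vdash \{z \leftarrow P \leftarrow \overline{x_i{:}A_i}\} : \{\overline{x_i{:}A_i} \vdash z{:}A\}}
\]

\[
(\{\}\mathsf{E})\ \frac{\Psi \vdash M : \{\overline{x_i{:}A_i} \vdash x{:}A\} \qquad \Delta_1 = \overline{y_i{:}A_i} \qquad \Psi; \Gamma; \Delta_2, x{:}A \vdash Q :: z{:}C}{\Psi; \Gamma; \Delta_1, \Delta_2 \vdash x \leftarrow M \leftarrow \overline{y}; Q :: z{:}C}
\]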


Rule $\{\}\mathsf{I}$ embeds processes in the term language by essentially quoting an
open process that is well-typed according to the type specification in the monadic
type. Dually, rule $\{\}\mathsf{E}$ allows for processes to use monadic values through composition
that consumes some of the ambient channels in order to provide the
monadic term with the necessary context (according to its type). These constructs
are discussed in substantial detail in [50]. The reduction semantics of the
process construct is given by (we tacitly assume that the names $\overline{y}$ and c do not
occur in P and omit the congruence case):


\[
(c \leftarrow \{z \leftarrow P \leftarrow \overline{x_i{:}A_i}\} \leftarrow \overline{y}; Q) \to (\nu c)(P\{\overline{y}/\overline{x_i}\}\{c/z\} \mid Q)
\]


The semantics allows for the underlying monadic term M to evaluate to a
(quoted) process P. The process P is then executed in parallel with the continuation
Q, sharing the linear channel c for subsequent interactions. We illustrate
the higher-order extension with the following typed process (we write $\{x \leftarrow P\}$ when
P does not depend on any linear channels and assume $\vdash Q :: d{:}\mathsf{Nat} \wedge \mathbf{1}$):


\[
P \triangleq (\nu c)(c\langle\{d \leftarrow Q\}\rangle.c(x).\mathbf{0} \mid c(y).d \leftarrow y; d(n).c\langle n\rangle.\mathbf{0}) \qquad (2)
\]


Process P above gives an abstract view of a communication idiom where a 
process (the left-hand side of the parallel composition) sends another process 
Q which potentially encapsulates some complex computation. The receiver then 
spawns the execution of the received process and inputs from it a result value 
that is sent back to the original sender. An execution of P is given by: 


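\[
\begin{array}{l}
P \to (\nu c)(c(x).\mathbf{0} \mid d \leftarrow \{d \leftarrow Q\}; d(n).c\langle n\rangle.\mathbf{0}) \to (\nu c)(c(x).\mathbf{0} \mid (\nu d)(Q \mid d(n).c\langle n\rangle.\mathbf{0})) \\
\quad \to^{*} (\nu c)(c(x).\mathbf{0} \mid c\langle 42\rangle.\mathbf{0}) \to \mathbf{0}
\end{array}
\]
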
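The communication idiom of Eq. 2 can be mimicked with threads and queues. The Python sketch below is an informal analogy and not an implementation of the calculus: the session channel c is modelled, by assumption, as a pair of one-directional queues so that an endpoint never reads its own message, the quoted process Q is an ordinary function that is assumed to emit the value 42 (as in the execution above), and no typing or linearity is enforced.

```python
import queue
import threading

def quoted_q(d):
    # Plays the role of Q :: d:Nat ∧ 1 : sends one number on d and terminates.
    d.put(42)

def sender(to_peer, from_peer, result):
    to_peer.put(quoted_q)            # c<{d <- Q}> : send the quoted process
    result.append(from_peer.get())   # c(x)        : wait for the computed value

def receiver(from_peer, to_peer):
    y = from_peer.get()              # c(y)        : receive the quoted process
    d = queue.Queue()                # (nu d)      : fresh channel for the spawned process
    threading.Thread(target=y, args=(d,)).start()   # d <- y : run it in parallel
    n = d.get()                      # d(n)        : input its result
    to_peer.put(n)                   # c<n>        : send the result back

def run():
    left_to_right = queue.Queue()
    right_to_left = queue.Queue()
    result = []
    threads = [
        threading.Thread(target=sender, args=(left_to_right, right_to_left, result)),
        threading.Thread(target=receiver, args=(left_to_right, right_to_left)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]

answer = run()
```

Each statement is annotated with the prefix of P it imitates; the call to `run` corresponds to the full reduction sequence to the inactive process.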
Given the seminal work of Sangiorgi [46], such a representation naturally raises
the question of whether or not we can develop a typed encoding of higher-order
processes into the first-order setting. Indeed, we can achieve such an encoding
with a fairly simple extension of the encoding of § 4.2 to Sessπλ⁺ by observing
that monadic values are processes that need to be potentially provided with
extra sessions in order to be executed correctly. For instance, a term of type
$\{x{:}A \vdash y{:}B\}$ denotes a process that given a session x of type A will then offer
$y{:}B$. Exploiting this observation we encode this type as the session $A \multimap B$,
ensuring subsequent usages of such a term are consistent with this interpretation.
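
\[
\begin{array}{l}
\llbracket \{\overline{x_j{:}A_j} \vdash z{:}A\} \rrbracket \triangleq \overline{\llbracket A_j \rrbracket} \multimap \llbracket A \rrbracket \\
\llbracket \{x \leftarrow P \leftarrow \overline{y_i}\} \rrbracket_z \triangleq z(y_0).\cdots.z(y_n).\llbracket P\{z/x\} \rrbracket \qquad (z \notin \mathit{fn}(P)) \\
\llbracket x \leftarrow M \leftarrow \overline{y}; Q \rrbracket \triangleq (\nu x)(\llbracket M \rrbracket_x \mid \overline{x}\langle a_0\rangle.([a_0 \leftrightarrow y_0] \mid \cdots \mid \overline{x}\langle a_n\rangle.([a_n \leftrightarrow y_n] \mid \llbracket Q \rrbracket) \cdots))
\end{array}
\]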


To encode the monadic type $\{\overline{x_j{:}A_j} \vdash z{:}A\}$, denoting the type of process P
that is typed by $\overline{x_j{:}A_j} \vdash P :: z{:}A$, we require that the session in the image of the
translation specifies a sequence of channel inputs with behaviours $A_j$ that make
up the linear context. After the contextual aspects of the type are encoded, the
session will then offer the (encoded) behaviour of A. Thus, the encoding of the
monadic type is $\llbracket A_0 \rrbracket \multimap \cdots \multimap \llbracket A_n \rrbracket \multimap \llbracket A \rrbracket$, which we write as $\overline{\llbracket A_j \rrbracket} \multimap \llbracket A \rrbracket$.
The encoding of monadic expressions adheres to this behaviour, first performing
the necessary sequence of inputs and then proceeding inductively. Finally, the
encoding of the elimination form for monadic expressions behaves dually, composing
the encoding of the monadic expression with a sequence of outputs that
instantiate the consumed names accordingly (via forwarding). The encoding of
process P from Eq. 2 is thus:


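\[
\begin{array}{l}
\llbracket P \rrbracket = (\nu c)(\llbracket c\langle\{d \leftarrow Q\}\rangle.c(x).\mathbf{0} \rrbracket \mid \llbracket c(y).d \leftarrow y; d(n).c\langle n\rangle.\mathbf{0} \rrbracket) \\
\quad = (\nu c)(\overline{c}\langle w\rangle.(!w(d).\llbracket Q \rrbracket \mid c(x).\mathbf{0}) \mid c(y).(\nu d)(\overline{y}\langle b\rangle.[b \leftrightarrow d] \mid d(n).\overline{c}\langle m\rangle.(\overline{n}\langle e\rangle.[e \leftrightarrow m] \mid \mathbf{0})))
\end{array}
\]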
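The shape of the encoded monadic type, a right-nested chain of linear implications ending in the offered behaviour, can be made concrete with a short function. This Python sketch is purely illustrative: types are represented as strings that are assumed to be already encoded, and the ASCII arrow `-o` stands for ⊸.

```python
def encode_monadic_type(context_types, offered):
    """Build [A0] -o ... -o [An] -o [A] from already-encoded component types.

    `context_types` lists the (encoded) types of the linear context in order;
    `offered` is the (encoded) type offered by the quoted process.
    """
    result = offered
    # Nest to the right: the first context type becomes the outermost input.
    for index, ctx in enumerate(reversed(context_types)):
        inner = result if index == 0 else f"({result})"
        result = f"{ctx} -o {inner}"
    return result

closed = encode_monadic_type([], "A")
single = encode_monadic_type(["A"], "B")
nested = encode_monadic_type(["A0", "A1"], "A")
```

With an empty context the encoded type is just the offered behaviour, which matches the treatment of {d ← Q} in the example above, where no inputs precede the encoding of Q.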


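Properties of the Encoding. As in our previous development, we can show
that our encoding for Sessπλ⁺ is type sound and satisfies operational correspondence.
The full development is omitted but can be found in [52].

We encode Sessπλ⁺ into λ-terms, extending § 4.2 with:


\[
\begin{array}{l}
\llparenthesis \{\overline{x_i{:}A_i} \vdash z{:}A\} \rrparenthesis \triangleq \overline{\llparenthesis A_i \rrparenthesis} \multimap \llparenthesis A \rrparenthesis \\
\llparenthesis x \leftarrow M \leftarrow \overline{y}; Q \rrparenthesis \triangleq \llparenthesis Q \rrparenthesis\{(\llparenthesis M \rrparenthesis\ \overline{y})/x\} \qquad \llparenthesis \{x \leftarrow P \leftarrow \overline{w_i}\} \rrparenthesis \triangleq \lambda w_0.\cdots.\lambda w_n.\llparenthesis P \rrparenthesis
\end{array}
\]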


The encoding translates the monadic type $\{\overline{x_i{:}A_i} \vdash z{:}A\}$ as a linear function
$\overline{\llparenthesis A_i \rrparenthesis} \multimap \llparenthesis A \rrparenthesis$, which captures the fact that the underlying value must be provided
with terms satisfying the requirements of the linear context. At the level
of terms, the encoding for the monadic term constructor follows its type specification,
generating a nesting of λ-abstractions that closes the term and proceeding
inductively. For the process encoding, we translate the monadic application
construct analogously to the translation of a linear cut, but applying the appropriate
variables to the translated monadic term (which is of function type). We
remark the similarity between our encoding and that of the previous section,
where monadic terms are translated to a sequence of inputs (here a nesting
of λ-abstractions). Our encoding satisfies type soundness and operational correspondence,
as usual. Further showcasing the applications of our development, we
obtain a novel strong normalisation result for this higher-order session-calculus
“for free”, through encoding to the λ-calculus.


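Theorem 4.9 (Strong Normalisation). Let $\Psi; \Gamma; \Delta \vdash P :: z{:}A$. There is no
infinite reduction sequence starting from P.


Theorem 4.10 (Inverse Encodings). If $\Psi; \Gamma; \Delta \vdash P :: z{:}A$ then $\llbracket \llparenthesis P \rrparenthesis \rrbracket_z \approx_{\mathtt{L}}
\llbracket P \rrbracket$. Also, if $\Psi \vdash M : \tau$ then $\llparenthesis \llbracket M \rrbracket_z \rrparenthesis =_{\beta} \llparenthesis M \rrparenthesis$.

Theorem 4.11. Let $\vdash M : \tau$, $\vdash N : \tau$, $\vdash P :: z{:}A$ and $\vdash Q :: z{:}A$. $\llparenthesis M \rrparenthesis =_{\beta\eta} \llparenthesis N \rrparenthesis$
iff $\llbracket M \rrbracket_z \approx_{\mathtt{L}} \llbracket N \rrbracket_z$ and $\llbracket P \rrbracket \approx_{\mathtt{L}} \llbracket Q \rrbracket$ iff $\llparenthesis P \rrparenthesis =_{\beta\eta} \llparenthesis Q \rrparenthesis$.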


5 Related Work and Concluding Remarks 


Process Encodings of Functions. Toninho et al. [49] study encodings of the
simply-typed λ-calculus in a logically motivated session π-calculus, via encodings
to the linear λ-calculus. Our work differs since they do not study polymorphism
nor reverse encodings; and we provide deeper insights through applications of
the encodings. Full abstraction or inverse properties are not studied.

Sangiorgi [43] uses a fully abstract compilation from the higher-order
π-calculus (HOπ) to the π-calculus to study full abstraction for Milner’s encodings
of the λ-calculus. The work shows that Milner’s encoding of the lazy λ-calculus
can be recovered by restricting the semantic domain of processes (the so-called
restrictive approach) or by enriching the λ-calculus with suitable constants. This
work was later refined in [45], which does not use HOπ and considers an operational
equivalence on λ-terms called open applicative bisimulation which coincides
with Lévy-Longo tree equality. The work [47] studies general conditions
under which encodings of the λ-calculus in the π-calculus are fully abstract wrt
Lévy-Longo and Böhm Trees, which are then applied to several encodings of
(call-by-name) λ-calculus. The works above deal with untyped calculi, and so reverse
encodings are unfeasible. In a broader sense, our approach takes the restrictive
approach using linear logic-based session typing and the induced observational
equivalence. We use a λ-calculus with booleans as observables and reason with
a Morris-style equivalence instead of tree equalities. It would be an interesting
future work to apply the conditions in [47] in our typed setting.

Wadler [54] shows a correspondence between a linear functional language
with session types GV and a session-typed process calculus with polymorphism
based on classical linear logic CP. Along the lines of this work, Lindley and
Morris [26], in an exploration of inductive and coinductive session types through
the addition of least and greatest fixed points to CP and GV, develop an encoding
from a linear λ-calculus with session primitives (Concurrent μGV) to a pure
linear λ-calculus (Functional μGV) via a CPS transformation. They also develop
translations between CP and Concurrent μGV, extending [25]. Mapping to the
terminology used in our work [17], their encodings are shown to be operationally
complete, but no results are shown for the operational soundness directions and
neither full abstraction nor inverse properties are studied. In addition, their
operational characterisations do not compose across encodings. For instance,
while strong normalisation of Functional μGV implies the same property for
Concurrent μGV through their operationally complete encoding, the encoding
from CP to μGV does not necessarily preserve this property.

Types for π-calculi delineate sequential behaviours by restricting composition
and name usages, limiting the contexts in which processes can interact. Therefore
typed equivalences offer a coarser semantics than untyped semantics. Berger et
al. [5] study an encoding of System F in a polymorphic linear π-calculus, showing
it to be fully abstract based on game semantics techniques. Their typing system
and proofs are more complex due to the fine-grained constraints from game
semantics. Moreover, they do not study a reverse encoding. Orchard and Yoshida
[33] develop embeddings to-and-from PCF with parallel effects and a session-typed
π-calculus, but only develop operational correspondence and semantic
soundness results, leaving the full abstraction problem open.


Polymorphism and Typed Behavioural Semantics. The work of [7] studies
parametric session polymorphism for the intuitionistic setting, developing a
behavioural equivalence that captures parametricity, which is used (denoted as
$\approx_{\mathtt{L}}$) in our paper. The work [39] introduces a typed bisimilarity for polymorphism
in the π-calculus. Their bisimilarity is of an intensional flavour, whereas
the one used in our work follows the extensional style of Reynolds [41]. Their
typing discipline (originally from [53], which also develops type-preserving encodings
of polymorphic λ-calculus into polymorphic π-calculus) differs significantly
from the linear logic-based session typing of our work (e.g. theirs does not ensure
deadlock-freedom). A key observation in their work is the coarser nature of typed
equivalences with polymorphism (in analogy to those for IO-subtyping [38]) and
their interaction with channel aliasing, suggesting a use of typed semantics and
encodings of the π-calculus for fine-grained analyses of program behaviour.


F-Algebras and Linear-F. The use of initial and final (co)algebras to give a
semantics to inductive and coinductive types dates back to Mendler [28], with
their strong definability in System F appearing in [1,19]. The definability of
inductive and coinductive types using parametricity also appears in [40] in the
context of a logic for parametric polymorphism and later in [6] in a linear variant
of such a logic. The work of [55] studies parametricity for the polymorphic linear
λ-calculus of this work, developing encodings of a few inductive types but not
the initial (or final) algebraic encodings in their full generality. Inductive and
coinductive session types in a logical process setting appear in [26,51]. Both
works consider a calculus with built-in recursion: the former in an intuitionistic
setting where a process that offers a (co)inductive protocol is composed with
another that consumes the (co)inductive protocol and the latter in a classical
framework where composed recursive session types are dual to each other.


Conclusion and Future Work. This work answers the question of what kind
of type discipline of the π-calculus can exactly capture and is captured by
λ-calculus behaviours. Our answer is given by showing the first mutually inverse
and fully abstract encodings between two calculi with polymorphism, one being
the Polyπ session calculus based on intuitionistic linear logic, and the other (a
linear) System F. This further demonstrates that the linear logic-based articulation
of name-passing interactions originally proposed by [8] (and studied extensively
thereafter e.g. [7,9,25,36,50,51,54]) provides a clear and applicable tool
for message-passing concurrency. By exploiting the proof theoretic equivalences
between natural deduction and sequent calculus we develop mutually inverse
and fully abstract encodings, which naturally extend to more intricate settings
such as process passing (in the sense of HOπ). Our encodings also enable us to
derive properties of the π-calculi “for free”. Specifically, we show how to obtain
adequate representations of least and greatest fixed points in Polyπ through the
encoding of initial and final (co)algebras in the λ-calculus. We also straightforwardly
derive a strong normalisation result for the higher-order session calculus,
which otherwise involves non-trivial proof techniques [5,7,12,13,36]. Future
work includes extensions to the classical linear logic-based framework, including
multiparty session types [10,11]. Encodings of session π-calculi to the λ-calculus
have been used to implement session primitives in functional languages such as
Haskell (see a recent survey [32]), OCaml [24,34,35] and Scala [48]. Following
this line of work, we plan to develop encoding-based implementations of this
work as embedded DSLs. This would potentially enable an exploration of algebraic
constructs beyond initial and final co-algebras in a session programming
setting. In particular, we wish to further study the meaning of functors, natural
transformations and related constructions in a session-typed setting, both from
a more fundamental viewpoint and in terms of programming patterns.


Acknowledgements. The authors thank Viviana Bono, Dominic Orchard and the 
reviewers for their comments, suggestions and pointers to related works. This work 
is partially supported by EPSRC EP/K034413/1, EP/K011715/1, EP/L00058X/1, 
EP/N027833/1, EP/N028201/1 and NOVA LINCS (UID/CEC/04516/2013). 


References 


1. Bainbridge, E.S., Freyd, P.J., Scedrov, A., Scott, P.J.: Functorial polymorphism. 
Theor. Comput. Sci. 70(1), 35-64 (1990) 

2. Balzer, S., Pfenning, F.: Manifest sharing with session types. In: ICFP (2017) 

3. Barber, A.: Dual intuitionistic linear logic. Technical report ECS-LFCS-96-347. 
School of Informatics, University of Edinburgh (1996) 

4. Benton, P.N.: A mixed linear and non-linear logic: proofs, terms and models. In: 
Pacholski, L., Tiuryn, J. (eds.) CSL 1994. LNCS, vol. 933, pp. 121-135. Springer, 
Heidelberg (1995). https://doi.org/10.1007/BFb0022251 

5. Berger, M., Honda, K., Yoshida, N.: Genericity and the π-calculus. Acta Inf.
42(2-3), 83-141 (2005)

6. Birkedal, L., Møgelberg, R.E., Petersen, R.L.: Linear Abadi and Plotkin logic. Log.
Methods Comput. Sci. 2(5), 1-48 (2006) 

7. Caires, L., Pérez, J.A., Pfenning, F., Toninho, B.: Behavioral polymorphism and 
parametricity in session-based communication. In: Felleisen, M., Gardner, P. (eds.) 
ESOP 2013. LNCS, vol. 7792, pp. 330-349. Springer, Heidelberg (2013). https:// 
doi.org/10.1007/978-3-642-37036-6_19 

8. Caires, L., Pfenning, F.: Session types as intuitionistic linear propositions. In: 
Gastin, P., Laroussinie, F. (eds.) CONCUR 2010. LNCS, vol. 6269, pp. 222-236.
Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15375-4_16 

9. Caires, L., Pfenning, F., Toninho, B.: Linear logic propositions as session types. 
Math. Struct. Comput. Sci. 26(3), 367-423 (2016)

10. Carbone, M., Lindley, S., Montesi, F., Schuermann, C., Wadler, P.: Coherence 
generalises duality: a logical explanation of multiparty session types. In: CONCUR 
2016, vol. 59, pp. 33:1-33:15. Sch. Dag. (2016) 


11. Carbone, M., Montesi, F., Schürmann, C., Yoshida, N.: Multiparty session types
as coherence proofs. In: CONCUR 2015, vol. 42, pp. 412-426. Sch. Dag. (2015)

12. Demangeon, R., Hirschkoff, D., Sangiorgi, D.: Mobile processes and termination.
In: Palsberg, J. (ed.) Semantics and Algebraic Specification. LNCS, vol. 5700, pp.
250-273. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04164-8_13

13. Demangeon, R., Hirschkoff, D., Sangiorgi, D.: Termination in higher-order concurrent
calculi. J. Log. Algebr. Program. 79(7), 550-577 (2010)

14. Gentzen, G.: Untersuchungen über das logische Schließen. Math. Z. 39, 176-210
(1935)

15. Girard, J.: Linear logic. Theor. Comput. Sci. 50, 1-102 (1987)

16. Girard, J., Lafont, Y., Taylor, P.: Proofs and Types. CUP, Cambridge (1989)

17. Gorla, D.: Towards a unified approach to encodability and separation results for
process calculi. Inf. Comput. 208(9), 1031-1053 (2010)

18. Gorla, D., Nestmann, U.: Full abstraction for expressiveness: history, myths and
facts. Math. Struct. Comput. Sci. 26(4), 639-654 (2016)

19. Hasegawa, R.: Categorical data types in parametric polymorphism. Math. Struct.
Comput. Sci. 4(1), 71-109 (1994)

20. Honda, K.: Types for dyadic interaction. In: Best, E. (ed.) CONCUR 1993. LNCS,
vol. 715, pp. 509-523. Springer, Heidelberg (1993). https://doi.org/10.1007/3-540-57208-2_35

21. Honda, K.: Session types and distributed computing. In: Czumaj, A., Mehlhorn,
K., Pitts, A., Wattenhofer, R. (eds.) ICALP 2012. LNCS, vol. 7392, pp. 23-23.
Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31585-5_4

22. Honda, K., Vasconcelos, V.T., Kubo, M.: Language primitives and type discipline
for structured communication-based programming. In: Hankin, C. (ed.) ESOP
1998. LNCS, vol. 1381, pp. 122-138. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0053567

23. Honda, K., Yoshida, N., Carbone, M.: Multiparty asynchronous session types. In:
POPL 2008, pp. 273-284 (2008)

24. Imai, K., Yoshida, N., Yuen, S.: Session-ocaml: a session-based library with polarities
and lenses. In: Jacquet, J.-M., Massink, M. (eds.) COORDINATION 2017.
LNCS, vol. 10319, pp. 99-118. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59746-1_6

25. Lindley, S., Morris, J.G.: A semantics for propositions as sessions. In: Vitek, J. (ed.)
ESOP 2015. LNCS, vol. 9032, pp. 560-584. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46669-8_23

26. Lindley, S., Morris, J.G.: Talking bananas: structural recursion for session types.
In: ICFP 2016, pp. 434-447 (2016)

27. Maraist, J., Odersky, M., Turner, D.N., Wadler, P.: Call-by-name, call-by-value,
call-by-need and the linear lambda calculus. T. C. S. 228(1-2), 175-210 (1999)

28. Mendler, N.P.: Recursive types and type constraints in second-order lambda calculus.
In: LICS 1987, pp. 30-36 (1987)

29. Milner, R.: Functions as processes. In: Paterson, M.S. (ed.) ICALP 1990. LNCS,
vol. 443, pp. 167-180. Springer, Heidelberg (1990). https://doi.org/10.1007/BFb0032030

30. Milner, R., Parrow, J., Walker, D.: A calculus of mobile processes I and II. Inf.
Comput. 100(1), 1-77 (1992)

31. Ohta, Y., Hasegawa, M.: A terminating and confluent linear lambda calculus. In:
Pfenning, F. (ed.) RTA 2006. LNCS, vol. 4098, pp. 166-180. Springer, Heidelberg
(2006). https://doi.org/10.1007/11805618_13


32. Orchard, D., Yoshida, N.: Session types with linearity in Haskell. In: Gay, S.,
Ravara, A. (eds.) Behavioural Types: From Theory to Tools. River Publishers,
Gistrup (2017)

33. Orchard, D.A., Yoshida, N.: Effects as sessions, sessions as effects. In: POPL 2016,
pp. 568-581 (2016)

34. Padovani, L.: A Simple Library Implementation of Binary Sessions. JFP 27 (2016)

35. Padovani, L.: Context-free session type inference. In: Yang, H. (ed.) ESOP 2017.
LNCS, vol. 10201, pp. 804-830. Springer, Heidelberg (2017).
https://doi.org/10.1007/978-3-662-54434-1_30

36. Pérez, J.A., Caires, L., Pfenning, F., Toninho, B.: Linear logical relations for
session-based concurrency. In: Seidl, H. (ed.) ESOP 2012. LNCS, vol. 7211, pp.
539-558. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28869-2_27

37. Pfenning, F., Griffith, D.: Polarized substructural session types. In: Pitts, A. (ed.)
FoSSaCS 2015. LNCS, vol. 9034, pp. 3-22. Springer, Heidelberg (2015).
https://doi.org/10.1007/978-3-662-46678-0_1

38. Pierce, B.C., Sangiorgi, D.: Typing and subtyping for mobile processes. Math.
Struct. Comput. Sci. 6(5), 409-453 (1996)

39. Pierce, B.C., Sangiorgi, D.: Behavioral equivalence in the polymorphic pi-calculus.
J. ACM 47(3), 531-584 (2000)

40. Plotkin, G., Abadi, M.: A logic for parametric polymorphism. In: Bezem, M.,
Groote, J.F. (eds.) TLCA 1993. LNCS, vol. 664, pp. 361-375. Springer, Heidelberg
(1993). https://doi.org/10.1007/BFb0037118

41. Reynolds, J.C.: Types, abstraction and parametric polymorphism. In: IFIP
Congress, pp. 513-523 (1983)

42. Reynolds, J.C., Plotkin, G.D.: On functors expressible in the polymorphic typed
lambda calculus. Inf. Comput. 105(1), 1-29 (1993)

43. Sangiorgi, D.: An investigation into functions as processes. In: Brookes, S., Main,
M., Melton, A., Mislove, M., Schmidt, D. (eds.) MFPS 1993. LNCS, vol. 802, pp.
143-159. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-58027-1_7

44. Sangiorgi, D.: π-calculus, internal mobility, and agent-passing calculi. Theor.
Comput. Sci. 167(1&2), 235-274 (1996)

45. Sangiorgi, D.: Lazy functions and mobile processes. In: Proof, Language, and
Interaction: Essays in Honour of Robin Milner, pp. 691-720 (2000)

46. Sangiorgi, D., Walker, D.: The π-Calculus: A Theory of Mobile Processes.
Cambridge University Press, Cambridge (2001)

47. Sangiorgi, D., Xu, X.: Trees from functions as processes. In: Baldan, P., Gorla, D.
(eds.) CONCUR 2014. LNCS, vol. 8704, pp. 78-92. Springer, Heidelberg (2014).
https://doi.org/10.1007/978-3-662-44584-6_7

48. Scalas, A., Dardha, O., Hu, R., Yoshida, N.: A linear decomposition of multiparty
sessions for safe distributed programming. In: ECOOP 2017 (2017)

49. Toninho, B., Caires, L., Pfenning, F.: Functions as session-typed processes.
In: Birkedal, L. (ed.) FoSSaCS 2012. LNCS, vol. 7213, pp. 346-360. Springer,
Heidelberg (2012). https://doi.org/10.1007/978-3-642-28729-9_23

50. Toninho, B., Caires, L., Pfenning, F.: Higher-order processes, functions, and
sessions: a monadic integration. In: Felleisen, M., Gardner, P. (eds.) ESOP 2013.
LNCS, vol. 7792, pp. 350-369. Springer, Heidelberg (2013).
https://doi.org/10.1007/978-3-642-37036-6_20


51. Toninho, B., Caires, L., Pfenning, F.: Corecursion and non-divergence in
session-typed processes. In: Maffei, M., Tuosto, E. (eds.) TGC 2014. LNCS, vol. 8902, pp.
159-175. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45917-1_11

52. Toninho, B., Yoshida, N.: On polymorphic sessions and functions: a tale of two
(fully abstract) encodings (long version). CoRR abs/1711.00878 (2017)

53. Turner, D.: The polymorphic pi-calculus: Theory and implementation. Technical
report ECS-LFCS-96-345. School of Informatics, University of Edinburgh (1996)

54. Wadler, P.: Propositions as sessions. J. Funct. Program. 24(2-3), 384-418 (2014)

55. Zhao, J., Zhang, Q., Zdancewic, S.: Relational parametricity for a polymorphic
linear lambda calculus. In: Ueda, K. (ed.) APLAS 2010. LNCS, vol. 6461, pp.
344-359. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17164-2_24


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 


The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended 
use is not permitted by statutory regulation or exceeds the permitted use, you will 
need to obtain permission directly from the copyright holder. 




Concurrent Kleene Algebra: Free Model 
and Completeness 


Tobias Kappé, Paul Brunet, Alexandra Silva, and Fabio Zanasi


University College London, London, UK 
tkappe@cs.ucl.ac.uk 


Abstract. Concurrent Kleene Algebra (CKA) was introduced by Hoare, 
Moeller, Struth and Wehrman in 2009 as a framework to reason about 
concurrent programs. We prove that the axioms for CKA with bounded 
parallelism are complete for the semantics proposed in the original paper; 
consequently, these semantics are the free model for this fragment. This 
result settles a conjecture of Hoare and collaborators. Moreover, the tech- 
nique developed to this end allows us to establish a Kleene Theorem for 
CKA, extending an earlier Kleene Theorem for a fragment of CKA. 


1 Introduction 


Concurrent Kleene Algebra (CKA) [8] is a mathematical formalism which extends 
Kleene Algebra (KA) with a parallel composition operator, in order to express 
concurrent program behaviour.¹ In spite of such a seemingly simple addition,
extending the existing KA toolkit (notably, completeness) to the setting of CKA
turned out to be a challenging task. Much research has followed the original
paper, both foundational [13,20] and on how CKA could be used to reason about
important verification tasks in concurrent systems [9,11]. However, and despite
several conjectures [9,13], the question of the characterisation of the free CKA 
and the completeness of the axioms remained open, making it impractical to use 
CKA in verification tasks. This paper settles these two open questions. We answer 
positively the conjecture that the free model of CKA is formed by series parallel 
pomset languages, downward-closed under Gischer’s subsumption order [6]—a 
generalisation of regular languages to sets of partially ordered words. To this 
end, we prove that the original axioms proposed in [8] are indeed complete. 
Our proof of completeness is based on extending an existing complete- 
ness result that establishes series-parallel rational pomset languages as the free 
Bi-Kleene Algebra (BKA) [20]. The extension to the existing result for BKA pro- 
vides a clear understanding of the difficulties introduced by the presence of the 
exchange axiom and shows how to separate concerns between CKA and BKA, a 
technique which is also useful elsewhere. For one, our construction also provides
an extension of (half of) Kleene’s theorem for BKA [14] to CKA, establishing
pomset automata as an operational model for CKA and opening the door to
decidability procedures similar to those previously studied for KA. Furthermore,
it reduces deciding the equational theory of CKA to deciding the equational
theory of BKA.

¹ In its original formulation, CKA also features an operator (parallel star) for
unbounded parallelism: in harmony with several recent works [13,14], we study the
variant of CKA without parallel star, sometimes called “weak” CKA.

BKA is defined as CKA with the only (but significant) omission of the
exchange law, $(e \parallel f) \cdot (g \parallel h) \leqq_{\mathsf{CKA}} (e \cdot g) \parallel (f \cdot h)$. The exchange law is the
core element of CKA as it softens true concurrency: it states that when two
sequentially composed programs (i.e., $e \cdot g$ and $f \cdot h$) are composed in parallel,
they can be implemented by running their heads in parallel, followed by running
their tails in parallel (i.e., $e \parallel f$, then $g \parallel h$). The exchange law allows the
implementer of a CKA expression to interleave threads at will, without violating the
specification.

To illustrate the use of the exchange law, consider a protocol with three 
actions: query a channel c, collect an answer from the same channel, and print 
an unrelated message m on screen. The specification for this protocol requires
the query to happen before the answer is received, but the printing action
is independent and may be executed concurrently. We write this specification
as $(q(c) \cdot r(c)) \parallel p(m)$, with the operator $\cdot$ denoting sequential composition.
However, if one wants to implement this protocol in a sequential programming
language, a total ordering of these events has to be introduced. Suppose we
choose to implement this protocol by printing m while we wait to receive an
answer. This implementation can be written $q(c) \cdot p(m) \cdot r(c)$. Using the laws
of CKA, we can prove that $q(c) \cdot p(m) \cdot r(c) \leqq_{\mathsf{CKA}} (q(c) \cdot r(c)) \parallel p(m)$, which we
interpret as the fact that this implementation respects the specification. Intu- 
itively, this means that the specification lists the necessary dependencies, but 
the implementation can introduce more. 
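
One possible derivation of this inequation is sketched below. It is an illustration added here, not a derivation taken from the paper; it assumes only the unit law $e \parallel 1 \equiv e$, commutativity of $\parallel$, the unit laws for $\cdot$, the exchange law, and the fact that $\cdot$ is monotone with respect to $\leqq_{\mathsf{CKA}}$ (which follows from distributivity over $+$).

```latex
\begin{align*}
q(c) \cdot p(m) \cdot r(c)
  &\equiv_{\mathsf{CKA}} (q(c) \parallel 1) \cdot (1 \parallel p(m)) \cdot (r(c) \parallel 1)
     && \text{(unit of $\parallel$)} \\
  &\leqq_{\mathsf{CKA}} \bigl((q(c) \cdot 1) \parallel (1 \cdot p(m))\bigr) \cdot (r(c) \parallel 1)
     && \text{(exchange law)} \\
  &\equiv_{\mathsf{CKA}} (q(c) \parallel p(m)) \cdot (r(c) \parallel 1)
     && \text{(unit of $\cdot$)} \\
  &\leqq_{\mathsf{CKA}} (q(c) \cdot r(c)) \parallel (p(m) \cdot 1)
     && \text{(exchange law)} \\
  &\equiv_{\mathsf{CKA}} (q(c) \cdot r(c)) \parallel p(m)
     && \text{(unit of $\cdot$)}
\end{align*}
```

Each use of the exchange law trades some parallelism for an additional ordering, which is exactly how the sequential implementation refines the specification.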

Having a complete axiomatisation of CKA has two main benefits. First, it 
allows one to get certificates of correctness. Indeed, if one wants to use CKA for 
program verification, the decision procedure presented in [3] may be used to test 
program equivalence. If the test gives a negative answer, this algorithm provides 
a counter-example. However, if the answer is positive, no meaningful witness
is produced. With the completeness result presented here, which is constructive
in nature, one could generate an axiomatic proof of equivalence in these cases.
Second, it gives one a simple way of checking when the aforementioned procedure 
applies. By construction, we know that two terms are semantically equivalent 
whenever they are equal in every concurrent Kleene algebra, that is any model of 
the axioms of CKA. This means that if we consider a specific semantic domain, 
one simply needs to check that the axioms of CKA hold in there to know that 
the decision procedure of [3] is sound in this model. 

While this paper was being written, a manuscript with the same result
appeared [19]. Among other things, the proof presented here is different in that it 
explicitly shows how to syntactically construct terms that express certain pom- 
set languages, as opposed to showing that such terms must exist by reasoning 
on a semantic level. We refer to Sect. 5 for a more extensive comparison.

The remainder of this paper is organised as follows. In Sect. 2, we give an
informal overview of the completeness proof. In Sect. 3, we introduce the nec- 
essary concepts, notation and lemmas. In Sect. 4, we work out the proof. We 
discuss the result in a broader perspective and outline further work in Sect. 5. 


2 Overview of the Completeness Proof 


We start with an overview of the steps necessary to arrive at the main result. As 
mentioned, our strategy in tackling CKA-completeness is to build on the existing 
BKA-completeness result. Following an observation by Laurence and Struth, we 
identify downward-closure (under Gischer’s subsumption order [6]) as the feature 
that distinguishes the pomsets giving semantics to BKA-expressions from those 
associated with CKA-expressions. In a slogan, 


CKA-semantics = BKA-semantics + downward-closure. 


This situation is depicted in the upper part of the commuting diagram in Fig. 1. 
Intuitively, downward-closure can be thought of as the semantic outcome of 
adding the exchange axiom, which distinguishes CKA from BKA. Thus, if a and 
b are events that can happen in parallel according to the BKA-semantics of a 
term, then a and b may also be ordered in the CKA-semantics of that same term. 


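[Figure: commuting diagram. Top row: terms are mapped by the BKA semantics to
series-parallel pomset languages. Vertical arrows: syntactic closure (on terms) and
semantic closure (on languages). Bottom row: terms and downward-closed
series-parallel pomset languages; the CKA semantics also appears as an arrow label.]

Fig. 1. The connection between BKA and CKA semantics mediated by closure.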


The core of our CKA-completeness proof will be to construct a syntactic
counterpart to the semantic closure. Concretely, we shall build a function that
maps a CKA term $e$ to an equivalent term $e{\downarrow}$, called the (syntactic) closure of $e$.
The lower part of the commuting diagram in Fig. 1 shows the property that $e{\downarrow}$
must satisfy in order to deserve the name of closure: its BKA semantics has to
be the same as the CKA semantics of $e$.

Example 2.1. Consider $e = a \parallel b$, whose CKA-semantics prescribe that a and
b are events that may happen in parallel. One closure of this term would be
$e{\downarrow} = a \parallel b + a \cdot b + b \cdot a$, whose BKA-semantics stipulate that either a and b execute
purely in parallel, or a precedes b, or b precedes a, thus matching the optional
parallelism of a and b. For a less trivial example, take $e = a^* \parallel b^*$, which
represents that finitely many repetitions of a and b occur, possibly in parallel.
A closure of this term would be $e{\downarrow} = (a^* \parallel b^*)^*$: finitely many repetitions of a
and b occur truly in parallel, which is repeated indefinitely.


In order to find e| systematically, we are going to construct it in stages, 
through a completely syntactic procedure where each transformation has to be 
valid according to the axioms. There are three main stages. 


(i) We note that, not unexpectedly, the hardest case for computing the closure
of a term is when $e$ is a parallel composition, i.e., when $e = e_0 \parallel e_1$ for
some CKA terms $e_0$ and $e_1$. For the other operators, the closure of the
result can be obtained by applying the same operator to the closures of its
arguments. For instance, $(e + f){\downarrow} = e{\downarrow} + f{\downarrow}$. This means that we can focus
on calculating the closure for the particular case of parallel composition.

(ii) We construct a preclosure of such terms e, whose BKA semantics contains 
all but possibly the sequentially composed pomsets of the CKA semantics 
of e. Since every sequentially composed pomset decomposes (uniquely) into 
non-sequential pomsets, we can use the preclosure as a basis for induction. 

(iii) We extend this preclosure of e to a proper closure, by leveraging the fixpoint 
axioms of KA to solve a system of linear inequations. This system encodes 
“stringing together” non-sequential pomsets to build all pomsets in e. 


As a straightforward consequence of the closure construction, we obtain a 
completeness theorem for CKA, which establishes the set of closed series-rational 
pomset languages as the free CKA. 


3 Preliminaries 


We fix a finite set of symbols $\Sigma$, the alphabet. We use the symbols $a$, $b$ and $c$ to
denote elements of $\Sigma$. The two-element set $\{0, 1\}$ is denoted by $2$. Given a set
$S$, the set of subsets (powerset) of $S$ is denoted by $2^S$.

In the interest of readability, the proofs for technical lemmas in this section 
can be found in the full version [15]. 


3.1 Pomsets 


A trace of a sequential program can be modelled as a word, where each letter 
represents an atomic event, and the order of the letters in the word represents 
the order in which the events took place. Analogously, a trace of a concurrent
program can be thought of as a word where letters are partially ordered, i.e., there
need not be a causal link between events. In the literature, such a partially ordered
word is commonly called a partial word [7], or partially ordered multiset (pomset,
for short) [6]; we use the latter term.

A formal definition of pomsets requires some work, because the partial order 
should order occurrences of events rather than the events themselves. For this 
reason, we first define a labelled poset. 


Definition 3.1. A labelled poset is a tuple $(S, \leq, \lambda)$, where $(S, \leq)$ is a partially
ordered set (i.e., $S$ is a set and $\leq$ is a partial order on $S$), in which $S$ is called
the carrier and $\leq$ is the order; $\lambda : S \to \Sigma$ is a function called the labelling.


We denote labelled posets with lower-case bold symbols $\mathbf{u}$, $\mathbf{v}$, et cetera. Given
a labelled poset $\mathbf{u}$, we write $S_{\mathbf{u}}$ for its carrier, $\leq_{\mathbf{u}}$ for its order and $\lambda_{\mathbf{u}}$ for its
labelling. We write $\mathbf{1}$ for the empty labelled poset. We say that two labelled
posets are disjoint if their carriers are disjoint.

Disjoint labelled posets can be composed parallelly and sequentially; parallel 
composition simply juxtaposes the events, while sequential composition imposes 
an ordering between occurrences of events originating from the left operand and 
those originating from the right operand. 


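Definition 3.2. Let $\mathbf{u}$ and $\mathbf{v}$ be disjoint. We write $\mathbf{u} \parallel \mathbf{v}$ for the parallel
composition of $\mathbf{u}$ and $\mathbf{v}$, which is the labelled poset with the carrier $S_{\mathbf{u} \cup \mathbf{v}} = S_{\mathbf{u}} \cup S_{\mathbf{v}}$,
the order $\leq_{\mathbf{u} \parallel \mathbf{v}} \;=\; \leq_{\mathbf{u}} \cup \leq_{\mathbf{v}}$ and the labelling $\lambda_{\mathbf{u} \parallel \mathbf{v}}$ defined by

$$\lambda_{\mathbf{u} \parallel \mathbf{v}}(x) = \begin{cases} \lambda_{\mathbf{u}}(x) & x \in S_{\mathbf{u}}; \\ \lambda_{\mathbf{v}}(x) & x \in S_{\mathbf{v}}. \end{cases}$$

Similarly, we write $\mathbf{u} \cdot \mathbf{v}$ for the sequential composition of $\mathbf{u}$ and $\mathbf{v}$, that is, the
labelled poset with the carrier $S_{\mathbf{u} \cup \mathbf{v}}$ and the partial order

$$\leq_{\mathbf{u} \cdot \mathbf{v}} \;=\; \leq_{\mathbf{u}} \cup \leq_{\mathbf{v}} \cup\, (S_{\mathbf{u}} \times S_{\mathbf{v}}),$$

as well as the labelling $\lambda_{\mathbf{u} \cdot \mathbf{v}} = \lambda_{\mathbf{u} \parallel \mathbf{v}}$.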


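Note that $\mathbf{1}$ is neutral for sequential and parallel composition, in the sense that
we have $\mathbf{1} \parallel \mathbf{u} = \mathbf{1} \cdot \mathbf{u} = \mathbf{u} = \mathbf{u} \cdot \mathbf{1} = \mathbf{u} \parallel \mathbf{1}$.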
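
The two compositions can be sketched in a few lines of Python. This is an illustrative encoding, not one prescribed by the paper: a labelled poset is assumed to be a triple of a carrier set, a set of pairs `(x, y)` meaning "x is below or equal to y", and a dictionary giving the labelling; the names `primitive`, `parallel`, `sequential` and `EMPTY` are hypothetical.

```python
# A labelled poset is modelled as (carrier, order, labelling):
#   carrier   : set of event occurrences
#   order     : set of pairs (x, y), read "x is below or equal to y"
#   labelling : dict from event occurrences to alphabet symbols
EMPTY = (set(), set(), {})


def primitive(event, label):
    """Labelled poset with a single event carrying the given label."""
    return ({event}, {(event, event)}, {event: label})


def parallel(u, v):
    """Parallel composition: juxtapose the events, add no new order."""
    (su, ou, lu), (sv, ov, lv) = u, v
    assert not (su & sv), "operands must be disjoint"
    return (su | sv, ou | ov, {**lu, **lv})


def sequential(u, v):
    """Sequential composition: every event of u precedes every event of v."""
    (su, ou, lu), (sv, ov, lv) = u, v
    assert not (su & sv), "operands must be disjoint"
    cross = {(x, y) for x in su for y in sv}
    return (su | sv, ou | ov | cross, {**lu, **lv})


a = primitive(0, "a")
b = primitive(1, "b")
ab = sequential(a, b)       # order contains the extra pair (0, 1)
a_par_b = parallel(a, b)    # order contains only the reflexive pairs
```

Under this encoding the empty labelled poset is neutral for both operations, mirroring the remark above.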

There is a natural ordering between labelled posets with regard to concur- 
rency. 


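Definition 3.3. Let $\mathbf{u}, \mathbf{v}$ be labelled posets. A subsumption from $\mathbf{u}$ to $\mathbf{v}$ is a
bijection $h : S_{\mathbf{u}} \to S_{\mathbf{v}}$ that preserves order and labels, i.e., $u \leq_{\mathbf{u}} u'$ implies that
$h(u) \leq_{\mathbf{v}} h(u')$, and $\lambda_{\mathbf{v}} \circ h = \lambda_{\mathbf{u}}$. We simplify and write $h : \mathbf{u} \to \mathbf{v}$ for a
subsumption from $\mathbf{u}$ to $\mathbf{v}$. If such a subsumption exists, we write $\mathbf{v} \sqsubseteq \mathbf{u}$. Furthermore, $h$
is an isomorphism if both $h$ and its inverse $h^{-1}$ are subsumptions. If there exists
an isomorphism from $\mathbf{u}$ to $\mathbf{v}$ we write $\mathbf{u} \cong \mathbf{v}$.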


Intuitively, if $\mathbf{u} \sqsubseteq \mathbf{v}$, then $\mathbf{u}$ and $\mathbf{v}$ both order the same set of (occurrences
of) events, but $\mathbf{u}$ has more causal links, or “is more sequential” than $\mathbf{v}$. One
easily sees that $\sqsubseteq$ is a preorder on labelled posets of finite carrier.
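
For small carriers, subsumption can be checked by brute force. The sketch below is only an illustration under assumed conventions: labelled posets are triples `(carrier, order, labelling)` with integer events, and the hypothetical function `subsumed_by(u, v)` searches for a label-preserving bijection from the carrier of `v` to the carrier of `u` that maps every ordered pair of `v` to an ordered pair of `u`, which is the condition for writing that `u` is subsumed by `v`.

```python
from itertools import permutations


def subsumed_by(u, v):
    """Return True if u is subsumed by v (u is "more sequential" than v).

    Tries every bijection h from the carrier of v to the carrier of u and
    accepts if h preserves labels and maps each ordered pair of v to an
    ordered pair of u. Exponential, so only suitable for tiny examples.
    """
    (su, ou, lu), (sv, ov, lv) = u, v
    if len(su) != len(sv):
        return False
    source = sorted(sv)
    for image in permutations(sorted(su)):
        h = dict(zip(source, image))
        labels_ok = all(lu[h[x]] == lv[x] for x in source)
        order_ok = all((h[x], h[y]) in ou for (x, y) in ov)
        if labels_ok and order_ok:
            return True
    return False


# "a then b" on events 0, 1 and "a parallel b" on events 2, 3.
ab = ({0, 1}, {(0, 0), (1, 1), (0, 1)}, {0: "a", 1: "b"})
a_par_b = ({2, 3}, {(2, 2), (3, 3)}, {2: "a", 3: "b"})

print(subsumed_by(ab, a_par_b))   # the sequential trace refines the parallel one
print(subsumed_by(a_par_b, ab))   # the converse would have to drop an ordering
```

Two labelled posets are isomorphic exactly when `subsumed_by` holds in both directions, since the carriers are finite.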

Since the actual contents of the carrier of a labelled poset do not matter, we
can abstract from them using isomorphism. This gives rise to pomsets.

Definition 3.4. A pomset is an isomorphism class of labelled posets, i.e., the
class $[\mathbf{v}] = \{\mathbf{u} : \mathbf{u} \cong \mathbf{v}\}$ for some labelled poset $\mathbf{v}$. Composition lifts to pomsets:
we write $[\mathbf{u}] \parallel [\mathbf{v}]$ for $[\mathbf{u} \parallel \mathbf{v}]$ and $[\mathbf{u}] \cdot [\mathbf{v}]$ for $[\mathbf{u} \cdot \mathbf{v}]$. Similarly, subsumption also
lifts to pomsets: we write $[\mathbf{u}] \sqsubseteq [\mathbf{v}]$, precisely when $\mathbf{u} \sqsubseteq \mathbf{v}$.


We denote pomsets with upper-case symbols $U$, $V$, et cetera. The empty
pomset, i.e., $[\mathbf{1}] = \{\mathbf{1}\}$, is denoted by $1$; this pomset is neutral for sequential
and parallel composition. To ensure that $[\mathbf{v}]$ is a set, we limit the discussion to
labelled posets whose carrier is a subset of some set $\mathcal{S}$. The labelled posets in
this paper have finite carrier; it thus suffices to choose $\mathcal{S} = \mathbb{N}$ to represent all
pomsets with finite (or even countably infinite) carrier.

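Composition of pomsets is well-defined: if $\mathbf{u}$ and $\mathbf{v}$ are not disjoint, we can
find $\mathbf{u}'$, $\mathbf{v}'$ disjoint from $\mathbf{u}$, $\mathbf{v}$ respectively such that $\mathbf{u} \cong \mathbf{u}'$ and $\mathbf{v} \cong \mathbf{v}'$. The choice
of representative does not matter, for if $\mathbf{u} \cong \mathbf{u}'$ and $\mathbf{v} \cong \mathbf{v}'$, then $\mathbf{u} \cdot \mathbf{v} \cong \mathbf{u}' \cdot \mathbf{v}'$.
Subsumption of pomsets is also well-defined: if $\mathbf{u}' \cong \mathbf{u} \sqsubseteq \mathbf{v} \cong \mathbf{v}'$, then $\mathbf{u}' \sqsubseteq \mathbf{v}'$.
One easily sees that $\sqsubseteq$ is a partial order on finite pomsets, and that sequential
and parallel composition are monotone with respect to $\sqsubseteq$, i.e., if $U \sqsubseteq W$ and
$V \sqsubseteq X$, then $U \cdot V \sqsubseteq W \cdot X$ and $U \parallel V \sqsubseteq W \parallel X$. Lastly, we note that both
types of composition are associative, both on the level of pomsets and labelled
posets; we therefore omit parentheses when no ambiguity is likely.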


Series-Parallel Pomsets. If $a \in \Sigma$, we can construct a labelled poset with a
single element labelled by a; indeed, since any labelled poset thus constructed 
is isomorphic, we also use a to denote this isomorphism class; such a pomset is 
called a primitive pomset. A pomset built from primitive pomsets and sequential 
and parallel composition is called series-parallel; more formally: 


Definition 3.5. The set of series-parallel pomsets, denoted $\mathsf{SP}(\Sigma)$, is the smallest
set such that $1 \in \mathsf{SP}(\Sigma)$ as well as $a \in \mathsf{SP}(\Sigma)$ for every $a \in \Sigma$, and is closed
under parallel and sequential composition.


We elide the sequential composition operator when we explicitly construct a 
pomset from primitive pomsets, i.e., we write $ab$ instead of $a \cdot b$ for the pomset
obtained by sequentially composing the (primitive) pomsets a and b. In this 
notation, sequential composition takes precedence over parallel composition. 

All pomsets encountered in this paper are series-parallel. A useful feature of 
series-parallel pomsets is that we can deconstruct them in a standard fashion [6]. 


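Lemma 3.1. Let $U \in \mathsf{SP}(\Sigma)$. Then exactly one of the following is true: either
(i) $U = 1$, or (ii) $U = a$ for some $a \in \Sigma$, or (iii) $U = U_0 \cdot U_1$ for $U_0, U_1 \in
\mathsf{SP}(\Sigma) \setminus \{1\}$, or (iv) $U = U_0 \parallel U_1$ for $U_0, U_1 \in \mathsf{SP}(\Sigma) \setminus \{1\}$.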

In the sequel, it will be useful to refer to pomsets that are not of the third
kind above, i.e., cannot be written as $U_0 \cdot U_1$ for $U_0, U_1 \in \mathsf{SP}(\Sigma) \setminus \{1\}$, as
non-sequential pomsets. Lemma 3.1 gives a normal form for series-parallel pomsets,
as follows.


Corollary 3.1. A pomset $U \in \mathsf{SP}(\Sigma)$ can be uniquely decomposed as $U = U_0 \cdot
U_1 \cdots U_{n-1}$, where for all $0 \leq i < n$, $U_i$ is series-parallel and non-sequential.
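
The decomposition can be read off a term that builds the pomset. The sketch below is an illustration under assumed conventions, not the paper's construction: a series-parallel pomset is represented by a nested tuple `('seq', x, y)` or `('par', x, y)`, a primitive pomset by a string, and the empty pomset by the reserved string `'1'`; the function name `factors` is hypothetical.

```python
def factors(term):
    """List the non-sequential factors of a series-parallel term, in order.

    Sequential compositions are flattened (associativity), occurrences of
    the empty pomset '1' are dropped (unit laws), and a parallel composition
    of two non-empty parts is kept as a single non-sequential factor.
    """
    if term == "1":
        return []
    if isinstance(term, tuple):
        op, left, right = term
        fl, fr = factors(left), factors(right)
        if op == "seq":
            return fl + fr
        if op == "par":
            if not fl:          # 1 || right  is just right
                return fr
            if not fr:          # left || 1  is just left
                return fl
            return [term]
        raise ValueError(f"unknown operator: {op!r}")
    return [term]               # a primitive pomset


left_nested = ("seq", ("seq", "a", ("par", "b", "c")), "d")
right_nested = ("seq", "a", ("seq", ("par", "b", "c"), "d"))
print(factors(left_nested))
print(factors(right_nested))
```

Both bracketings of the sequential composition yield the same list of factors, which is the uniqueness that the corollary expresses.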

Factorisation. We now go over some lemmas on pomsets that will allow us
to factorise pomsets later on. First of all, one easily shows that subsumption is 
irrelevant on empty and primitive pomsets, as witnessed by the following lemma. 


Lemma 3.2. Let $U$ and $V$ be pomsets such that $U \sqsubseteq V$ or $V \sqsubseteq U$. If $U$ is
empty or primitive, then $U = V$.


We can also consider how pomset composition and subsumption relate. It is 
not hard to see that if a pomset is subsumed by a sequentially composed pomset, 
then this sequential composition also appears in the subsumed pomset. A similar 
statement holds for pomsets that subsume a parallel composition. 


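Lemma 3.3 (Factorisation). Let $U$, $V_0$, and $V_1$ be pomsets such that $U$ is
subsumed by $V_0 \cdot V_1$. Then there exist pomsets $U_0$ and $U_1$ such that:

$$U = U_0 \cdot U_1, \quad U_0 \sqsubseteq V_0, \quad \text{and} \quad U_1 \sqsubseteq V_1.$$

Also, if $U_0$, $U_1$ and $V$ are pomsets such that $U_0 \parallel U_1 \sqsubseteq V$, then there exist
pomsets $V_0$ and $V_1$ such that:

$$V = V_0 \parallel V_1, \quad U_0 \sqsubseteq V_0, \quad \text{and} \quad U_1 \sqsubseteq V_1.$$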


The next lemma can be thought of as a generalisation of Levi’s lemma [21], 
a well-known statement about words, to pomsets. It says that if a sequential 
composition is subsumed by another (possibly longer) sequential composition, 
then there must be a pomset “in the middle”, describing the overlap between 
the two; this pomset gives rise to a factorisation. 


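Lemma 3.4. Let $U$ and $V$ be pomsets, and let $W_0, W_1, \ldots, W_{n-1}$ with $n > 0$ be
non-empty pomsets such that $U \cdot V \sqsubseteq W_0 \cdot W_1 \cdots W_{n-1}$. There exists an $m < n$
and pomsets $Y$, $Z$ such that:

$$Y \cdot Z \sqsubseteq W_m, \quad U \sqsubseteq W_0 \cdot W_1 \cdots W_{m-1} \cdot Y, \quad \text{and} \quad V \sqsubseteq Z \cdot W_{m+1} \cdot W_{m+2} \cdots W_{n-1}.$$

Moreover, if $U$ and $V$ are series-parallel, then so are $Y$ and $Z$.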


Levi’s lemma also has an analogue for parallel composition. 


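Lemma 3.5. Let $U, V, W, X$ be pomsets such that $U \parallel V = W \parallel X$. There exist
pomsets $Y_0, Y_1, Z_0, Z_1$ such that

$$U = Y_0 \parallel Y_1, \quad V = Z_0 \parallel Z_1, \quad W = Y_0 \parallel Z_0, \quad \text{and} \quad X = Y_1 \parallel Z_1.$$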


The final lemma is useful when we have a sequentially composed pomset 
subsumed by a parallelly composed pomset. It tells us that we can factor the 
involved pomsets to find subsumptions between smaller pomsets. This lemma 
first appeared in [6], where it is called the interpolation lemma.

Lemma 3.6 (Interpolation). Let $U, V, W, X$ be pomsets such that $U \cdot V$ is
subsumed by $W \parallel X$. Then there exist pomsets $W_0, W_1, X_0, X_1$ such that

$$W_0 \cdot W_1 \sqsubseteq W, \quad X_0 \cdot X_1 \sqsubseteq X, \quad U \sqsubseteq W_0 \parallel X_0, \quad \text{and} \quad V \sqsubseteq W_1 \parallel X_1.$$

Moreover, if $W$ and $X$ are series-parallel, then so are $W_0$, $W_1$, $X_0$ and $X_1$.


On a semi-formal level, the interpolation lemma can be understood as follows.
If $U \cdot V \sqsubseteq W \parallel X$, then the events in $W$ are partitioned between those that end
up in $U$, and those that end up in $V$; these give rise to the “sub-pomsets” $W_0$
and $W_1$ of $W$, respectively. Similarly, $X$ partitions into “sub-pomsets” $X_0$ and
$X_1$. We refer to Fig. 2 for a graphical depiction of this situation.

Now, if $y$ precedes $z$ in $W_0 \parallel X_0$, then $y$ must precede $z$ in $W \parallel X$, and
therefore also in $U \cdot V$. Since $y$ and $z$ are both events in $U$, it then follows that
$y$ precedes $z$ in $U$, establishing that $U \sqsubseteq W_0 \parallel X_0$. Furthermore, if $y$ precedes $z$
in $W$, then we can exclude the case where $y$ is in $W_1$ and $z$ in $W_0$, for then $z$
precedes $y$ in $U \cdot V$, contradicting that $y$ precedes $z$ in $U \cdot V$. Accordingly, either
$y$ and $z$ both belong to $W_0$ or $W_1$, or $y$ is in $W_0$ while $z$ is in $W_1$; in all of these
cases, $y$ must precede $z$ in $W_0 \cdot W_1$. The other subsumptions hold analogously.


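[Figure: the pomsets W and X, each split into sub-pomsets W_0, W_1 and X_0, X_1;
W_0 and X_0 are aligned with U, and W_1 and X_1 with V.]

Fig. 2. Splitting pomsets in the interpolation lemma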


Pomset Languages. The semantics of BKA and CKA are given in terms of sets 
of series-parallel pomsets. 


Definition 3.6. A subset of $\mathsf{SP}(\Sigma)$ is referred to as a pomset language.


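As a convention, we denote pomset languages by the symbols $\mathcal{U}$, $\mathcal{V}$, et cetera.
Sequential and parallel composition of pomsets extends to pomset languages in
a pointwise manner, i.e.,

$$\mathcal{U} \cdot \mathcal{V} \triangleq \{U \cdot V : U \in \mathcal{U}, V \in \mathcal{V}\}$$

and similarly for parallel composition. Like languages of words, pomset languages
have a Kleene star operator, which is similarly defined, i.e., $\mathcal{U}^* \triangleq \bigcup_{n \in \mathbb{N}} \mathcal{U}^n$, where
the $n$th power of $\mathcal{U}$ is inductively defined as $\mathcal{U}^0 \triangleq \{1\}$ and $\mathcal{U}^{n+1} \triangleq \mathcal{U}^n \cdot \mathcal{U}$.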

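A pomset language $\mathcal{U}$ is closed under subsumption (or simply closed) if whenever
$U \in \mathcal{U}$ with $U' \sqsubseteq U$ and $U' \in \mathsf{SP}(\Sigma)$, it holds that $U' \in \mathcal{U}$. The closure
under subsumption (or simply closure) of a pomset language $\mathcal{U}$, denoted $\mathcal{U}{\downarrow}$, is
defined as the smallest pomset language that contains $\mathcal{U}$ and is closed, i.e.,

$$\mathcal{U}{\downarrow} \triangleq \{U' \in \mathsf{SP}(\Sigma) : \exists U \in \mathcal{U}.\ U' \sqsubseteq U\}$$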
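
For a finite language of small pomsets, the closure can be enumerated directly. The following sketch is an illustration under assumptions of its own: a pomset is represented by a tuple of labels (event `i` carries `labels[i]`) together with a set of strict pairs `(i, j)` meaning "event i strictly precedes event j"; series-parallel pomsets are recognised as those without an induced N-shaped sub-order, a standard characterisation that the text above does not state; the helper names are hypothetical.

```python
from itertools import permutations, product


def is_strict_order(rel):
    """Check antisymmetry and transitivity of a set of strict pairs."""
    return (all((y, x) not in rel for (x, y) in rel) and
            all((x, w) in rel for (x, y) in rel for (z, w) in rel if y == z))


def is_n_free(n, rel):
    """True if no four events form an induced 'N' (a<c, b<c, b<d only)."""
    def unrelated(x, y):
        return (x, y) not in rel and (y, x) not in rel
    for a, b, c, d in permutations(range(n), 4):
        if ((a, c) in rel and (b, c) in rel and (b, d) in rel and
                unrelated(a, b) and unrelated(c, d) and unrelated(a, d)):
            return False
    return True


def canonical(labels, rel):
    """Pick a representative of the isomorphism class by renaming events."""
    n, best = len(labels), None
    for p in permutations(range(n)):        # p[i] is the new name of event i
        lab = tuple(l for _, l in sorted((p[i], labels[i]) for i in range(n)))
        key = (lab, tuple(sorted((p[x], p[y]) for (x, y) in rel)))
        if best is None or key < best:
            best = key
    return best


def closure(language):
    """All series-parallel pomsets subsumed by some member of the language."""
    result = set()
    for labels, rel in language:
        n = len(labels)
        pairs = [(x, y) for x in range(n) for y in range(n) if x != y]
        for bits in product([False, True], repeat=len(pairs)):
            ext = set(rel) | {p for p, keep in zip(pairs, bits) if keep}
            if is_strict_order(ext) and is_n_free(n, ext):
                result.add(canonical(labels, ext))
    return result


a_par_b = (("a", "b"), frozenset())
for pomset in sorted(closure({a_par_b})):
    print(pomset)
```

On the language containing only "a parallel b", the enumeration yields the parallel pomset together with its two sequentialisations, matching Example 2.1. The search is exponential in the number of events, so this is only meant for experimenting with tiny examples.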


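Closure relates to union, sequential composition and iteration as follows.

Lemma 3.7. Let $\mathcal{U}, \mathcal{V}$ be pomset languages; then:
$(\mathcal{U} \cup \mathcal{V}){\downarrow} = \mathcal{U}{\downarrow} \cup \mathcal{V}{\downarrow}$, $(\mathcal{U} \cdot \mathcal{V}){\downarrow} = \mathcal{U}{\downarrow} \cdot \mathcal{V}{\downarrow}$, and $\mathcal{U}^*{\downarrow} = \mathcal{U}{\downarrow}^*$.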


Proof. The first claim holds for infinite unions, too, and follows immediately 
from the definition of closure. 

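For the second claim, suppose that $U \in \mathcal{U}$ and $V \in \mathcal{V}$, and that $W \sqsubseteq U \cdot V$.
By Lemma 3.3, we find pomsets $W_0$ and $W_1$ such that $W = W_0 \cdot W_1$, with
$W_0 \sqsubseteq U$ and $W_1 \sqsubseteq V$. It then holds that $W_0 \in \mathcal{U}{\downarrow}$ and $W_1 \in \mathcal{V}{\downarrow}$, meaning that
$W = W_0 \cdot W_1 \in \mathcal{U}{\downarrow} \cdot \mathcal{V}{\downarrow}$. This shows that $(\mathcal{U} \cdot \mathcal{V}){\downarrow} \subseteq \mathcal{U}{\downarrow} \cdot \mathcal{V}{\downarrow}$. Proving the reverse
inclusion is a simple matter of unfolding the definitions.

For the third claim, we can calculate directly using the first and second parts
of this lemma:

$$\mathcal{U}^*{\downarrow} = \Bigl(\bigcup_{n \in \mathbb{N}} \underbrace{\mathcal{U} \cdot \mathcal{U} \cdots \mathcal{U}}_{n \text{ times}}\Bigr){\downarrow} = \bigcup_{n \in \mathbb{N}} \bigl(\underbrace{\mathcal{U} \cdot \mathcal{U} \cdots \mathcal{U}}_{n \text{ times}}\bigr){\downarrow} = \bigcup_{n \in \mathbb{N}} \underbrace{\mathcal{U}{\downarrow} \cdot \mathcal{U}{\downarrow} \cdots \mathcal{U}{\downarrow}}_{n \text{ times}} = \mathcal{U}{\downarrow}^*$$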


3.2 Concurrent Kleene Algebra 


We now consider two extensions of Kleene Algebra (KA), known as Bi-Kleene 
Algebra (BKA) and Concurrent Kleene Algebra (CKA). Both extend KA with an 
operator for parallel composition and thus share a common syntax. 


Definition 3.7. The set $\mathcal{T}$ is the smallest set generated by the grammar

$$e, f ::= 0 \mid 1 \mid a \in \Sigma \mid e + f \mid e \cdot f \mid e \parallel f \mid e^*$$


The BKA-semantics of a term is a straightforward inductive application of 
the operators on the level of pomset languages. The CKA-semantics of a term 
is the BKA-semantics, downward-closed under the subsumption order; the CKA- 
semantics thus includes all possible sequentialisations. 


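Definition 3.8. The function $\llbracket - \rrbracket_{\mathsf{BKA}} : \mathcal{T} \to 2^{\mathsf{SP}(\Sigma)}$ is defined as follows:

$$\begin{array}{lll}
\llbracket 0 \rrbracket_{\mathsf{BKA}} \triangleq \emptyset & \llbracket e + f \rrbracket_{\mathsf{BKA}} \triangleq \llbracket e \rrbracket_{\mathsf{BKA}} \cup \llbracket f \rrbracket_{\mathsf{BKA}} & \llbracket e^* \rrbracket_{\mathsf{BKA}} \triangleq \llbracket e \rrbracket_{\mathsf{BKA}}^* \\
\llbracket 1 \rrbracket_{\mathsf{BKA}} \triangleq \{1\} & \llbracket e \cdot f \rrbracket_{\mathsf{BKA}} \triangleq \llbracket e \rrbracket_{\mathsf{BKA}} \cdot \llbracket f \rrbracket_{\mathsf{BKA}} & \\
\llbracket a \rrbracket_{\mathsf{BKA}} \triangleq \{a\} & \llbracket e \parallel f \rrbracket_{\mathsf{BKA}} \triangleq \llbracket e \rrbracket_{\mathsf{BKA}} \parallel \llbracket f \rrbracket_{\mathsf{BKA}} &
\end{array}$$

Finally, $\llbracket - \rrbracket_{\mathsf{CKA}} : \mathcal{T} \to 2^{\mathsf{SP}(\Sigma)}$ is defined as $\llbracket e \rrbracket_{\mathsf{CKA}} \triangleq \llbracket e \rrbracket_{\mathsf{BKA}}{\downarrow}$.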
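
The inductive shape of the BKA-semantics can be mirrored in a small evaluator. The sketch below makes several assumptions of its own: terms are nested tuples such as `('+', e, f)`, `('.', e, f)`, `('|', e, f)`, `('*', e)`, `('sym', 'a')`, `('0',)` and `('1',)`; pomsets are kept in a normal form that flattens both compositions, sorts parallel components and drops the empty pomset; and the Kleene star is truncated after `unfold` iterations, so the result is only a finite approximation for terms containing a star. The names `seq`, `par`, `sem` and `EMPTY` are hypothetical.

```python
EMPTY = ("seq",)   # normal form of the empty pomset 1


def seq(u, v):
    """Sequential composition on normal forms (flattens nested 'seq')."""
    parts = []
    for w in (u, v):
        if isinstance(w, tuple) and w[0] == "seq":
            parts.extend(w[1:])
        else:
            parts.append(w)
    return parts[0] if len(parts) == 1 else ("seq", *parts)


def par(u, v):
    """Parallel composition on normal forms (flattens, sorts, drops 1)."""
    parts = []
    for w in (u, v):
        if isinstance(w, tuple) and w[0] == "par":
            parts.extend(w[1:])
        elif w != EMPTY:
            parts.append(w)
    parts.sort(key=repr)
    if not parts:
        return EMPTY
    return parts[0] if len(parts) == 1 else ("par", *parts)


def sem(e, unfold=2):
    """BKA-semantics of a term, with the star cut off after `unfold` rounds."""
    op = e[0]
    if op == "0":
        return set()
    if op == "1":
        return {EMPTY}
    if op == "sym":
        return {e[1]}
    if op == "+":
        return sem(e[1], unfold) | sem(e[2], unfold)
    if op == ".":
        return {seq(u, v) for u in sem(e[1], unfold) for v in sem(e[2], unfold)}
    if op == "|":
        return {par(u, v) for u in sem(e[1], unfold) for v in sem(e[2], unfold)}
    if op == "*":
        base, result, power = sem(e[1], unfold), {EMPTY}, {EMPTY}
        for _ in range(unfold):
            power = {seq(u, v) for u in power for v in base}
            result |= power
        return result
    raise ValueError(f"unknown operator: {op!r}")


a, b = ("sym", "a"), ("sym", "b")
print(sem(("|", a, b)) == sem(("|", b, a)))   # commutativity of || is built in
```

The CKA-semantics would additionally close the resulting set under subsumption; that step is not performed by this sketch.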


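Following Lodaya and Weil [22], if $\mathcal{U}$ is a pomset language such that $\mathcal{U} =
\llbracket e \rrbracket_{\mathsf{BKA}}$ for some $e \in \mathcal{T}$, we say that the language $\mathcal{U}$ is series-rational. Note that
if $\mathcal{U}$ is such that $\mathcal{U} = \llbracket e \rrbracket_{\mathsf{CKA}}$ for some term $e \in \mathcal{T}$, then $\mathcal{U}$ is closed by definition.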

To axiomatise semantic equivalence between terms, we build the following 
relations, which match the axioms proposed in [20]. The axioms of CKA as 
defined in [8] come from a double quantale structure mediated by the exchange 
law; these imply the ones given here. The converse implication does not hold; 
in particular, our syntax does not include an infinitary greatest lower bound 
operator. However, BKA (as defined in this paper) does have a finitary greatest 
lower bound [20], and by the existence of closure, so does CKA. 


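Definition 3.9. The relation $\equiv_{\mathsf{BKA}}$ is the smallest congruence on $\mathcal{T}$ (with respect
to all operators) such that for all $e, f, g \in \mathcal{T}$:

$$\begin{array}{llll}
e + 0 \equiv_{\mathsf{BKA}} e & e + e \equiv_{\mathsf{BKA}} e & e + f \equiv_{\mathsf{BKA}} f + e & e + (f + g) \equiv_{\mathsf{BKA}} (e + f) + g \\
e \cdot 1 \equiv_{\mathsf{BKA}} e & 1 \cdot e \equiv_{\mathsf{BKA}} e & e \cdot (f \cdot g) \equiv_{\mathsf{BKA}} (e \cdot f) \cdot g & \\
e \cdot 0 \equiv_{\mathsf{BKA}} 0 \equiv_{\mathsf{BKA}} 0 \cdot e & e \cdot (f + g) \equiv_{\mathsf{BKA}} e \cdot f + e \cdot g & (e + f) \cdot g \equiv_{\mathsf{BKA}} e \cdot g + f \cdot g & \\
e \parallel f \equiv_{\mathsf{BKA}} f \parallel e & e \parallel 1 \equiv_{\mathsf{BKA}} e & e \parallel (f \parallel g) \equiv_{\mathsf{BKA}} (e \parallel f) \parallel g & \\
e \parallel 0 \equiv_{\mathsf{BKA}} 0 & e \parallel (f + g) \equiv_{\mathsf{BKA}} e \parallel f + e \parallel g & 1 + e \cdot e^* \equiv_{\mathsf{BKA}} e^* & \\
\multicolumn{4}{l}{e + f \cdot g \leqq_{\mathsf{BKA}} g \implies f^* \cdot e \leqq_{\mathsf{BKA}} g}
\end{array}$$

in which we use $e \leqq_{\mathsf{BKA}} f$ as a shorthand for $e + f \equiv_{\mathsf{BKA}} f$. The final (conditional)
axiom is referred to as the least fixpoint axiom.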

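The relation $\equiv_{\mathsf{CKA}}$ is the smallest congruence on $\mathcal{T}$ that satisfies the rules of
$\equiv_{\mathsf{BKA}}$, and furthermore satisfies the exchange law for all $e, f, g, h \in \mathcal{T}$:

$$(e \parallel f) \cdot (g \parallel h) \leqq_{\mathsf{CKA}} (e \cdot g) \parallel (f \cdot h)$$

where we similarly use $e \leqq_{\mathsf{CKA}} f$ as a shorthand for $e + f \equiv_{\mathsf{CKA}} f$.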


We can see that $\equiv_{\mathsf{BKA}}$ includes the familiar axioms of KA, and stipulates
that || is commutative and associative with unit 1 and annihilator 0, as well as 
distributive over +. When using CKA to model concurrent program flow, the 
exchange law models sequentialisation: if we have two programs, the first of 
which executes e followed by g, and the second of which executes f followed by 
h, then we can sequentialise this by executing e and f in parallel, followed by 
executing g and h in parallel. 

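We use the symbol $\mathsf{T}$ in statements that are true for $\mathsf{T} \in \{\mathrm{BKA}, \mathrm{CKA}\}$. The
relation $\equiv_{\mathsf{T}}$ is sound for equivalence of terms under $\mathsf{T}$ [13].


Lemma 3.8. Let $e, f \in \mathcal{T}$. If $e \equiv_{\mathsf{T}} f$, then $\llbracket e \rrbracket_{\mathsf{T}} = \llbracket f \rrbracket_{\mathsf{T}}$.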


Since all binary operators are associative (up to $\equiv_{\mathsf{T}}$), we drop parentheses
when writing terms like $e + f + g$; this does not incur ambiguity with regard to
$\llbracket - \rrbracket_{\mathsf{T}}$. We furthermore consider $\cdot$ to have precedence over $\parallel$, which has precedence
over $+$; as usual, the Kleene star has the highest precedence of all operators. For
instance, when we write $e + f \cdot g^* \parallel h$, this should be read as $e + ((f \cdot (g^*)) \parallel h)$.
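
The precedence convention can be made concrete with a small pretty-printer. The following Python sketch is illustrative only: the `Term` class, its operator tags, and the function `show` are names assumed for this example and do not come from the paper. It prints a term with the fewest parentheses that the convention above permits.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    """Hypothetical AST for series-rational terms (an assumption of this sketch)."""
    op: str            # one of "0", "1", "atom", "+", "||", ".", "*"
    args: tuple = ()
    name: str = ""     # only used when op == "atom"

# Binding strength: + is weakest, then ||, then ., then *; leaves bind tightest.
PREC = {"+": 1, "||": 2, ".": 3, "*": 4, "0": 5, "1": 5, "atom": 5}
SYMBOL = {"+": " + ", "||": " ∥ ", ".": " · "}

def show(e: Term, context: int = 0) -> str:
    """Render e, adding parentheses only when e binds weaker than its context."""
    p = PREC[e.op]
    if e.op == "atom":
        text = e.name
    elif e.op in ("0", "1"):
        text = e.op
    elif e.op == "*":
        text = show(e.args[0], p) + "*"
    else:
        # Binary operators are associative, so equal precedence needs no parentheses.
        text = SYMBOL[e.op].join(show(arg, p) for arg in e.args)
    return f"({text})" if p < context else text

def atom(name: str) -> Term:
    return Term("atom", name=name)

e, f, g, h = (atom(x) for x in "efgh")
# The term e + ((f . (g*)) || h) from the text.
example = Term("+", (e, Term("||", (Term(".", (f, Term("*", (g,)))), h))))
```

Calling `show(example)` renders the fully parenthesised term from the text without any parentheses, whereas a sum nested under a sequential composition keeps its parentheses.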

In case of BKA, the implication in Lemma 3.8 is an equivalence [20], and thus
gives a complete axiomatisation of semantic BKA-equivalence of terms.


Theorem 3.1. Let $e, f \in \mathcal{T}$. Then $e \equiv_{\mathrm{BKA}} f$ if and only if $\llbracket e \rrbracket_{\mathrm{BKA}} = \llbracket f \rrbracket_{\mathrm{BKA}}$.


Given a term e € T, we can determine syntactically whether its (BKA or 
CKA) semantics contains the empty pomset, using the function defined below. 


2 Strictly speaking, the proof in [20] includes the parallel star operator in BKA. Since
this is a conservative extension of BKA, this proof applies to BKA as well.
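Definition 3.10. The nullability function $\epsilon : \mathcal{T} \to 2$ is defined as follows:


$$
\begin{array}{lll}
\epsilon(0) \triangleq 0 & \epsilon(e + f) \triangleq \epsilon(e) \vee \epsilon(f) & \epsilon(e^*) \triangleq 1 \\
\epsilon(1) \triangleq 1 & \epsilon(e \cdot f) \triangleq \epsilon(e) \wedge \epsilon(f) & \\
\epsilon(a) \triangleq 0 & \epsilon(e \parallel f) \triangleq \epsilon(e) \wedge \epsilon(f) &
\end{array}
$$


in which $\vee$ and $\wedge$ are understood as the usual lattice operations on 2.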
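
Because the nullability function is defined by structural recursion, it translates directly into code. The sketch below is an illustration under assumptions: the `Term` class and its operator tags are hypothetical names chosen for this example, and the two elements of the lattice 2 are represented by Python's `False` and `True`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    """Hypothetical AST for series-rational terms (an assumption of this sketch)."""
    op: str            # one of "0", "1", "atom", "+", "||", ".", "*"
    args: tuple = ()
    name: str = ""     # only used when op == "atom"

def nullable(e: Term) -> bool:
    """Follow the clauses of the nullability function, one case per operator."""
    if e.op in ("0", "atom"):
        return False
    if e.op in ("1", "*"):
        return True
    left, right = e.args
    if e.op == "+":
        return nullable(left) or nullable(right)
    # Sequential and parallel composition both require 1 on each side.
    return nullable(left) and nullable(right)

a = Term("atom", name="a")
ONE = Term("1")
```

For example, `nullable(Term("*", (a,)))` holds, while the sequential composition of `a*` with `a` is not nullable.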


That $\epsilon$ encodes the presence of 1 in the semantics is witnessed by the
following.


Lemma 3.9. Let $e \in \mathcal{T}$. Then $\epsilon(e) \leqq_{\mathsf{T}} e$ and $1 \in \llbracket e \rrbracket_{\mathsf{T}}$ if and only if $\epsilon(e) = 1$.
In the sequel, we need the (parallel) width of a term. This is defined as follows.


Definition 3.11. Let $e \in \mathcal{T}$. The (parallel) width of $e$, denoted by $|e|$, is defined
as 0 when $e \equiv_{\mathrm{BKA}} 0$; for all other cases, it is defined inductively, as follows:


$$
\begin{array}{lll}
|1| \triangleq 0 & |e + f| \triangleq \max(|e|, |f|) & |e \parallel f| \triangleq |e| + |f| \\
|a| \triangleq 1 & |e \cdot f| \triangleq \max(|e|, |f|) & |e^*| \triangleq |e|
\end{array}
$$

The width of a term is invariant with respect to equivalence of terms.
Lemma 3.10. Let $e, f \in \mathcal{T}$. If $e \equiv_{\mathrm{CKA}} f$, then $|e| = |f|$.
The width of a term is related to its semantics as demonstrated below.
Lemma 3.11. Let $e \in \mathcal{T}$, and let $U \in \llbracket e \rrbracket_{\mathrm{CKA}}$ be such that $U \neq 1$. Then $|e| > 0$.
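
The width can likewise be computed by recursion on the term. The sketch below is illustrative and rests on one assumption that should be stated plainly: the side condition that a term is equivalent to 0 is checked with a hypothetical helper `is_zero`, which tests syntactically whether the BKA-semantics of the term is empty. By soundness and Theorem 3.1, a term is BKA-equivalent to 0 exactly when its BKA-semantics is empty. The `Term` class and its operator tags are names invented for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    """Hypothetical AST for series-rational terms (an assumption of this sketch)."""
    op: str            # one of "0", "1", "atom", "+", "||", ".", "*"
    args: tuple = ()
    name: str = ""     # only used when op == "atom"

def is_zero(e: Term) -> bool:
    """Assumed helper: True exactly when the semantics of e is empty."""
    if e.op == "0":
        return True
    if e.op in ("1", "atom", "*"):
        return False   # a star always contains the empty pomset
    left, right = e.args
    if e.op == "+":
        return is_zero(left) and is_zero(right)
    return is_zero(left) or is_zero(right)   # "." and "||"

def width(e: Term) -> int:
    """Parallel width: 0 for terms equivalent to 0, otherwise by induction."""
    if is_zero(e) or e.op == "1":
        return 0
    if e.op == "atom":
        return 1
    if e.op == "*":
        return width(e.args[0])
    left, right = e.args
    if e.op == "||":
        return width(left) + width(right)
    return max(width(left), width(right))     # "+" and "."

a, b, c = (Term("atom", name=x) for x in "abc")
par = lambda x, y: Term("||", (x, y))
```

With these definitions, the term (a ∥ b) ∥ c has width 3, while a ∥ 0 has width 0 because it is equivalent to 0.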


3.3 Linear Systems 


KA is equipped to find the least solutions to linear inequations. For instance,
if we want to find $X$ such that $e \cdot X + f \leqq_{\mathrm{KA}} X$, it is not hard to show that
$e^* \cdot f$ is the least solution for $X$, in the sense that this choice of $X$ satisfies the
inequation, and for any choice of $X$ that also satisfies this inequation it holds that
$e^* \cdot f \leqq_{\mathrm{KA}} X$. Since KA is contained in BKA and CKA, the same constructions
also apply there. These axioms generalise to systems of linear inequations in
a straightforward manner; indeed, Kozen [18] exploited this generalisation to
axiomatise KA. In this paper, we use systems of linear inequations to construct
particular expressions. To do this, we introduce vectors and matrices of terms.
For the remainder of this section, we fix $I$ as a finite set.
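
The claim that $e^* \cdot f$ is the least solution of the single inequation can be checked directly from the axioms of Definition 3.9 that do not mention $\parallel$. The derivation below is a sketch added for illustration; it assumes that $\leqq_{\mathrm{KA}}$ is read as the order induced by those axioms.

```latex
% Step 1: e^* . f is a solution (in fact a fixpoint).
\begin{align*}
e \cdot (e^* \cdot f) + f
  &\equiv_{\mathrm{KA}} (e \cdot e^*) \cdot f + 1 \cdot f
     && \text{(associativity, unit)} \\
  &\equiv_{\mathrm{KA}} (1 + e \cdot e^*) \cdot f
     && \text{(commutativity, distributivity)} \\
  &\equiv_{\mathrm{KA}} e^* \cdot f
     && \text{(unfolding axiom } 1 + e \cdot e^* \equiv e^* \text{)}
\end{align*}
% Step 2: e^* . f is below any other solution X.
\begin{align*}
e \cdot X + f \leqq_{\mathrm{KA}} X
  &\implies f + e \cdot X \leqq_{\mathrm{KA}} X
     && \text{(commutativity)} \\
  &\implies e^* \cdot f \leqq_{\mathrm{KA}} X
     && \text{(least fixpoint axiom)}
\end{align*}
```

The second step instantiates the least fixpoint axiom $e + f \cdot g \leqq g \implies f^* \cdot e \leqq g$ with $f$ in place of $e$, $e$ in place of $f$, and $X$ in place of $g$.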


Definition 3.12. An $I$-vector is a function from $I$ to $\mathcal{T}$. Addition of $I$-vectors
is defined pointwise, i.e., if $p$ and $q$ are $I$-vectors, then $p + q$ is the $I$-vector
defined for $i \in I$ by $(p + q)(i) \triangleq p(i) + q(i)$.

An $I$-matrix is a function from $I^2$ to $\mathcal{T}$. Left-multiplication of an $I$-vector
by an $I$-matrix is defined in the usual fashion, i.e., if $M$ is an $I$-matrix and $p$ is
an $I$-vector, then $M \cdot p$ is the $I$-vector defined for $i \in I$ by


$$(M \cdot p)(i) \triangleq \sum_{j \in I} M(i, j) \cdot p(j)$$
Equivalence between terms extends pointwise to $I$-vectors. More precisely,
we write $p \equiv_{\mathsf{T}} q$ for $I$-vectors $p$ and $q$ when $p(i) \equiv_{\mathsf{T}} q(i)$ for all $i \in I$, and $p \leqq_{\mathsf{T}} q$
when $p + q \equiv_{\mathsf{T}} q$.

Definition 3.13. An $I$-linear system $\mathfrak{L}$ is a pair $\langle M, p \rangle$ where $M$ is an $I$-matrix
and $p$ is an $I$-vector. A solution to $\mathfrak{L}$ in $\mathsf{T}$ is an $I$-vector $s$ such that $M \cdot s + p \leqq_{\mathsf{T}} s$.
A least solution to $\mathfrak{L}$ in $\mathsf{T}$ is a solution $s$ in $\mathsf{T}$ such that for any solution $t$ in $\mathsf{T}$
it holds that $s \leqq_{\mathsf{T}} t$.


It is not very hard to show that least solutions of a linear system are unique,
up to $\equiv_{\mathsf{T}}$; we therefore speak of the least solution of a linear system.

Interestingly, any $I$-linear system has a least solution, and one can construct
this solution using only the operators of KA. The construction proceeds
by induction on $|I|$. In the base, where $I$ is empty, the solution is trivial; for the
inductive step it suffices to reduce the problem to finding the least solution of a
strictly smaller linear system. This construction is not unlike Kleene's procedure
to obtain a regular expression from a finite automaton [17]. Alternatively, we
can regard the existence of least solutions as a special case of Kozen's proof of
the fixpoint for matrices over a KA, as seen in [18, Lemma 9].

As a matter of fact, because this construction uses the axioms of KA exclusively,
the least solution that is constructed is the same for both BKA and CKA.


Lemma 3.12. Let $\mathfrak{L}$ be an $I$-linear system. One can construct a single
$I$-vector $x$ that is the least solution to $\mathfrak{L}$ in both BKA and CKA.


We include a full proof of the lemma above using the notation of this paper 
in the full version of this paper [15]. 
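
The inductive construction sketched above can be illustrated by a short elimination procedure. The Python code below is a sketch under assumptions, not the proof from the full version: the `Term` class, the simplifying constructors `plus`, `seq` and `star`, and the function `solve` are names invented for this example. One index is eliminated at a time using the single-inequation solution, the smaller system is solved recursively, and the result is substituted back. The constructors apply only sound KA simplifications (units, annihilator, idempotence) to keep the output readable.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    """Hypothetical AST for KA terms (an assumption of this sketch)."""
    op: str            # one of "0", "1", "atom", "+", ".", "*"
    args: tuple = ()
    name: str = ""     # only used when op == "atom"

ZERO, ONE = Term("0"), Term("1")

def atom(name): return Term("atom", name=name)

def plus(e, f):
    if e == ZERO: return f
    if f == ZERO or e == f: return e
    return Term("+", (e, f))

def seq(e, f):
    if e == ZERO or f == ZERO: return ZERO
    if e == ONE: return f
    if f == ONE: return e
    return Term(".", (e, f))

def star(e):
    return ONE if e in (ZERO, ONE) else Term("*", (e,))

def solve(index, M, p):
    """Least solution s of M.s + p <= s as a dict from indices to terms.

    M maps pairs (i, j) to terms and p maps i to a term; missing entries are 0.
    """
    index = list(index)
    if not index:
        return {}                       # base case: empty system
    k, rest = index[-1], index[:-1]     # eliminate the last index

    def m(i, j):
        return M.get((i, j), ZERO)

    loop = star(m(k, k))
    # Substitute s(k) = M(k,k)* . (p(k) + sum_j M(k,j) . s(j)) into the other rows.
    M2 = {(i, j): plus(m(i, j), seq(m(i, k), seq(loop, m(k, j))))
          for i in rest for j in rest}
    p2 = {i: plus(p.get(i, ZERO), seq(m(i, k), seq(loop, p.get(k, ZERO))))
          for i in rest}
    s = solve(rest, M2, p2)             # strictly smaller system
    tail = p.get(k, ZERO)
    for j in rest:
        tail = plus(tail, seq(m(k, j), s[j]))
    s[k] = seq(loop, tail)              # back-substitution
    return s

a, b = atom("a"), atom("b")
single = solve(["x"], {("x", "x"): a}, {"x": b})
```

For the one-variable system with matrix entry `a` and vector entry `b`, the sketch returns the term a* · b, in line with the single-inequation case discussed at the start of this section.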


4 Completeness of CKA 


We now turn our attention to proving that $\equiv_{\mathrm{CKA}}$ is complete for CKA-semantic
equivalence of terms, i.e., that if $e, f \in \mathcal{T}$ are such that $\llbracket e \rrbracket_{\mathrm{CKA}} = \llbracket f \rrbracket_{\mathrm{CKA}}$, then
$e \equiv_{\mathrm{CKA}} f$. In the interest of readability, proofs of technical lemmas in this section
can be found in the full version of this paper [15].

As mentioned before, our proof of completeness is based on the completeness
result for BKA reproduced in Theorem 3.1. Recall that $\llbracket e \rrbracket_{\mathrm{CKA}} = \llbracket e \rrbracket_{\mathrm{BKA}}{\downarrow}$. To reuse
completeness of BKA, we construct a syntactic variant of the closure operator,
which is formalised below.


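Definition 4.1. Let $e \in \mathcal{T}$. We say that $e{\downarrow}$ is a closure of $e$ if both $e \equiv_{\mathrm{CKA}} e{\downarrow}$
and $\llbracket e{\downarrow} \rrbracket_{\mathrm{BKA}} = \llbracket e \rrbracket_{\mathrm{BKA}}{\downarrow}$ hold.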


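Example 4.1. Let $e = a \parallel b$; as proposed in Sect. 2, we claim that $e{\downarrow} = a \parallel b + b \cdot a + a \cdot b$
is a closure of $e$. To see why, first note that $e \leqq_{\mathrm{CKA}} e{\downarrow}$ by construction.
Furthermore,


$$ab \equiv_{\mathrm{CKA}} (a \parallel 1) \cdot (1 \parallel b) \leqq_{\mathrm{CKA}} (a \cdot 1) \parallel (1 \cdot b) \leqq_{\mathrm{CKA}} a \parallel b$$


and similarly $ba \leqq_{\mathrm{CKA}} e$; thus, $e \equiv_{\mathrm{CKA}} e{\downarrow}$. Lastly, the pomsets in $\llbracket e \rrbracket_{\mathrm{BKA}}{\downarrow}$ and $\llbracket e{\downarrow} \rrbracket_{\mathrm{BKA}}$
are simply $a \parallel b$, $ab$ and $ba$, and therefore $\llbracket e{\downarrow} \rrbracket_{\mathrm{BKA}} = \llbracket e \rrbracket_{\mathrm{BKA}}{\downarrow}$.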
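Laurence and Struth observed that the existence of a closure for every term
implies a completeness theorem for CKA, as follows.


Lemma 4.1. Suppose that we can construct a closure for every element of $\mathcal{T}$.
If $e, f \in \mathcal{T}$ are such that $\llbracket e \rrbracket_{\mathrm{CKA}} = \llbracket f \rrbracket_{\mathrm{CKA}}$, then $e \equiv_{\mathrm{CKA}} f$.


Proof. Since $\llbracket e \rrbracket_{\mathrm{CKA}} = \llbracket e \rrbracket_{\mathrm{BKA}}{\downarrow} = \llbracket e{\downarrow} \rrbracket_{\mathrm{BKA}}$, and similarly $\llbracket f \rrbracket_{\mathrm{CKA}} = \llbracket f{\downarrow} \rrbracket_{\mathrm{BKA}}$, we have
$\llbracket e{\downarrow} \rrbracket_{\mathrm{BKA}} = \llbracket f{\downarrow} \rrbracket_{\mathrm{BKA}}$. By Theorem 3.1, we get $e{\downarrow} \equiv_{\mathrm{BKA}} f{\downarrow}$, and thus $e{\downarrow} \equiv_{\mathrm{CKA}} f{\downarrow}$, since
all axioms of BKA are also axioms of CKA. By $e \equiv_{\mathrm{CKA}} e{\downarrow}$ and $f{\downarrow} \equiv_{\mathrm{CKA}} f$, we can
then conclude that $e \equiv_{\mathrm{CKA}} f$.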


The remainder of this section is dedicated to showing that the premise of
Lemma 4.1 holds. We do this by explicitly constructing a closure $e{\downarrow}$ for every
$e \in \mathcal{T}$. First, we note that closure can be constructed for the base terms.


Lemma 4.2. Let $e \in 2$ or $e = a$ for some $a \in \Sigma$. Then $e$ is a closure of itself.


Furthermore, closure can be constructed compositionally for all operators 
except parallel composition, in the following sense. 


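Lemma 4.3. Suppose that $e_0, e_1 \in \mathcal{T}$, and that $e_0$ and $e_1$ have closures $e_0{\downarrow}$ and
$e_1{\downarrow}$. Then (i) $e_0{\downarrow} + e_1{\downarrow}$ is a closure of $e_0 + e_1$, (ii) $e_0{\downarrow} \cdot e_1{\downarrow}$ is a closure of $e_0 \cdot e_1$,
and (iii) $(e_0{\downarrow})^*$ is a closure of $e_0^*$.


Proof. Since $e_0{\downarrow} \equiv_{\mathrm{CKA}} e_0$ and $e_1{\downarrow} \equiv_{\mathrm{CKA}} e_1$, by the fact that $\equiv_{\mathrm{CKA}}$ is a congruence we
obtain $e_0{\downarrow} + e_1{\downarrow} \equiv_{\mathrm{CKA}} e_0 + e_1$. Similar observations hold for the other operators.
We conclude using Lemma 3.7.


It remains to consider the case where $e = e_0 \parallel e_1$. In doing so, our induction
hypothesis is that any $f \in \mathcal{T}$ with $|f| < |e_0 \parallel e_1|$ has a closure, as well as any
strict subterm of $e_0 \parallel e_1$.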


4.1 Preclosure 


To get to a closure of a parallel composition, we first need an operator on terms 
that is not a closure quite yet, but whose BKA-semantics is “closed enough” to 
cover the non-sequential elements of the CKA-semantics of the term. 


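Definition 4.2. Let $e \in \mathcal{T}$. A preclosure of $e$ is a term $\tilde{e} \in \mathcal{T}$ such that $\tilde{e} \equiv_{\mathrm{CKA}} e$.
Moreover, if $U \in \llbracket e \rrbracket_{\mathrm{CKA}}$ is non-sequential, then $U \in \llbracket \tilde{e} \rrbracket_{\mathrm{BKA}}$.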


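Example 4.2. Suppose that $e_0 \parallel e_1 = (a \parallel b) \parallel c$, and write $e$ for this term. A preclosure of $e_0 \parallel e_1$ could be

$$\tilde{e} = a \parallel b \parallel c + (a \cdot b + b \cdot a) \parallel c + (b \cdot c + c \cdot b) \parallel a + (a \cdot c + c \cdot a) \parallel b$$

To verify this, note that $e \leqq_{\mathrm{CKA}} \tilde{e}$ by construction; it remains to show that $\tilde{e} \leqq_{\mathrm{CKA}} e$.
This is fairly straightforward: since $a \cdot b + b \cdot a \leqq_{\mathrm{CKA}} a \parallel b$, we have $(a \cdot b + b \cdot a) \parallel c \leqq_{\mathrm{CKA}} e$;
the other terms are treated similarly. Consequently, $e \equiv_{\mathrm{CKA}} \tilde{e}$. Furthermore,
there are seven non-sequential pomsets in $\llbracket e \rrbracket_{\mathrm{CKA}}$; they are


$$a \parallel b \parallel c \qquad ab \parallel c \qquad ba \parallel c \qquad bc \parallel a \qquad cb \parallel a \qquad ac \parallel b \qquad ca \parallel b$$


Each of these pomsets is found in $\llbracket \tilde{e} \rrbracket_{\mathrm{BKA}}$. It should be noted that $\tilde{e}$ is not a closure
of $e$; to see this, consider for instance that $abc \in \llbracket e \rrbracket_{\mathrm{CKA}}$, while $abc \notin \llbracket \tilde{e} \rrbracket_{\mathrm{BKA}}$.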
The remainder of this section is dedicated to showing that, under the induction
hypothesis, we can construct a preclosure for any parallelly composed term.
This is not perfectly straightforward; for instance, consider the term $e_0 \parallel e_1$
discussed in Example 4.2. At first glance, one might be tempted to choose $e_0{\downarrow} \parallel e_1{\downarrow}$
as a preclosure, since $e_0{\downarrow}$ and $e_1{\downarrow}$ exist by the induction hypothesis. In that case,
$e_0{\downarrow} = a \parallel b + a \cdot b + b \cdot a$ is a closure of $e_0$. Furthermore, $e_1{\downarrow} = c$ is a closure of $e_1$,
by Lemma 4.2. However, $e_0{\downarrow} \parallel e_1{\downarrow}$ is not a preclosure of $e_0 \parallel e_1$, since $(a \cdot c) \parallel b$
is non-sequential and found in $\llbracket e_0 \parallel e_1 \rrbracket_{\mathrm{CKA}}$, but not in $\llbracket e_0{\downarrow} \parallel e_1{\downarrow} \rrbracket_{\mathrm{BKA}}$.

The problem is that the preclosure of $e_0$ and $e_1$ should also allow (partial)
sequentialisation of parallel parts of $e_0$ and $e_1$; in this case, we need to sequentialise
the $a$ part of $a \parallel b$ with $c$, and leave $b$ untouched. To do so, we need
to be able to split $e_0 \parallel e_1$ into pairs of constituent terms, each of which
represents a possible way to divvy up its parallel parts. For instance, we can split
$e_0 \parallel e_1 = (a \parallel b) \parallel c$ parallelly into $a \parallel b$ and $c$, but also into $a$ and $b \parallel c$, or into
$a \parallel c$ and $b$. The definition below formalises this procedure.


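Definition 4.3. Let $e \in \mathcal{T}$; $\Delta_e$ is the smallest relation on $\mathcal{T}$ such that


$$
1 \mathrel{\Delta_e} e \qquad
e \mathrel{\Delta_e} 1 \qquad
\frac{\ell \mathrel{\Delta_{e_0}} r}{\ell \mathrel{\Delta_{e_0 + e_1}} r} \qquad
\frac{\ell \mathrel{\Delta_{e_1}} r}{\ell \mathrel{\Delta_{e_0 + e_1}} r} \qquad
\frac{\ell \mathrel{\Delta_{e}} r}{\ell \mathrel{\Delta_{e^*}} r}
$$

$$
\frac{\ell \mathrel{\Delta_{e_0}} r \qquad \epsilon(e_1) = 1}{\ell \mathrel{\Delta_{e_0 \cdot e_1}} r} \qquad
\frac{\ell \mathrel{\Delta_{e_1}} r \qquad \epsilon(e_0) = 1}{\ell \mathrel{\Delta_{e_0 \cdot e_1}} r} \qquad
\frac{\ell_0 \mathrel{\Delta_{e_0}} r_0 \qquad \ell_1 \mathrel{\Delta_{e_1}} r_1}{\ell_0 \parallel \ell_1 \mathrel{\Delta_{e_0 \parallel e_1}} r_0 \parallel r_1}
$$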


Given $e \in \mathcal{T}$, we refer to $\Delta_e$ as the parallel splitting relation of $e$, and to
the elements of $\Delta_e$ as parallel splices of $e$. Before we can use $\Delta_e$ to construct
the preclosure of $e$, we go over a number of properties of the parallel splitting
relation. The first of these properties is that a given $e \in \mathcal{T}$ has only finitely many
parallel splices. This will be useful later, when we involve all parallel splices of
$e$ in building a new term, i.e., to guarantee that the constructed term is finite.


Lemma 4.4. For $e \in \mathcal{T}$, $\Delta_e$ is finite.
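
Since the parallel splitting relation is generated by finitely many syntax-directed rules, its finite set of splices can be enumerated by recursion on the term. The Python sketch below is illustrative: the `Term` class, the helper `nullable` (playing the role of the nullability function of Definition 3.10) and the function `par_splices` are names assumed for this example. Each rule of Definition 4.3 corresponds to one branch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    """Hypothetical AST for series-rational terms (an assumption of this sketch)."""
    op: str            # one of "0", "1", "atom", "+", "||", ".", "*"
    args: tuple = ()
    name: str = ""     # only used when op == "atom"

ONE = Term("1")
def atom(name): return Term("atom", name=name)
def par(e, f): return Term("||", (e, f))
def seq(e, f): return Term(".", (e, f))
def star(e): return Term("*", (e,))

def nullable(e: Term) -> bool:
    if e.op in ("0", "atom"): return False
    if e.op in ("1", "*"): return True
    left, right = e.args
    if e.op == "+": return nullable(left) or nullable(right)
    return nullable(left) and nullable(right)

def par_splices(e: Term) -> set:
    """All pairs (l, r) in the parallel splitting relation of e."""
    out = {(ONE, e), (e, ONE)}                      # the two axioms
    if e.op == "+":
        out |= par_splices(e.args[0]) | par_splices(e.args[1])
    elif e.op == "*":
        out |= par_splices(e.args[0])
    elif e.op == ".":
        e0, e1 = e.args
        if nullable(e1): out |= par_splices(e0)     # side condition on e1
        if nullable(e0): out |= par_splices(e1)     # side condition on e0
    elif e.op == "||":
        e0, e1 = e.args
        out |= {(par(l0, l1), par(r0, r1))
                for (l0, r0) in par_splices(e0)
                for (l1, r1) in par_splices(e1)}
    return out

a, b, c = atom("a"), atom("b"), atom("c")
splices = par_splices(par(par(a, b), c))
```

For the term (a ∥ b) ∥ c this enumeration yields fourteen syntactically distinct splices; several of them coincide once unit subterms are folded away.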


We furthermore note that the parallel composition of any parallel splice of $e$
is ordered below $e$ by $\leqq_{\mathrm{CKA}}$. This guarantees that parallel splices never contain
extra information, i.e., that their semantics do not contain pomsets that do not
occur in the semantics of $e$. It also allows us to bound the width of the parallel
splices by the width of the term being split, as a result of Lemma 3.10.


Lemma 4.5. Let $e \in \mathcal{T}$. If $\ell \mathrel{\Delta_e} r$, then $\ell \parallel r \leqq_{\mathrm{CKA}} e$.
Corollary 4.1. Let $e \in \mathcal{T}$. If $\ell \mathrel{\Delta_e} r$, then $|\ell| + |r| \leq |e|$.


Finally, we show that $\Delta_e$ is dense when it comes to parallel pomsets, meaning
that if we have a parallelly composed pomset in the semantics of $e$, then we can
find a parallel splice where one parallel component is contained in the semantics
of one side of the pair, and the other component in that of the other.

Lemma 4.6. Let $e \in \mathcal{T}$, and let $V, W$ be pomsets such that $V \parallel W \in \llbracket e \rrbracket_{\mathrm{BKA}}$.
Then there exist $\ell, r \in \mathcal{T}$ with $\ell \mathrel{\Delta_e} r$ such that $V \in \llbracket \ell \rrbracket_{\mathrm{BKA}}$ and $W \in \llbracket r \rrbracket_{\mathrm{BKA}}$.


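Proof. The proof proceeds by induction on $e$. In the base, we can discount the
case where $e = 0$, for then the claim holds vacuously. This leaves us two cases.


- If $e = 1$, then $V \parallel W \in \llbracket e \rrbracket_{\mathrm{BKA}}$ entails $V \parallel W = 1$. By Lemma 3.1, we find
  that $V = W = 1$. Since $1 \mathrel{\Delta_1} 1$ by definition of $\Delta_e$, the claim follows when we
  choose $\ell = r = 1$.

- If $e = a$ for some $a \in \Sigma$, then $V \parallel W \in \llbracket e \rrbracket_{\mathrm{BKA}}$ entails $V \parallel W = a$. By Lemma
  3.1, we find that either $V = 1$ and $W = a$, or $V = a$ and $W = 1$. In the
  former case, we can choose $\ell = 1$ and $r = a$, while in the latter case we can
  choose $\ell = a$ and $r = 1$. It is then easy to see that our claim holds in either
  case.


For the inductive step, there are four cases to consider.


- If $e = e_0 + e_1$, then $V \parallel W \in \llbracket e_i \rrbracket_{\mathrm{BKA}}$ for some $i \in 2$. But then, by induction,
  we find $\ell, r \in \mathcal{T}$ with $\ell \mathrel{\Delta_{e_i}} r$ such that $V \in \llbracket \ell \rrbracket_{\mathrm{BKA}}$ and $W \in \llbracket r \rrbracket_{\mathrm{BKA}}$. Since this
  implies that $\ell \mathrel{\Delta_e} r$, the claim follows.

- If $e = e_0 \cdot e_1$, then there exist pomsets $U_0, U_1$ such that $V \parallel W = U_0 \cdot U_1$, and
  $U_i \in \llbracket e_i \rrbracket_{\mathrm{BKA}}$ for all $i \in 2$. By Lemma 3.1, there are two cases to consider.

  • Suppose that $U_i = 1$ for some $i \in 2$, meaning that $V \parallel W = U_0 \cdot U_1 =
    U_{1-i} \in \llbracket e_{1-i} \rrbracket_{\mathrm{BKA}}$ for this $i$. By induction, we find $\ell, r \in \mathcal{T}$ with $\ell \mathrel{\Delta_{e_{1-i}}} r$,
    and $V \in \llbracket \ell \rrbracket_{\mathrm{BKA}}$ as well as $W \in \llbracket r \rrbracket_{\mathrm{BKA}}$. Since $U_i = 1 \in \llbracket e_i \rrbracket_{\mathrm{BKA}}$, we have that
    $\epsilon(e_i) = 1$ by Lemma 3.9, and thus $\ell \mathrel{\Delta_e} r$.

  • Suppose that $V = 1$ or $W = 1$. In the former case, $V \parallel W = W =
    U_0 \cdot U_1 \in \llbracket e \rrbracket_{\mathrm{BKA}}$. We then choose $\ell = 1$ and $r = e$ to satisfy the claim.
    In the latter case, we can choose $\ell = e$ and $r = 1$ to satisfy the claim
    analogously.

- If $e = e_0 \parallel e_1$, then there exist pomsets $U_0, U_1$ such that $V \parallel W = U_0 \parallel U_1$,
  and $U_i \in \llbracket e_i \rrbracket_{\mathrm{BKA}}$ for all $i \in 2$. By Lemma 3.5, we find pomsets $V_0, V_1, W_0, W_1$
  such that $V = V_0 \parallel V_1$, $W = W_0 \parallel W_1$, and $U_i = V_i \parallel W_i$ for $i \in 2$. For $i \in 2$,
  we then find by induction $\ell_i, r_i \in \mathcal{T}$ with $\ell_i \mathrel{\Delta_{e_i}} r_i$ such that $V_i \in \llbracket \ell_i \rrbracket_{\mathrm{BKA}}$ and
  $W_i \in \llbracket r_i \rrbracket_{\mathrm{BKA}}$. We then choose $\ell = \ell_0 \parallel \ell_1$ and $r = r_0 \parallel r_1$. Since $V = V_0 \parallel V_1$,
  it follows that $V \in \llbracket \ell \rrbracket_{\mathrm{BKA}}$, and similarly we find that $W \in \llbracket r \rrbracket_{\mathrm{BKA}}$. Since $\ell \mathrel{\Delta_e} r$,
  the claim follows.

- If $e = e_0^*$, then there exist $U_0, U_1, \ldots, U_{n-1} \in \llbracket e_0 \rrbracket_{\mathrm{BKA}}$ such that $V \parallel W =
  U_0 \cdot U_1 \cdots U_{n-1}$. If $n = 0$, i.e., $V \parallel W = 1$, then $V = W = 1$. In that case, we
  can choose $\ell = e$ and $r = 1$ to find that $\ell \mathrel{\Delta_e} r$, $V \in \llbracket \ell \rrbracket_{\mathrm{BKA}}$ and $W \in \llbracket r \rrbracket_{\mathrm{BKA}}$,
  satisfying the claim.

  If $n > 0$, we can assume without loss of generality that, for $0 \leq i < n$, it
  holds that $U_i \neq 1$. By Lemma 3.1, there are two subcases to consider.

  • Suppose that $V, W \neq 1$; then $n = 1$ (for otherwise $U_j = 1$ for some
    $0 \leq j < n$ by Lemma 3.1, which contradicts the above). Since $V \parallel W =
    U_0 \in \llbracket e_0 \rrbracket_{\mathrm{BKA}}$, we find by induction $\ell, r \in \mathcal{T}$ with $\ell \mathrel{\Delta_{e_0}} r$ such that
    $V \in \llbracket \ell \rrbracket_{\mathrm{BKA}}$ and $W \in \llbracket r \rrbracket_{\mathrm{BKA}}$. The claim then follows by the fact that $\ell \mathrel{\Delta_e} r$.

  • Suppose that $V = 1$ or $W = 1$. In the former case, $V \parallel W = W =
    U_0 \cdot U_1 \cdots U_{n-1} \in \llbracket e \rrbracket_{\mathrm{BKA}}$. We then choose $\ell = 1$ and $r = e$ to satisfy the
    claim. In the latter case, we can choose $\ell = e$ and $r = 1$ to satisfy the
    claim analogously.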


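Example 4.3. Let $U = a \parallel c$ and $V = b$, and note that $U \parallel V \in \llbracket e_0 \parallel e_1 \rrbracket_{\mathrm{BKA}}$. We
can then find that $a \mathrel{\Delta_a} 1$ and $1 \mathrel{\Delta_b} b$, and thus $a \parallel 1 \mathrel{\Delta_{e_0}} 1 \parallel b$. Since also $c \mathrel{\Delta_c} 1$,
it follows that $(a \parallel 1) \parallel c \mathrel{\Delta_{e_0 \parallel e_1}} (1 \parallel b) \parallel 1$. We can then choose $\ell = (a \parallel 1) \parallel c$
and $r = (1 \parallel b) \parallel 1$ to find that $U \in \llbracket \ell \rrbracket_{\mathrm{BKA}}$ and $V \in \llbracket r \rrbracket_{\mathrm{BKA}}$, while $\ell \mathrel{\Delta_{e_0 \parallel e_1}} r$.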


With parallel splitting in hand, we can define an operator on terms that 
combines all parallel splices of a parallel composition in a way that accounts for 
all of their downward closures. 


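Definition 4.4. Let $e, f \in \mathcal{T}$, and suppose that, for every $g \in \mathcal{T}$ such that
$|g| < |e| + |f|$, there exists a closure $g{\downarrow}$. The term $e \odot f$ is defined as follows:


$$e \odot f \triangleq e \parallel f + \sum_{\substack{\ell \mathrel{\Delta_{e \parallel f}} r \\ |\ell|, |r| < |e \parallel f|}} \ell{\downarrow} \parallel r{\downarrow}$$


Note that $e \odot f$ is well-defined: the sum is finite since $\Delta_{e \parallel f}$ is finite by Lemma
4.4, and furthermore $\ell{\downarrow}$ and $r{\downarrow}$ exist, as we required that $|\ell|, |r| < |e \parallel f|$.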


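Example 4.4. Let us compute $e_0 \odot e_1$ and verify that we obtain a preclosure of
$e_0 \parallel e_1$. Working through the definition, we see that $\Delta_{e_0 \parallel e_1}$ contains, among others, the pairs


$$
\begin{array}{lll}
((1 \parallel 1) \parallel c,\ (a \parallel b) \parallel 1) & ((1 \parallel b) \parallel c,\ (a \parallel 1) \parallel 1) & ((a \parallel 1) \parallel c,\ (1 \parallel b) \parallel 1) \\
((1 \parallel b) \parallel 1,\ (a \parallel 1) \parallel c) & ((a \parallel 1) \parallel 1,\ (1 \parallel b) \parallel c) & ((a \parallel b) \parallel 1,\ (1 \parallel 1) \parallel c)
\end{array}
$$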


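Since closure is invariant with respect to $\equiv_{\mathrm{CKA}}$, we can simplify these terms by
applying the axioms of CKA. After folding the unit subterms, we are left with


$$(a \parallel b,\ c) \qquad (c,\ a \parallel b) \qquad (b,\ a \parallel c) \qquad (b \parallel c,\ a) \qquad (a,\ b \parallel c) \qquad (a \parallel c,\ b)$$

Recall that $a \parallel b + a \cdot b + b \cdot a$ is a closure of $a \parallel b$. Now, we find that


$$
\begin{aligned}
e_0 \odot e_1 ={}& (a \parallel b) \parallel c + c \parallel (a \parallel b + a \cdot b + b \cdot a) \\
&+ b \parallel (a \parallel c + a \cdot c + c \cdot a) + (b \parallel c + b \cdot c + c \cdot b) \parallel a \\
&+ a \parallel (b \parallel c + b \cdot c + c \cdot b) + (a \parallel c + a \cdot c + c \cdot a) \parallel b \\
\equiv_{\mathrm{CKA}}{}& a \parallel b \parallel c + a \parallel (b \cdot c + c \cdot b) + b \parallel (a \cdot c + c \cdot a) + c \parallel (a \cdot b + b \cdot a)
\end{aligned}
$$


which was shown to be a preclosure of $e_0 \parallel e_1$ in Example 4.2.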


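The general proof of correctness for $\odot$ as a preclosure plays out as follows.

Lemma 4.7. Let $e, f \in \mathcal{T}$, and suppose that, for every $g \in \mathcal{T}$ with $|g| < |e| + |f|$,
there exists a closure $g{\downarrow}$. Then $e \odot f$ is a preclosure of $e \parallel f$.


Proof. We start by showing that $e \odot f \equiv_{\mathrm{CKA}} e \parallel f$. First, note that $e \parallel f \leqq_{\mathrm{BKA}} e \odot f$
by definition of $e \odot f$. For the other direction, suppose that $\ell, r \in \mathcal{T}$ are such that
$\ell \mathrel{\Delta_{e \parallel f}} r$. By definition of closure, we know that $\ell{\downarrow} \parallel r{\downarrow} \equiv_{\mathrm{CKA}} \ell \parallel r$. By Lemma
4.5, we have $\ell \parallel r \leqq_{\mathrm{CKA}} e \parallel f$. Since every summand of $e \odot f$ is ordered below $e \parallel f$
by $\leqq_{\mathrm{CKA}}$, we have that $e \odot f \leqq_{\mathrm{CKA}} e \parallel f$. It then follows that $e \parallel f \equiv_{\mathrm{CKA}} e \odot f$.

For the second requirement, suppose that $X \in \llbracket e \parallel f \rrbracket_{\mathrm{CKA}}$ is non-sequential.
We then know that there exists a $Y \in \llbracket e \parallel f \rrbracket_{\mathrm{BKA}}$ such that $X \sqsubseteq Y$. This leaves
us two cases to consider.


- If $X$ is empty or primitive, then $Y = X$ by Lemma 3.2, thus $X \in \llbracket e \parallel f \rrbracket_{\mathrm{BKA}}$.
  By the fact that $e \parallel f \leqq_{\mathrm{BKA}} e \odot f$ and by Lemma 3.8, we find $X \in \llbracket e \odot f \rrbracket_{\mathrm{BKA}}$.

- If $X = X_0 \parallel X_1$ for non-empty pomsets $X_0$ and $X_1$, then by Lemma 3.3 we
  find non-empty pomsets $Y_0$ and $Y_1$ with $Y = Y_0 \parallel Y_1$ such that $X_i \sqsubseteq Y_i$ for
  $i \in 2$. By Lemma 4.6, we find $\ell, r \in \mathcal{T}$ with $\ell \mathrel{\Delta_{e \parallel f}} r$ such that $Y_0 \in \llbracket \ell \rrbracket_{\mathrm{BKA}}$
  and $Y_1 \in \llbracket r \rrbracket_{\mathrm{BKA}}$. By Lemma 3.11, we find that $|\ell|, |r| \geq 1$. Corollary 4.1 then
  allows us to conclude that $|\ell|, |r| < |e \parallel f|$.
  This means that $\ell{\downarrow} \parallel r{\downarrow} \leqq_{\mathrm{BKA}} e \odot f$. Since $X_0 \in \llbracket \ell{\downarrow} \rrbracket_{\mathrm{BKA}}$ and $X_1 \in \llbracket r{\downarrow} \rrbracket_{\mathrm{BKA}}$ by
  definition of closure, we can derive by Lemma 3.8 that


$$X = X_0 \parallel X_1 \in \llbracket \ell{\downarrow} \parallel r{\downarrow} \rrbracket_{\mathrm{BKA}} \subseteq \llbracket e \odot f \rrbracket_{\mathrm{BKA}}$$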


4.2 Closure 


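The preclosure operator discussed above covers the non-sequential pomsets in
the language $\llbracket e \parallel f \rrbracket_{\mathrm{CKA}}$; it remains to find a term that covers the sequential
pomsets contained in $\llbracket e \parallel f \rrbracket_{\mathrm{CKA}}$.

To give some intuition for the construction ahead, we first explore
the observations that can be made when a sequential pomset $W \cdot X$ appears
in the language $\llbracket e \parallel f \rrbracket_{\mathrm{CKA}}$; without loss of generality, assume that $W$ is
non-sequential. In this setting, there must exist $U \in \llbracket e \rrbracket_{\mathrm{CKA}}$ and $V \in \llbracket f \rrbracket_{\mathrm{CKA}}$ such that
$W \cdot X \sqsubseteq U \parallel V$. By Lemma 3.6, we find pomsets $U_0, U_1, V_0, V_1$ such that


$$W \sqsubseteq U_0 \parallel V_0 \qquad X \sqsubseteq U_1 \parallel V_1 \qquad U_0 \cdot U_1 \sqsubseteq U \qquad V_0 \cdot V_1 \sqsubseteq V$$


This means that $U_0 \cdot U_1 \in \llbracket e \rrbracket_{\mathrm{CKA}}$ and $V_0 \cdot V_1 \in \llbracket f \rrbracket_{\mathrm{CKA}}$. Now, suppose we could
find $e_0, e_1, f_0, f_1 \in \mathcal{T}$ such that


$$
\begin{array}{lll}
e_0 \cdot e_1 \leqq_{\mathrm{CKA}} e & U_0 \in \llbracket e_0 \rrbracket_{\mathrm{CKA}} & U_1 \in \llbracket e_1 \rrbracket_{\mathrm{CKA}} \\
f_0 \cdot f_1 \leqq_{\mathrm{CKA}} f & V_0 \in \llbracket f_0 \rrbracket_{\mathrm{CKA}} & V_1 \in \llbracket f_1 \rrbracket_{\mathrm{CKA}}
\end{array}
$$

Then we have $W \in \llbracket e_0 \odot f_0 \rrbracket_{\mathrm{BKA}}$ and $X \in \llbracket e_1 \parallel f_1 \rrbracket_{\mathrm{CKA}}$. Thus, if we can find a
closure of $e_1 \parallel f_1$, then we have a term whose BKA-semantics contains $W \cdot X$.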
There are two obstacles that need to be resolved before we can use the observations
above to find the closure of $e \parallel f$. The first problem is that we need to
be sure that this process of splitting terms into sequential components is at all
possible, i.e., that we can split $e$ into $e_0$ and $e_1$ with $e_0 \cdot e_1 \leqq_{\mathrm{CKA}} e$ and $U_i \in \llbracket e_i \rrbracket_{\mathrm{CKA}}$
for $i \in 2$. We do this by designing a sequential analogue to the parallel splitting
relation seen before. The second problem, which we will address later in this
section, is whether this process of splitting a parallel term $e \parallel f$ according to the
exchange law and finding a closure of the remaining term $e_1 \parallel f_1$ is well-founded,
i.e., if we can find "enough" of these terms to cover all possible ways of
sequentialising $e \parallel f$. This will turn out to be possible, by using the fixpoint axioms of
KA as in Sect. 3.3 with linear systems.

We start by defining the sequential splitting relation.3


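Definition 4.5. Let $e \in \mathcal{T}$; $\nabla_e$ is the smallest relation on $\mathcal{T}$ such that


$$
1 \mathrel{\nabla_1} 1 \qquad
a \mathrel{\nabla_a} 1 \qquad
1 \mathrel{\nabla_a} a \qquad
1 \mathrel{\nabla_{e_0^*}} 1 \qquad
\frac{\ell \mathrel{\nabla_{e_0}} r}{\ell \mathrel{\nabla_{e_0 + e_1}} r} \qquad
\frac{\ell \mathrel{\nabla_{e_1}} r}{\ell \mathrel{\nabla_{e_0 + e_1}} r}
$$

$$
\frac{\ell \mathrel{\nabla_{e_0}} r}{\ell \mathrel{\nabla_{e_0 \cdot e_1}} r \cdot e_1} \qquad
\frac{\ell \mathrel{\nabla_{e_1}} r}{e_0 \cdot \ell \mathrel{\nabla_{e_0 \cdot e_1}} r} \qquad
\frac{\ell_0 \mathrel{\nabla_{e_0}} r_0 \qquad \ell_1 \mathrel{\nabla_{e_1}} r_1}{\ell_0 \parallel \ell_1 \mathrel{\nabla_{e_0 \parallel e_1}} r_0 \parallel r_1} \qquad
\frac{\ell \mathrel{\nabla_{e_0}} r}{e_0^* \cdot \ell \mathrel{\nabla_{e_0^*}} r \cdot e_0^*}
$$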


Given $e \in \mathcal{T}$, we refer to $\nabla_e$ as the sequential splitting relation of $e$, and to the
elements of $\nabla_e$ as sequential splices of $e$. We need to establish a few properties
of the sequential splitting relation that will be useful later on. The first of these
properties is that, as for parallel splitting, $\nabla_e$ is finite.


Lemma 4.8. For $e \in \mathcal{T}$, $\nabla_e$ is finite.


We also have that the sequential composition of splices is provably below
the term being split. Just like the analogous lemma for parallel splitting, this
guarantees that our sequential splices never give rise to semantics not contained
in the split term. This lemma also yields an observation about the width of
sequential splices when compared to the term being split.


Lemma 4.9. Let $e \in \mathcal{T}$. If $\ell, r \in \mathcal{T}$ with $\ell \mathrel{\nabla_e} r$, then $\ell \cdot r \leqq_{\mathrm{CKA}} e$.
Corollary 4.2. Let $e \in \mathcal{T}$. If $\ell, r \in \mathcal{T}$ with $\ell \mathrel{\nabla_e} r$, then $|\ell|, |r| \leq |e|$.


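Lastly, we show that the splices cover every way of (sequentially) splitting
up the semantics of the term being split, i.e., that $\nabla_e$ is dense when it comes to
sequentially composed pomsets.


Lemma 4.10. Let $e \in \mathcal{T}$, and let $V$ and $W$ be pomsets such that $V \cdot W \in \llbracket e \rrbracket_{\mathrm{CKA}}$.
Then there exist $\ell, r \in \mathcal{T}$ with $\ell \mathrel{\nabla_e} r$ such that $V \in \llbracket \ell \rrbracket_{\mathrm{CKA}}$ and $W \in \llbracket r \rrbracket_{\mathrm{CKA}}$.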


Proof. The proof proceeds by induction on e. In the base, we can discount the 
case where e = 0, for then the claim holds vacuously. This leaves us two cases. 


3 The contents of this relation are very similar to the set of left- and right-spines of a 
NetKAT expression as used in [5]. 
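- If $e = 1$, then $V \cdot W = 1$; by Lemma 3.1, we find that $V = W = 1$. Since
  $1 \mathrel{\nabla_1} 1$ by definition of $\nabla_e$, the claim follows when we choose $\ell = r = 1$.

- If $e = a$ for some $a \in \Sigma$, then $V \cdot W = a$; by Lemma 3.1, we find that either
  $V = a$ and $W = 1$ or $V = 1$ and $W = a$. In the former case, we can choose
  $\ell = a$ and $r = 1$ to satisfy the claim; the latter case can be treated similarly.


For the inductive step, there are four cases to consider.


- If $e = e_0 + e_1$, then $V \cdot W \in \llbracket e_i \rrbracket_{\mathrm{CKA}}$ for some $i \in 2$. By induction, we find
  $\ell, r \in \mathcal{T}$ with $\ell \mathrel{\nabla_{e_i}} r$ such that $V \in \llbracket \ell \rrbracket_{\mathrm{CKA}}$ and $W \in \llbracket r \rrbracket_{\mathrm{CKA}}$. Since $\ell \mathrel{\nabla_e} r$ in
  this case, the claim follows.

- If $e = e_0 \cdot e_1$, then there exist $U_0 \in \llbracket e_0 \rrbracket_{\mathrm{CKA}}$ and $U_1 \in \llbracket e_1 \rrbracket_{\mathrm{CKA}}$ such that
  $V \cdot W = U_0 \cdot U_1$. By Lemma 3.4, we find a series-parallel pomset $X$ such that
  either $V \sqsubseteq U_0 \cdot X$ and $X \cdot W \sqsubseteq U_1$, or $V \cdot X \sqsubseteq U_0$ and $W \sqsubseteq X \cdot U_1$. In the
  former case, we find that $X \cdot W \in \llbracket e_1 \rrbracket_{\mathrm{CKA}}$, and thus by induction $\ell', r \in \mathcal{T}$
  with $\ell' \mathrel{\nabla_{e_1}} r$ such that $X \in \llbracket \ell' \rrbracket_{\mathrm{CKA}}$ and $W \in \llbracket r \rrbracket_{\mathrm{CKA}}$. We then choose $\ell = e_0 \cdot \ell'$
  to find that $\ell \mathrel{\nabla_e} r$, as well as $V \sqsubseteq U_0 \cdot X \in \llbracket e_0 \rrbracket_{\mathrm{CKA}} \cdot \llbracket \ell' \rrbracket_{\mathrm{CKA}} = \llbracket \ell \rrbracket_{\mathrm{CKA}}$, and
  thus $V \in \llbracket \ell \rrbracket_{\mathrm{CKA}}$. The latter case can be treated similarly; here, we use the
  induction hypothesis on $e_0$.

- If $e = e_0 \parallel e_1$, then there exist $U_0 \in \llbracket e_0 \rrbracket_{\mathrm{CKA}}$ and $U_1 \in \llbracket e_1 \rrbracket_{\mathrm{CKA}}$ such that
  $V \cdot W \sqsubseteq U_0 \parallel U_1$. By Lemma 3.6, we find series-parallel pomsets $V_0, V_1, W_0, W_1$
  such that $V \sqsubseteq V_0 \parallel V_1$ and $W \sqsubseteq W_0 \parallel W_1$, as well as $V_i \cdot W_i \sqsubseteq U_i$ for all
  $i \in 2$. In that case, $V_i \cdot W_i \in \llbracket e_i \rrbracket_{\mathrm{CKA}}$ for all $i \in 2$, and thus by induction we find
  $\ell_i, r_i \in \mathcal{T}$ with $\ell_i \mathrel{\nabla_{e_i}} r_i$ such that $V_i \in \llbracket \ell_i \rrbracket_{\mathrm{CKA}}$ and $W_i \in \llbracket r_i \rrbracket_{\mathrm{CKA}}$. We choose
  $\ell = \ell_0 \parallel \ell_1$ and $r = r_0 \parallel r_1$ to find that $V \in \llbracket \ell_0 \parallel \ell_1 \rrbracket_{\mathrm{CKA}}$ and $W \in \llbracket r_0 \parallel r_1 \rrbracket_{\mathrm{CKA}}$,
  as well as $\ell \mathrel{\nabla_e} r$.

- If $e = e_0^*$, then there exist $U_0, U_1, \ldots, U_{n-1} \in \llbracket e_0 \rrbracket_{\mathrm{CKA}}$ such that $V \cdot W =
  U_0 \cdot U_1 \cdots U_{n-1}$. Without loss of generality, we can assume that for $0 \leq i < n$
  it holds that $U_i \neq 1$. In the case where $n = 0$ we have that $V \cdot W = 1$, thus
  $V = W = 1$, and we can choose $\ell = r = 1$ to satisfy the claim.

  For the case where $n > 0$, we find by Lemma 3.4 an $0 \leq m < n$ and series-parallel
  pomsets $X, Y$ such that $X \cdot Y \sqsubseteq U_m$, and $V \sqsubseteq U_0 \cdot U_1 \cdots U_{m-1} \cdot X$
  and $W \sqsubseteq Y \cdot U_{m+1} \cdot U_{m+2} \cdots U_{n-1}$. Since $X \cdot Y \sqsubseteq U_m \in \llbracket e_0 \rrbracket_{\mathrm{CKA}}$ and thus
  $X \cdot Y \in \llbracket e_0 \rrbracket_{\mathrm{CKA}}$, we find by induction $\ell', r' \in \mathcal{T}$ with $\ell' \mathrel{\nabla_{e_0}} r'$ and $X \in \llbracket \ell' \rrbracket_{\mathrm{CKA}}$
  and $Y \in \llbracket r' \rrbracket_{\mathrm{CKA}}$. We can then choose $\ell = e_0^* \cdot \ell'$ and $r = r' \cdot e_0^*$ to find that
  $V \sqsubseteq U_0 \cdot U_1 \cdots U_{m-1} \cdot X \in \llbracket e_0^* \rrbracket_{\mathrm{CKA}} \cdot \llbracket \ell' \rrbracket_{\mathrm{CKA}} = \llbracket \ell \rrbracket_{\mathrm{CKA}}$ and
  $W \sqsubseteq Y \cdot U_{m+1} \cdot U_{m+2} \cdots U_{n-1} \in \llbracket r' \rrbracket_{\mathrm{CKA}} \cdot \llbracket e_0^* \rrbracket_{\mathrm{CKA}} = \llbracket r \rrbracket_{\mathrm{CKA}}$,
  and thus that $V \in \llbracket \ell \rrbracket_{\mathrm{CKA}}$ and $W \in \llbracket r \rrbracket_{\mathrm{CKA}}$. Since $\ell \mathrel{\nabla_e} r$
  holds, the claim follows.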


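Example 4.5. Let $U$ be the pomset $ca$ and let $V$ be $bc$. Furthermore, let $e$ be the
term $(a \cdot b + c)^*$, and note that $U \cdot V \in \llbracket e \rrbracket_{\mathrm{CKA}}$. We then find that $a \mathrel{\nabla_a} 1$, and
thus $a \mathrel{\nabla_{a \cdot b}} 1 \cdot b$. We can now choose $\ell = (a \cdot b + c)^* \cdot a$ and $r = (1 \cdot b) \cdot (a \cdot b + c)^*$
to find that $U \in \llbracket \ell \rrbracket_{\mathrm{CKA}}$ and $V \in \llbracket r \rrbracket_{\mathrm{CKA}}$, while $\ell \mathrel{\nabla_e} r$.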


We know how to split a term sequentially. To resolve the second problem, 
we need to show that the process of splitting terms repeatedly ends somewhere. 
This is formalised in the notion of right-hand remainders, which are the terms 
that can appear as the right hand of a sequential splice of a term. 
Definition 4.6. Let $e \in \mathcal{T}$. The set of (right-hand) remainders of $e$, written
$R(e)$, is the smallest set satisfying the rules

$$
e \in R(e) \qquad \frac{f \in R(e) \qquad \ell \mathrel{\nabla_f} r}{r \in R(e)}
$$


Lemma 4.11. Let $e \in \mathcal{T}$. $R(e)$ is finite.
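
Both the sequential splitting relation and the set of remainders can be enumerated mechanically. The Python sketch below is illustrative only: the `Term` class and the functions `seq_splices` and `remainders` are names assumed for this example. The first function follows the rules of Definition 4.5 case by case; the second closes the singleton containing the term under taking right-hand sides of sequential splices, which terminates because the set of remainders is finite.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    """Hypothetical AST for series-rational terms (an assumption of this sketch)."""
    op: str            # one of "0", "1", "atom", "+", "||", ".", "*"
    args: tuple = ()
    name: str = ""     # only used when op == "atom"

ONE = Term("1")
def atom(name): return Term("atom", name=name)
def plus(e, f): return Term("+", (e, f))
def par(e, f): return Term("||", (e, f))
def seq(e, f): return Term(".", (e, f))
def star(e): return Term("*", (e,))

def seq_splices(e: Term) -> set:
    """All pairs (l, r) in the sequential splitting relation of e."""
    if e.op == "0":
        return set()                       # no rule applies to 0
    if e.op == "1":
        return {(ONE, ONE)}
    if e.op == "atom":
        return {(e, ONE), (ONE, e)}
    if e.op == "*":
        e0 = e.args[0]
        return {(ONE, ONE)} | {(seq(e, l), seq(r, e)) for l, r in seq_splices(e0)}
    e0, e1 = e.args
    if e.op == "+":
        return seq_splices(e0) | seq_splices(e1)
    if e.op == ".":
        return ({(l, seq(r, e1)) for l, r in seq_splices(e0)}
                | {(seq(e0, l), r) for l, r in seq_splices(e1)})
    # Parallel composition: split both operands and recombine componentwise.
    return {(par(l0, l1), par(r0, r1))
            for l0, r0 in seq_splices(e0) for l1, r1 in seq_splices(e1)}

def remainders(e: Term) -> set:
    """Smallest set containing e and closed under right-hand sides of splices."""
    seen, todo = {e}, [e]
    while todo:
        f = todo.pop()
        for _, r in seq_splices(f):
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return seen

a, b, c = atom("a"), atom("b"), atom("c")
e = star(plus(seq(a, b), c))
```

With `e` standing for (a · b + c)*, the pair consisting of `seq(e, a)` and `seq(seq(ONE, b), e)` is among the computed splices, matching the choice of left and right components in Example 4.5.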


With splitting and remainders we are in a position to define the linear system 
that will yield the closure of a parallel composition. Intuitively, we can think of 
this system as an automaton: every variable corresponds to a state, and every row 
of the matrix describes the “transitions” of the corresponding state, while every 
element of the vector describes the language “accepted” by that state without 
taking a single transition. Solving the system for a least fixpoint can be thought 
of as finding an expression that describes the language of the automaton. 


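Definition 4.7. Let $e, f \in \mathcal{T}$, and suppose that, for every $g \in \mathcal{T}$ such that
$|g| < |e| + |f|$, there exists a closure $g{\downarrow}$. We choose


$$I_{e,f} = \{ g \parallel h : g \in R(e),\ h \in R(f) \}$$


The $I_{e,f}$-vector $p_{e,f}$ and $I_{e,f}$-matrix $M_{e,f}$ are chosen as follows.


$$
p_{e,f}(g \parallel h) \triangleq g \parallel h
\qquad
M_{e,f}(g \parallel h,\ g' \parallel h') \triangleq \sum_{\substack{\ell_g \mathrel{\nabla_g} g' \\ \ell_h \mathrel{\nabla_h} h'}} \ell_g \odot \ell_h
$$


$I_{e,f}$ is finite by Lemma 4.11. We write $\mathfrak{L}_{e,f}$ for the $I_{e,f}$-linear system
$\langle M_{e,f}, p_{e,f} \rangle$.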

We can check that $M_{e,f}$ is well-defined. First, the sum is finite, because $\nabla_g$
and $\nabla_h$ are finite by Lemma 4.8. Second, if $g \parallel h \in I_{e,f}$ and $\ell_g, r_g, \ell_h, r_h \in \mathcal{T}$
are such that $\ell_g \mathrel{\nabla_g} r_g$ and $\ell_h \mathrel{\nabla_h} r_h$, then $|\ell_g| \leq |g| \leq |e|$ and $|\ell_h| \leq |h| \leq |f|$ by
Corollary 4.2, and thus, if $d \in \mathcal{T}$ is such that $|d| < |\ell_g| + |\ell_h|$, then $|d| < |e| + |f|$,
and therefore a closure of $d$ exists, meaning that $\ell_g \odot \ell_h$ exists, too.

The least solution to $\mathfrak{L}_{e,f}$ obtained through Lemma 3.12 is the $I_{e,f}$-vector
denoted by $s_{e,f}$. We write $e \otimes f$ for $s_{e,f}(e \parallel f)$, i.e., the least solution at $e \parallel f$.

Using the previous lemmas, we can then show that $e \otimes f$ is indeed a closure
of $e \parallel f$, provided that we have closures for all terms of strictly lower width. The
intuition of this proof is that we use the uniqueness of least fixpoints to show
that $e \parallel f \equiv_{\mathrm{CKA}} e \otimes f$, and then use the properties of preclosure and the normal
form of series-parallel pomsets to show that $\llbracket e \parallel f \rrbracket_{\mathrm{CKA}} = \llbracket e \otimes f \rrbracket_{\mathrm{BKA}}$.


Lemma 4.12. Let $e, f \in \mathcal{T}$, and suppose that, for every $g \in \mathcal{T}$ with $|g| <
|e| + |f|$, there exists a closure $g{\downarrow}$. Then $e \otimes f$ is a closure of $e \parallel f$.
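Proof. We begin by showing that $e \parallel f \equiv_{\mathrm{CKA}} e \otimes f$. We can see that $p_{e,f}$ is a
solution to $\mathfrak{L}_{e,f}$, by calculating for $g \parallel h \in I_{e,f}$ (all outer sums range over $r_g \parallel r_h \in I_{e,f}$, and all inner sums over $\ell_g \mathrel{\nabla_g} r_g$ and $\ell_h \mathrel{\nabla_h} r_h$):


$$
\begin{aligned}
& (p_{e,f} + M_{e,f} \cdot p_{e,f})(g \parallel h) \\
&= g \parallel h + \sum_{r_g \parallel r_h \in I_{e,f}} \Big( \sum_{\substack{\ell_g \nabla_g r_g \\ \ell_h \nabla_h r_h}} \ell_g \odot \ell_h \Big) \cdot (r_g \parallel r_h) && \text{(def. } M_{e,f}, p_{e,f}\text{)} \\
&\equiv_{\mathrm{CKA}} g \parallel h + \sum_{r_g \parallel r_h \in I_{e,f}} \sum_{\substack{\ell_g \nabla_g r_g \\ \ell_h \nabla_h r_h}} (\ell_g \odot \ell_h) \cdot (r_g \parallel r_h) && \text{(distributivity)} \\
&\equiv_{\mathrm{CKA}} g \parallel h + \sum_{r_g \parallel r_h \in I_{e,f}} \sum_{\substack{\ell_g \nabla_g r_g \\ \ell_h \nabla_h r_h}} (\ell_g \parallel \ell_h) \cdot (r_g \parallel r_h) && \text{(Lemma 4.7)} \\
&\leqq_{\mathrm{CKA}} g \parallel h + \sum_{r_g \parallel r_h \in I_{e,f}} \sum_{\substack{\ell_g \nabla_g r_g \\ \ell_h \nabla_h r_h}} (\ell_g \cdot r_g) \parallel (\ell_h \cdot r_h) && \text{(exchange)} \\
&\leqq_{\mathrm{CKA}} g \parallel h + \sum_{r_g \parallel r_h \in I_{e,f}} \sum_{\substack{\ell_g \nabla_g r_g \\ \ell_h \nabla_h r_h}} g \parallel h && \text{(Lemma 4.9)} \\
&\equiv_{\mathrm{CKA}} g \parallel h && \text{(idempotence)} \\
&= p_{e,f}(g \parallel h) && \text{(def. } p_{e,f}\text{)}
\end{aligned}
$$


To see that $p_{e,f}$ is the least solution to $\mathfrak{L}_{e,f}$, let $q_{e,f}$ be a solution to $\mathfrak{L}_{e,f}$. We
then know that $M_{e,f} \cdot q_{e,f} + p_{e,f} \leqq_{\mathrm{CKA}} q_{e,f}$; thus, in particular, $p_{e,f} \leqq_{\mathrm{CKA}} q_{e,f}$.
Since the least solution to a linear system is unique up to $\equiv_{\mathrm{CKA}}$, we find that
$s_{e,f} \equiv_{\mathrm{CKA}} p_{e,f}$, and therefore that $e \otimes f = s_{e,f}(e \parallel f) \equiv_{\mathrm{CKA}} p_{e,f}(e \parallel f) = e \parallel f$.

It remains to show that if $U \in \llbracket e \parallel f \rrbracket_{\mathrm{CKA}}$, then $U \in \llbracket e \otimes f \rrbracket_{\mathrm{BKA}}$. To show this,
we show the more general claim that if $g \parallel h \in I_{e,f}$ and $U \in \llbracket g \parallel h \rrbracket_{\mathrm{CKA}}$, then
$U \in \llbracket s_{e,f}(g \parallel h) \rrbracket_{\mathrm{BKA}}$. Write $U = U_0 \cdot U_1 \cdots U_{n-1}$ such that for $0 \leq i < n$, $U_i$ is
non-sequential (as in Corollary 3.1). The proof proceeds by induction on $n$. In the
base, we have that $n = 0$. In this case, $U = 1$, and thus $U \in \llbracket g \parallel h \rrbracket_{\mathrm{BKA}}$ by Lemma
3.2. Since $g \parallel h = p_{e,f}(g \parallel h) \leqq_{\mathrm{BKA}} s_{e,f}(g \parallel h)$, it follows that $U \in \llbracket s_{e,f}(g \parallel h) \rrbracket_{\mathrm{BKA}}$
by Lemma 3.8.

For the inductive step, assume the claim holds for $n - 1$. We write $U = U_0 \cdot U'$,
with $U' = U_1 \cdot U_2 \cdots U_{n-1}$. Since $U_0 \cdot U' \in \llbracket g \parallel h \rrbracket_{\mathrm{CKA}}$, there exist $W \in \llbracket g \rrbracket_{\mathrm{CKA}}$
and $X \in \llbracket h \rrbracket_{\mathrm{CKA}}$ such that $U_0 \cdot U' \sqsubseteq W \parallel X$. By Lemma 3.6, we find pomsets
$W_0, W_1, X_0, X_1$ such that $W_0 \cdot W_1 \sqsubseteq W$ and $X_0 \cdot X_1 \sqsubseteq X$, as well as $U_0 \sqsubseteq W_0 \parallel X_0$
and $U' \sqsubseteq W_1 \parallel X_1$. By Lemma 4.10, we find $\ell_g, r_g, \ell_h, r_h \in \mathcal{T}$ with $\ell_g \mathrel{\nabla_g} r_g$ and
$\ell_h \mathrel{\nabla_h} r_h$, such that $W_0 \in \llbracket \ell_g \rrbracket_{\mathrm{CKA}}$, $W_1 \in \llbracket r_g \rrbracket_{\mathrm{CKA}}$, $X_0 \in \llbracket \ell_h \rrbracket_{\mathrm{CKA}}$ and $X_1 \in \llbracket r_h \rrbracket_{\mathrm{CKA}}$.

From this, we know that $U_0 \in \llbracket \ell_g \parallel \ell_h \rrbracket_{\mathrm{CKA}}$ and $U' \in \llbracket r_g \parallel r_h \rrbracket_{\mathrm{CKA}}$. Since $U_0$ is
non-sequential, we have that $U_0 \in \llbracket \ell_g \odot \ell_h \rrbracket_{\mathrm{BKA}}$. Moreover, by induction we find
that $U' \in \llbracket s_{e,f}(r_g \parallel r_h) \rrbracket_{\mathrm{BKA}}$. Since $\ell_g \odot \ell_h \leqq_{\mathrm{BKA}} M_{e,f}(g \parallel h, r_g \parallel r_h)$ by definition
of $M_{e,f}$, we furthermore find that


$$(\ell_g \odot \ell_h) \cdot s_{e,f}(r_g \parallel r_h) \leqq_{\mathrm{BKA}} M_{e,f}(g \parallel h, r_g \parallel r_h) \cdot s_{e,f}(r_g \parallel r_h)$$


Since $r_g \parallel r_h \in I_{e,f}$, we find by definition of the solution to a linear system that


$$M_{e,f}(g \parallel h, r_g \parallel r_h) \cdot s_{e,f}(r_g \parallel r_h) \leqq_{\mathrm{BKA}} s_{e,f}(g \parallel h)$$

By Lemma 3.8 and the above, we conclude that $U = U_0 \cdot U' \in \llbracket s_{e,f}(g \parallel h) \rrbracket_{\mathrm{BKA}}$.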
For a concrete example where we find a closure of a (non-trivial) parallel
composition by solving a linear system, we refer to Appendix A. 

With closure of parallel composition, we can construct a closure for any term 
and therefore conclude completeness of CKA. 


Theorem 4.1. Let $e \in \mathcal{T}$. We can construct a closure $e{\downarrow}$ of $e$.


Proof. The proof proceeds by induction on $|e|$ and the structure of $e$, i.e., by
considering $f$ before $g$ if $|f| < |g|$, or if $f$ is a strict subterm of $g$ (in which case
$|f| \leq |g|$ also holds). It is not hard to see that this induces a well-founded ordering on $\mathcal{T}$.
Let $e$ be a term of width $n$, and suppose that the claim holds for all terms
of width at most $n - 1$, and for all strict subterms of $e$. There are three cases.


- If $e = 0$, $e = 1$ or $e = a$ for some $a \in \Sigma$, the claim follows from Lemma 4.2.
- If $e = e_0 + e_1$, or $e = e_0 \cdot e_1$, or $e = e_0^*$, the claim follows from Lemma 4.3.
- If $e = e_0 \parallel e_1$, then $e_0 \otimes e_1$ exists by the induction hypothesis. By Lemma
  4.12, we then find that $e_0 \otimes e_1$ is a closure of $e$.


Corollary 4.3. Let $e, f \in \mathcal{T}$. If $\llbracket e \rrbracket_{\mathrm{CKA}} = \llbracket f \rrbracket_{\mathrm{CKA}}$, then $e \equiv_{\mathrm{CKA}} f$.


Proof. Follows from Theorem 4.1 and Lemma 4.1. 


5 Discussion and Further Work 


By building a syntactic closure for each series-rational expression, we have shown 
that the standard axiomatisation of CKA is complete with respect to the CKA- 
semantics of series-rational terms. Consequently, the algebra of closed series- 
rational pomset languages forms the free CKA. 

Our result leads to several decision procedures for the equational theory of 
CKA. For instance, one can compute the closure of a term as described in the 
present paper, and use an existing decision procedure for BKA [3,12,20]. Note 
however that although this approach seems suited for theoretical developments 
(such as formalising the results in a proof assistant), its complexity makes it less 
appealing for practical use. More practically, one could leverage recent work by 
Brunet et al. [3], which provides an algorithm to compare closed series-rational 
pomset languages. Since this is the free concurrent Kleene algebra, this algorithm 
can now be used to decide the equational theory of CKA. We also obtain from 
the latter paper that this decision problem is EXPSPACE-complete. 

We furthermore note that the algorithm to compute downward closure can 
be used to extend half of the result from [14] to a Kleene theorem that relates the 
CKA-semantics of expressions to the pomset automata proposed there: if $e \in T$,
we can construct a pomset automaton $A$ with a state $q$ such that $L_A(q) = [\![e]\!]_{\mathsf{CKA}}$.

Having established pomset automata as an operational model of CKA, a 
further question is whether these automata are amenable to a bisimulation-based 
equivalence algorithm, as is the case for finite automata [10]. If this is the case, 
optimisations such as those in [2] might have analogues for pomset automata 
that can be found using the coalgebraic method [23]. 
While this work was in development, an unpublished draft by Laurence and
Struth [19] appeared, with a first proof of completeness for CKA. The general 
outline of their proof is similar to our own, in that they prove that closure of 
pomset languages preserves series-rationality, and hence there exists a syntac- 
tic closure for every series-rational expression. However, the techniques used to 
establish this fact are quite different from the developments in the present paper. 
First, we build the closure via syntactic methods: explicit splitting relations and 
solutions of linear systems. Instead, their proof uses automata theoretic construc- 
tions and algebraic closure properties of regular languages; in particular, they 
rely on congruences of finite index and language homomorphisms. We believe 
that our approach leads to a substantially simpler and more transparent proof. 
Furthermore, even though Laurence and Struth do not seem to use any fun- 
damentally non-constructive argument, their proof does not obviously yield an 
algorithm to effectively compute the closure of a given term. In contrast, our 
proof is explicit enough to be implemented directly; we wrote a simple Python 
script (under six hundred lines) to do just that [16]. 

A crucial ingredient in this work was the computation of least solutions of 
linear systems. This kind of construction has been used on several occasions for 
the study of Kleene algebras [1,4,18], and we provide here yet another variation 
of such a result. We feel that linear systems may not have yet been used to their 
full potential in this context, and could still lead to interesting developments. 

A natural extension of the work conducted here would be to turn our attention
to the signature of concurrent Kleene algebra that includes a “parallel star”
operator $e^{\parallel}$. The completeness result of Laurence and Struth [20] holds for BKA
with the parallel star, so in principle one could hope to extend our syntactic 
closure construction to include this operator. Unfortunately, using the results of 
Laurence and Struth, we can show that this is not possible. They defined a notion 
of depth of a series-parallel pomset, intuitively corresponding to the nesting of 
parallel and sequential components. An important step in their development 
consists of proving that for every series-parallel-rational language there exists a 
finite upper bound on the depth of its elements. However, the language $[\![a^{\parallel}]\!]_{\mathsf{CKA}}$
does not enjoy this property: it contains every series-parallel pomset exclusively 
labelled with the symbol a. Since we can build such pomsets with arbitrary 
depth, it follows that there does not exist a syntactic closure of the term $a^{\parallel}$.
New methods would thus be required to tackle the parallel star operator. 

Another aspect of CKA that is not yet developed to the extent of KA is the 
coalgebraic perspective. We intend to investigate whether the coalgebraic tools 
developed for KA can be extended to CKA, which will hopefully lead to efficient 
bisimulation-based decision procedures [2,5]. 
A Worked Example: A Non-trivial Closure


In this appendix, we solve an instance of a linear system as defined in Defini- 
tion 4.7 for a given parallel composition. For the sake of brevity, the steps are 
somewhat coarse-grained; the reader is encouraged to reproduce the steps by 
hand. 

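Consider the expression $e \parallel f = a^* \parallel b$. The linear system $\mathcal{L}_{e,f}$ that we
obtain from this expression consists of six inequations; in matrix form (with
zeroes omitted), this system is summarised as follows:⁴

\[
\begin{array}{l|cccccc|c}
 & 1 \parallel 1 & 1 \parallel b & a \cdot a^* \parallel 1 & a^* \parallel 1 & a \cdot a^* \parallel b & a^* \parallel b & \\ \hline
1 \parallel 1 & 1 & & & & & & 1 \\
1 \parallel b & b & 1 & & & & & b \\
a \cdot a^* \parallel 1 & a & & a^* & a \cdot a^* & & & a \cdot a^* \\
a^* \parallel 1 & 1 & & a^* & a \cdot a^* & & & a^* \\
a \cdot a^* \parallel b & a \parallel b & a & a^* \parallel b & a \cdot a^* \parallel b & a^* & a \cdot a^* & a \cdot a^* \parallel b \\
a^* \parallel b & b & 1 & a^* \parallel b & a \cdot a^* \parallel b & a^* & a \cdot a^* & a^* \parallel b
\end{array}
\]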


Let us proceed under the assumption that x is a solution to the system; the 
constraint imposed on x by the first two rows is given by the inequations 


$$x(1 \parallel 1) + 1 \leqq_{\mathsf{CKA}} x(1 \parallel 1) \tag{1}$$
$$b \cdot x(1 \parallel 1) + x(1 \parallel b) + b \leqq_{\mathsf{CKA}} x(1 \parallel b) \tag{2}$$


Because these inequations do not involve the other positions of the system, we 
can solve them in isolation, and use their solutions to find solutions for the 
remaining positions; it turns out that choosing $x(1 \parallel 1) = 1$ and $x(1 \parallel b) = b$
suffices here. 

We carry on to fill these values into the inequations given by the third and 
fourth row of the linear system. After some simplification, these work out to be 


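$$a \cdot a^* + a \cdot a^* \cdot x(a^* \parallel 1) + a^* \cdot x(a \cdot a^* \parallel 1) \leqq_{\mathsf{CKA}} x(a \cdot a^* \parallel 1) \tag{3}$$

$$a^* + a \cdot a^* \cdot x(a^* \parallel 1) + a^* \cdot x(a \cdot a^* \parallel 1) \leqq_{\mathsf{CKA}} x(a^* \parallel 1) \tag{4}$$

Applying the least fixpoint axiom to (3) and simplifying, we obtain

$$a \cdot a^* + a \cdot a^* \cdot x(a^* \parallel 1) \leqq_{\mathsf{CKA}} x(a \cdot a^* \parallel 1) \tag{5}$$

Substituting this into (4) and simplifying, we find that

$$a^* + a \cdot a^* \cdot x(a^* \parallel 1) \leqq_{\mathsf{CKA}} x(a^* \parallel 1) \tag{6}$$

This inequation, in turn, gives us that $a^* \leqq_{\mathsf{CKA}} x(a^* \parallel 1)$ by the least fixpoint
axiom. Plugging this back into (3) and simplifying, we find that

$$a \cdot a^* + a^* \cdot x(a \cdot a^* \parallel 1) \leqq_{\mathsf{CKA}} x(a \cdot a^* \parallel 1) \tag{7}$$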


4 Actually, the system obtained from $a^* \parallel b$ as a result of Definition 4.7 is slightly
larger; it also contains rows and columns labelled by $1 \cdot a^* \parallel 1$ and $1 \cdot a^* \parallel b$; these
turn out to be redundant. We omit these rows from the example for simplicity.
Again by the least fixpoint axiom, this tells us that $a \cdot a^* \leqq_{\mathsf{CKA}} x(a \cdot a^* \parallel 1)$. One
easily checks that $x(a \cdot a^* \parallel 1) = a \cdot a^*$ and $x(a^* \parallel 1) = a^*$ are solutions to (3)
and (4); by the observations above, they are also the least solutions.
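
The claim that these values solve (3) and (4) can be sanity-checked in the language model. The sketch below is our own illustration and is not taken from the authors' script [16]. It assumes that (3) reads a·a* + a·a*·x(a* ∥ 1) + a*·x(a·a* ∥ 1) ≦ x(a·a* ∥ 1) and that (4) reads a* + a·a*·x(a* ∥ 1) + a*·x(a·a* ∥ 1) ≦ x(a* ∥ 1). Terms over the single letter a are interpreted as sets of word lengths, truncated at a hypothetical bound `BOUND`, so the check is evidence up to that bound rather than a proof.

```python
# Truncated language model over the one-letter alphabet {a}:
# the word a^k is represented by its length k, and only k <= BOUND is kept.
BOUND = 12  # hypothetical cut-off, chosen only for illustration


def concat(xs, ys):
    """Sequential composition of two truncated unary languages."""
    return {i + j for i in xs for j in ys if i + j <= BOUND}


def union(*langs):
    """Sum (+) of languages."""
    out = set()
    for lang in langs:
        out |= lang
    return out


A_STAR = set(range(BOUND + 1))   # a*
A_PLUS = concat({1}, A_STAR)     # a . a*

# Candidate values from the text: x(a.a* || 1) = a.a* and x(a* || 1) = a*.
x_plus, x_star = A_PLUS, A_STAR

# Left-hand sides of (3) and (4) after substituting the candidates.
lhs3 = union(A_PLUS, concat(A_PLUS, x_star), concat(A_STAR, x_plus))
lhs4 = union(A_STAR, concat(A_PLUS, x_star), concat(A_STAR, x_plus))

# An inequation holds in this model when the left side is included in the right.
holds3 = lhs3 <= x_plus
holds4 = lhs4 <= x_star
```

Both `holds3` and `holds4` evaluate to `True`. Leastness is not checked here; it follows from the fixpoint argument in the text.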

It remains to find the least solutions for the final two positions. Filling in the 
values that we already have, we find the following for the fifth row: 


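$$\begin{aligned}
a \parallel b + a \cdot b + (a^* \parallel b) \cdot a \cdot a^* + (a \cdot a^* \parallel b) \cdot a^* \quad & \\
{} + a^* \cdot x(a \cdot a^* \parallel b) + a \cdot a^* \cdot x(a^* \parallel b) + a \cdot a^* \parallel b &\leqq_{\mathsf{CKA}} x(a \cdot a^* \parallel b)
\end{aligned} \tag{8}$$

Applying the exchange law⁵ to the first three terms, we find that they are
contained in $(a \cdot a^* \parallel b) \cdot a^*$, as is the last term; (8) thus simplifies to

$$(a \cdot a^* \parallel b) \cdot a^* + a^* \cdot x(a \cdot a^* \parallel b) + a \cdot a^* \cdot x(a^* \parallel b) \leqq_{\mathsf{CKA}} x(a \cdot a^* \parallel b) \tag{9}$$

By the least fixpoint axiom, we find that

$$a^* \cdot (a \cdot a^* \parallel b) \cdot a^* + a \cdot a^* \cdot x(a^* \parallel b) \leqq_{\mathsf{CKA}} x(a \cdot a^* \parallel b) \tag{10}$$

For the sixth row, we find that after filling in the solved positions, we have

$$\begin{aligned}
b + b + (a^* \parallel b) \cdot a \cdot a^* + (a \cdot a^* \parallel b) \cdot a^* \quad & \\
{} + a^* \cdot x(a \cdot a^* \parallel b) + a \cdot a^* \cdot x(a^* \parallel b) + a^* \parallel b &\leqq_{\mathsf{CKA}} x(a^* \parallel b)
\end{aligned} \tag{11}$$

Simplifying and applying the exchange law as before, it follows that

$$(a^* \parallel b) \cdot a^* + a^* \cdot x(a \cdot a^* \parallel b) + a \cdot a^* \cdot x(a^* \parallel b) \leqq_{\mathsf{CKA}} x(a^* \parallel b) \tag{12}$$

We then substitute (10) into (12) to find that

$$(a^* \parallel b) \cdot a^* + a \cdot a^* \cdot x(a^* \parallel b) \leqq_{\mathsf{CKA}} x(a^* \parallel b) \tag{13}$$

which, by the least fixpoint axiom, tells us that $a^* \cdot (a^* \parallel b) \cdot a^* \leqq_{\mathsf{CKA}} x(a^* \parallel b)$.
Plugging the latter back into (9), we find that

$$a^* \cdot (a \cdot a^* \parallel b) \cdot a^* + a \cdot a^* \cdot a^* \cdot (a^* \parallel b) \cdot a^* \leqq_{\mathsf{CKA}} x(a \cdot a^* \parallel b) \tag{14}$$

which can, using the exchange law, be reworked into

$$a^* \cdot (a \cdot a^* \parallel b) \cdot a^* \leqq_{\mathsf{CKA}} x(a \cdot a^* \parallel b) \tag{15}$$

Now, if we choose $x(a \cdot a^* \parallel b) = a^* \cdot (a \cdot a^* \parallel b) \cdot a^*$ and $x(a^* \parallel b) = a^* \cdot (a^* \parallel b) \cdot a^*$,
we find that these choices satisfy (9) and (12), making them part of a
solution; by construction, they are also the least solutions.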

In summary, x is a solution to the linear system, and by construction it is 
also the least solution. The reader is encouraged to verify that our choice of 
x(a* || b) is indeed a closure of a* || b. 


5 A caveat here is that applying the exchange law indiscriminately may lead to a 
term that is not a closure (specifically, it may violate the semantic requirement in 
Definition 4.1). The algorithm used to solve arbitrary linear systems in Lemma 3.12 
does not make use of the exchange law to simplify terms, and thus avoids this pitfall. 
Abstract. ORCA is a garbage collection protocol for actor-based pro-
grams. Multiple actors may mutate the heap while the collector is run- 
ning without any dedicated synchronisation. ORCA is applicable to any 
actor language whose type system prevents data races and which sup- 
ports causal message delivery. We present a model of ORCA which is 
parametric to the host language and its type system. We describe the 
interplay between the host language and the collector. We give invariants 
preserved by ORCA, and prove its soundness and completeness. 


1 Introduction 


Actor-based systems are massively parallel programs in which individual actors 
communicate by exchanging messages. In such systems it is essential to be able 
to manage data automatically with as little synchronisation as possible. In pre- 
vious work [9,12], we introduced the ORCA protocol for garbage collection in 
actor-based systems. ORCA is language-agnostic, and it allows for concurrent 
collection of objects in actor-based programs with no additional locking or syn- 
chronisation, no copying on message passing and no stop-the-world steps. ORCA 
can be implemented in any actor-based system or language that has a type sys- 
tem which prevents data races and that supports causal message delivery. There 
are currently two instantiations of ORCA, one is for Pony [8,11] and the other 
for Encore [5]. We hypothesise that ORCA could be applied to other actor-based 
systems that use static types to enforce isolation [7,21, 28,36]. For libraries, such 
as Akka, which provide actor-like facilities, pluggable type systems could be used 
to enforce isolation [20]. 

This paper develops a formal model of ORCA. More specifically, the paper 
contributions are: 


1. Identification of the requirements that the host language must statically guar- 
antee; 
© The Author(s) 2018 
2. Description and model of ORCA at a language-agnostic level;

3. Identification of invariants that ensure global consistency without synchroni- 
sation; 

4. Proofs of soundness, i.e. live objects will not be collected, and proofs of com- 
pleteness, i.e. all garbage will be identified as such. 


A formal model facilitates the understanding of how ORCA can be applied 
to different languages. It also allows us to explore extensions such as shared 
mutable state across actors [40], reduction of tracing of immutable references [12], 
or incorporation of borrowing [4]. Alternative implementations of ORCA that 
rely on deep copying (e.g., to reduce type system complexity) across actors on 
different machines can also be explored through our formalism. 

Developing a formal model of ORCA presents challenges: 


Can the model be parametric in the host language? We achieved parametricity
by concentrating on the effects rather than the mechanisms of the language.
We do not model language features; instead, we model actor behaviour
through non-deterministic choice between heap mutation and object creation. 
All other actions, such as method call, conditionals, loops etc., are irrelevant. 

Can the model be parametric in the host type system? We achieved parametricity 
by concentrating on the guarantees rather than the mechanism afforded by the 
type system. We do not define judgments, but instead, assume the existence 
of judgements which determine whether a path is readable or writeable from
a given actor. Through an (uninterpreted) precondition to any heap muta- 
tion, we require that no aliasing lets an object writeable from an actor be 
readable/writeable from any other actor. 

How to relax atomicity? ORCA relies on a global invariant that relates the number 
of references to any data object and the number of messages with a path to 
that object. This invariant only holds if actors execute atomically. Since we 
desire actors to run in parallel, we developed a more subtle, and weaker, 
definition of the invariant. 


The full proofs and omitted definitions are available in the appendix [16].


2 Host Language Requirements 


ORCA makes some assumptions about its host language; we describe them here.


2.1 Actors and Objects 


Actors are active entities with a thread of control, while objects are data struc- 
tures. Both actors and objects may have fields and methods. Method calls on 
objects are synchronous, whereas method calls on actors amount to asynchronous 
message sends; these are called behaviours. Messages are stored in a FIFO queue.
When idle, an actor processes the top message from its queue. At any given point 
of time an actor may be either idle, executing a behaviour, or collecting garbage. 


Correctness of a Concurrent Object Collector for Actor Languages 887 



Fig. 1. Actors and objects. Full arrows are references, grey arrows are overwritten 
references: references that no longer exist. 


Actor | Path       | Capability || Actor | Path       | Capability
α₁    | this.f₁    | write      || α₂    | this.f₂    | tag
      | this.f₁.f₅ | write      ||       | this.f₂.f₅ | ⊥
      | this.f₃    | read       ||       | this.f₄    | read
      |            |            ||       | this.f₆    | write
      |            |            ||       | this.f₆.f₅ | write


Fig. 2. Capabilities. Heap mutation may modify what object is reachable through a 
path, but not the path’s capability. 


Figure 1 shows actors α₁ and α₂, and objects ω₁ to ω₄. In [16] we show how
to create this object graph in Pony. In Fig. 1(a), actor α₁ points to object ω₁
through field f₁ and to ω₂ through field f₃, and object ω₁ points to ω₃ through field
f₅. In Fig. 1(b), actor α₁ creates ω₄ and assigns it to this.f₁.f₅. In Fig. 1(c), α₁
has given up its reference to ω₁ and sent it to α₂, which stored it in field f₆.
Note that the process of sending sent not only ω₁ but also, implicitly, ω₄.


2.2 Mutation, Transfer and Accessibility 


Message passing is the only way to share objects. This falls out of the capability 
system. If an actor shares an object with another actor, then either it gives up 
the object or neither actor has a write capability to that object. For example, 
after α₁ sends ω₁ to α₂, it cannot mutate ω₁. As a consequence, heap mutation
only decreases accessibility, while message sends can transfer accessibility from 
sender to receiver. When sending immutable data the sender does not need to 
transfer accessibility. However, when it sends a mutable object it cannot keep 
the ability to read or to write the object. Thus, upon message send of a mutable 
object, the actor must consume, or destroy, its reference to that object. 


2.3 Capabilities and Accessibility 


ORCA assumes that a host language’s type system assigns access rights to paths. 
A path is a sequence of field names. We call these access rights capabilities. 

We expect the following three capabilities: read, write, tag. The first two allow
reading and writing an object’s fields respectively. The tag capability only allows 
identity comparison and sending the object in a message. The type system must
ensure that actors have no read-write races. This is natural for actor languages [5,
11, 21].

Figure 2 shows capabilities assigned to the paths in Fig. 1: α₁.f₁.f₅ has capability
write, thus α₁ can read and write to the object reachable from that path.
Note that capabilities assigned to paths are immutable, while the contents of
those paths may change. For example, in Fig. 1(a), α₁ can write to ω₃ through
path f₁.f₅, while in Fig. 1(b) it can write to ω₄ through the same path. In
Fig. 1(a) and (b), α₂ can use the address of ω₁ but cannot read or write it,
due to the tag capability, and therefore cannot access ω₃ (in Fig. 1(a)) nor ω₄
(in Fig. 1(b)). However, in Fig. 1(c) the situation reverses: α₂, which received ω₁
with write capability, is now able to reach it through field f₆, and therefore ω₄.
Notice that the existence of a path from an actor to an object does not imply
that the object is accessible to the actor: in Fig. 1(a), there is a path from α₂ to
ω₃, but α₂ cannot access ω₃. Capabilities protect against data races by ensuring
that if an object can be mutated by an actor, then no other actor can access its
fields.


2.4 Causality 


ORCA uses messages to deliver protocol-related information; it thus requires
causal delivery. Messages must be delivered after any and all messages that 
caused them. Causality is the smallest transitive relation, such that if a message 
m’ is sent by some actor after it received or sent m, then m is a cause of m’. 
Causal delivery entails that m’ be delivered after m. 

For example, if actor α₁ sends m₁ to actor α₂, then sends m₂ to actor α₃,
and α₃ receives m₂ and sends m₃ to α₂, then m₁ is a cause of m₂, and m₂ is
a cause of m₃. Causal delivery requires that α₂ receive m₁ before receiving m₃.
No requirements are made on the order of delivery to different actors. 
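
The definition of causality above can be made concrete with a small sketch. This is our own illustration under stated assumptions: the event log format, the function names `causes` and `respects_causality`, and the string names for actors and messages are all hypothetical and are not part of ORCA. The code builds the smallest transitive relation in which m is a cause of m′ whenever m′ is sent by an actor after that actor received or sent m, and then checks one actor's delivery order against it.

```python
from itertools import product


def causes(log):
    """Return the set of pairs (m, m2) such that m is a cause of m2.

    `log` lists events in the order they happened; each event is
    ("send", actor, msg) or ("recv", actor, msg).
    """
    seen = {}    # actor -> messages it has sent or received so far
    rel = set()
    for kind, actor, msg in log:
        earlier = seen.setdefault(actor, [])
        if kind == "send":
            # Everything this actor already sent or received causes msg.
            rel |= {(m, msg) for m in earlier}
        earlier.append(msg)
    # Transitive closure; a naive fixpoint is enough for tiny inputs.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(rel), repeat=2):
            if b == c and (a, d) not in rel:
                rel.add((a, d))
                changed = True
    return rel


def respects_causality(delivery, rel):
    """Check one actor's delivery order against the causality relation."""
    pos = {m: i for i, m in enumerate(delivery)}
    return all(pos[m] < pos[m2] for (m, m2) in rel if m in pos and m2 in pos)


# The scenario from the text, written as a hypothetical event log.
LOG = [
    ("send", "a1", "m1"),   # a1 sends m1 (to a2)
    ("send", "a1", "m2"),   # a1 then sends m2 (to a3)
    ("recv", "a3", "m2"),   # a3 receives m2
    ("send", "a3", "m3"),   # a3 sends m3 (to a2)
]
REL = causes(LOG)
```

With this log, `REL` contains the pairs (m1, m2), (m2, m3) and, by transitivity, (m1, m3). The delivery order `["m1", "m3"]` at the second actor is accepted, while `["m3", "m1"]` is rejected.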


3 Overview of ORCA 


We introduce ORCA and discuss how to localise the necessary information to 
guarantee safe deallocation of objects in the presence of sharing. Every actor 
has a local heap in which it allocates objects. An actor owns the objects it has 
allocated, and ownership is fixed for an object’s life-time, but actors are free to 
reference objects that they do not own. Actors are obligated to collect their own 
objects once these are no longer needed. While collecting, an actor must be able 
to determine whether an object can be deallocated using only local information. 
This allows all other actors to make progress at any point. 


3.1 Mutation and Collection 


ORCA relies on capabilities for actors to reference objects owned by other actors 
and to support concurrent mutation to parts of the heap that are not being 
concurrently collected. Capabilities avoid the need for barriers. 
I₁ An object accessible with write capability from an actor is not accessible with
read or write capability from any other actor. 


This invariant ensures an actor, while executing garbage collection, can safely 
trace any object to which it has read or write access without the need to protect 
against concurrent mutation from other actors. 


3.2 Local Collection 


An actor can collect its objects based on local information without consulting 
other actors. For this to be safe, the actor must know that an owned, locally 
inaccessible, object is also globally inaccessible (i.e., inaccessible from any other
actors or messages).¹ Shared objects are reference counted by their owner to
ensure: 


I2 An object accessible from a message queue or from a non-owning actor has 
reference count larger than zero in the owning actor. 


Thus, a locally inaccessible object with a reference count of 0 can be collected. 
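
The local collection rule can be illustrated with a deliberately simplified sketch. It is our own and rests on assumptions: the function name `collectable`, the dictionary-based heap and the string addresses are hypothetical, and the tracing step ignores capabilities, which the actual protocol takes into account. It only shows the decision "owned, locally unreachable, and reference count 0".

```python
def collectable(owned, local_roots, heap, rc):
    """Owned objects that are locally unreachable and have reference count 0.

    owned:       set of object addresses owned by the collecting actor
    local_roots: addresses held in the actor's fields and frame
    heap:        dict mapping (address, field) -> address
    rc:          the actor's reference count table (missing entries mean 0)
    """
    reachable, todo = set(), list(local_roots)
    while todo:
        addr = todo.pop()
        if addr is None or addr in reachable:
            continue
        reachable.add(addr)
        # Follow every field of the object just visited.
        todo.extend(t for (src, _), t in heap.items() if src == addr)
    return {o for o in owned if o not in reachable and rc.get(o, 0) == 0}


# Hypothetical actor state: w1 and w2 are reachable locally, w3 is shared
# with another actor (count 1), and w4 is neither reachable nor shared.
GARBAGE = collectable(
    owned={"w1", "w2", "w3", "w4"},
    local_roots=["w1"],
    heap={("w1", "f"): "w2"},
    rc={"w3": 1},
)
```

Here `GARBAGE` is `{"w4"}`: the object `w3` is locally unreachable but is kept because its reference count is not zero.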


3.3 Messages and Collection 


I₁ and I₂ are sufficient to ensure that local collection is safe. Maintaining I₂ is not
trivial as accessibility is affected by message sends. Moreover, it is possible for an
actor to share a read object with another actor through a message. What if that
actor drops its reference to the object? The object’s owner should be informed
so it can decrease its reference count. What happens when an actor receives
an object in a message? The object’s owner should be informed, so that it can
increase its reference count. To reduce message traffic, ORCA uses distributed,
weighted, deferred reference counts. Each actor maintains reference counts that
track the sharing of its objects. It also maintains counts for “foreign objects”,
tracking references to objects owned by other actors. This reference count for
non-owning actors is what allows sending/receiving objects without having to
inform their owner while maintaining I₂. For any object or actor ι, we denote
with LRC(ι) the reference count for ι in ι’s owner, and with FRC(ι) we denote
the sum of the reference counts for ι in all other actors. The counts do not reflect
the number of references, rather the existence of references:


I₃ If a non-owning actor can access an object through a path from its fields or
call stack, its reference count for this object is greater than 0. 


An object is globally accessible if it is accessible from any actor or from a message 
in some queue. Messages include reference increment or decrement messages— 
these are ORCA-level messages and they are not visible to applications. We 
introduce two logical counters: AMC(ι) to account for the number of application


1 For example, in Fig. 1(c) ω₄ is locally inaccessible, but globally accessible.
1. Andy allocates object ω
2. Andy --ω--> Bart
3. Bart --ω--> Catalin
4. Bart ==dec(ω)==> Andy


Fig. 3. Black arrows are references, numbered in creation order. Blue solid arrows are 
application messages and blue dashed arrows ORCA-level message. (Color figure online) 


messages with paths to ι, and OMC(ι) to account for ORCA-level messages
with reference count increment and decrement requests. These counters are not 
present at run-time, but they will be handy for reasoning about ORCA. The 
owner’s view of an object is described by the LRC and the OMC, while the 
foreign view is described by the FRC and the AMC. These two views must 
agree: 


I₄ ∀ι. LRC(ι) + OMC(ι) = AMC(ι) + FRC(ι)


I₂, I₃ and I₄ imply that a locally inaccessible object with LRC = 0 can be
reclaimed. 


3.4 Example 


Consider actors Andy, Bart and Catalin, and steps from Fig. 3. 


Initial State. Let ω be a newly allocated object. As it is only accessible to its
owning actor, Andy, there is no entry for it in any RC.


Sharing ω. When Andy shares ω with Bart, ω is placed on Bart’s message queue,
meaning that AMC(ω) = 1. This is reflected by setting RC_Andy(ω) to 1. This
preserves I₄ and the other invariants. When Bart takes the message with ω
from his queue, AMC(ω) becomes zero, and Bart sets his foreign reference count
for ω to 1, that is, RC_Bart(ω) = 1. When Bart shares ω with Catalin, we get
AMC(ω) = 1. To preserve I₄, Bart could set RC_Bart(ω) to 0, but this would
break I₃. Instead, Bart sends an ORCA-level message to Andy, asking him to
increment his (local) reference count by some n, and sets his own RC_Bart(ω) to
n.² This preserves I₄ and the other invariants. When Catalin receives the message
later on, she will behave similarly to Bart in step 2, and set RC_Catalin(ω) = 1.

The general rule is that when an actor sends one of its objects, it increments 
the corresponding (local) RC by 1 (reflecting the increasing number of foreign 
references) but when it sends a non-owned object, it decrements the correspond- 
ing (foreign) RC (reflecting a transfer of some of its stake in the object). Special 
care needs to be taken when the sender’s RC is 1. 


2 This step can be understood as if Bart “borrowed” n units from Andy, added n − 1
to his own RC, and gave 1 to the AMC, to reach Catalin eventually.
Further note that if Andy, the owner of ω, received ω, he would decrease
his counter for ω rather than increase it, as his reference count denotes foreign
references to ω. When an actor receives one of its owned objects, it decrements
the corresponding (local) RC by 1 but when it receives a non-owned object, it
increments the corresponding (foreign) RC by 1.


Dropping References to ω. Subsequent to sharing ω with Catalin, Bart performs
GC, and traces his heap without reaching ω (maybe because he did not store ω
in a field). This means that Bart has given up his stake in ω. This is reflected
by sending a message to Andy to decrease his RC for ω by n, and setting Bart’s
RC for ω to 0. Andy’s local count of the foreign references to ω is decreased
piecemeal like this, until LRC(ω) reaches zero. At this point, tracing Andy’s local
heap can determine if ω should be collected.
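
The bookkeeping in this example can be replayed step by step. The sketch below is our own illustration, not ORCA's implementation: the class `World`, the function `run` and the parameter `n` are hypothetical names, and we assume that OMC is the signed sum of the in-flight increment and decrement requests, which is our reading of the example and of footnote 2. After every step it asserts the invariant LRC + OMC = AMC + FRC for the single object ω owned by Andy.

```python
from collections import defaultdict

OWNER = "Andy"


class World:
    """Bookkeeping for a single object owned by Andy."""

    def __init__(self):
        self.rc = defaultdict(int)  # per-actor reference count for the object
        self.amc = 0                # application messages in flight carrying it
        self.omc = 0                # assumed: signed sum of in-flight inc/dec

    def lrc(self):
        return self.rc[OWNER]

    def frc(self):
        return sum(v for actor, v in self.rc.items() if actor != OWNER)

    def check(self):
        # The invariant relating the owner's view and the foreign view.
        assert self.lrc() + self.omc == self.amc + self.frc()


def run(n=8):
    w = World()
    w.check()               # 1. Andy allocates the object: no entries yet

    w.rc["Andy"] += 1       # 2. Andy sends it to Bart
    w.amc += 1
    w.check()
    w.amc -= 1              #    Bart receives it
    w.rc["Bart"] += 1
    w.check()

    w.omc += n              # 3. Bart's count is 1, so he requests inc(n),
    w.rc["Bart"] += n - 1   #    keeps n - 1 of the borrowed units,
    w.amc += 1              #    and gives 1 to the message for Catalin
    w.check()
    w.omc -= n              #    Andy processes the increment
    w.rc["Andy"] += n
    w.check()
    w.amc -= 1              #    Catalin receives the object
    w.rc["Catalin"] += 1
    w.check()

    w.omc -= n              # 4. Bart's GC no longer reaches the object:
    w.rc["Bart"] = 0        #    he sends dec(n) and zeroes his count
    w.check()
    w.omc += n              #    Andy processes the decrement
    w.rc["Andy"] -= n
    w.check()
    return dict(w.rc)
```

Calling `run()` returns `{"Andy": 1, "Bart": 0, "Catalin": 1}` for any positive `n`: Andy's remaining count of 1 reflects Catalin's stake, so the object is not yet collectable.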


Further Aspects. We briefly outline further aspects which play a role in ORCA. 


Concurrency. Actors execute concurrently. For example, sharing of ω by Bart
and Catalin can happen in parallel. As long as Bart and Catalin have foreign
references to ω, they may separately, and in parallel cause manipulation of the
global number of references to w. These manipulations will be captured locally 
at each site through FRC, and through increment and decrement messages 
to Andy (OMC). 

Causality. Increment and decrement messages may arrive in any order. Andy’s 
queue will serialise them, i.e. concurrent asynchronous reference count manip- 
ulations will be ordered and executed sequentially. Causality is key here, as it 
prevents ORCA-level messages from being overtaken by application messages which
cause RCs to be decremented; thus causality keeps counters non-negative. 

Composite Objects. Objects sent in messages must be traced to find the transitive
closure of accessible data. For example, when passing ω₁ in a message in
Fig. 1(c), objects accessible through it, e.g., ω₄, will be traced. This is mandated
by I₃ and I₄.


Finally, we reflect on the nature of reference counts: they are distributed, in the 
sense that an object’s owner and every actor referencing it keep separate counts; 
weighted, in that they do not reflect the number of aliases; and deferred, in that 
they are not manipulated immediately on alias creation or destruction, and that 
non-local increments/decrements are handled asynchronously. 


4 The ORCA Protocol 


We assume enumerable, disjoint sets ActorAddr and ObjAddr, for addresses of 
actors and objects. The union of the two is the set of addresses including null. 
We require a mapping Class that gives the name of the class of each actor in a 
given configuration, and a mapping O that returns the owner of an address 


Addr = ActorAddr ⊎ ObjAddr ⊎ {null}
Class : Config × ActorAddr → ClassId
O : Addr → ActorAddr
such that the owner of an actor is the actor itself, i.e., ∀α ∈ ActorAddr. O(α) = α.

Definition 1 describes run-time configurations, C. They consist of a heap, χ,
which maps addresses and field identifiers to addresses, and an actor map, as,
from actor addresses to actors. Actors consist of a frame, a queue, a reference
count table, a state, a working set, marks, and a program counter. Frames are
either empty, or consist of the identifier for the currently executing behaviour,
and a mapping from variables to addresses. Queues are sequences of messages.
A message is either an application message of the form app(φ) denoting a
high-level language message with the frame φ, or an ORCA message, of the form
orca(ι : z), denoting an in-flight request for a reference count change for ι by z.
The state distinguishes whether the actor is idle, or executing some behaviour,
or performing garbage collection. We discuss states, working sets, marks, and
program counters in Sect. 4.3. We use naming conventions: α ∈ ActorAddr; ω ∈
ObjAddr; ι ∈ Addr; z ∈ ℤ; n ∈ ℕ; b ∈ BId; x ∈ VarId; A ∈ ClassId; and ιs for a
sequence of addresses ι₁...ιₙ. We write C.heap for C’s heap; and α.qu_C, or α.rc_C,
or α.frame_C, or α.st_C for the queue, reference count table, frame or state of actor
α in configuration C, respectively.


Definition 1 (Runtime entities and notation) 


C ∈ Config = Heap × Actors
χ ∈ Heap = (Addr \ {null}) × FId → Addr
as ∈ Actors = ActorAddr → Actor
a ∈ Actor = Frame × Queue × ReferenceCounts × State × Workset × Marks × PC
φ ∈ Frame = ∅ ∪ (BId × LocalMap)
ψ ∈ LocalMap = VarId → Addr
q ∈ Queue = Message*
m ∈ Message ::= orca(ι : z) | app(φ)
rc ∈ ReferenceCounts = Addr → ℕ


State, Workset, Marks, and PC described in Definition 7. 


Example: Figure 4 shows C₀, our running example for a runtime configuration.
It has three actors: α₁–α₃, represented by light grey boxes, and eight objects,
ω₁–ω₈, represented by circles. We show ownership by placing the objects in
square boxes, e.g. O(ω₇) = α₁. We show references through arrows, e.g. ω₆
references ω₈ through field f₇, that is, C₀.heap(ω₆, f₇) = ω₈. The frame of α₂
contains behaviour identifier b′, and maps x′ to ω₈. All other frames are empty.
The message queue of α₁ contains an application message for behaviour b and
argument ω₅ for x, the queue of α₂ is empty, and the queue of α₃ an ORCA
message for ω₇. The bottom part shows reference count tables: α₁.rc_C₀(α₁) = 21,


3 Note that we omitted the class of objects. As our model is parametric with the type 
system, we can abstract from classes, and simplify our model. 
and α₁.rc_C₀(ω₇) = 50. Entries of owned addresses are shaded. Since α₂ owns α₂
and ω₂, the entries for α₂.rc_C₀(α₂) and α₂.rc_C₀(ω₂) are shaded. Note that α₁ has
a non-zero entry for ω₇, even though there is no path from α₁ to ω₇. There is
no entry for ω₁; no such entry is needed, because no actor except for its owner
has a path to it. The 0 values indicate potentially non-existent entries in the
corresponding tables; for example, the reference count table for actor α₃ needs
only to contain entries for α₁, α₃, ω₃, and ω₄. Ownership does not restrict access
to an address: e.g. actor α₁ does not own object ω₃, yet may access it through the
path this.f₁.f₂.f₃, may read its field through this.f₁.f₂.f₃.f₄, and may mutate
it, e.g. by this.f₁.f₂.f₃ = this.f₁.


Lookup of fields in a configuration is defined in the obvious way, i.e. 


Definition 2. C(ι.f) = C.heap(ι, f), and C(ι.f.f′) = C.heap(C(ι.f), f′)
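
Field lookup along a path can be sketched as a simple loop. This is our own illustration under assumptions: the function name `lookup`, the dictionary-based heap and the addresses in `HEAP` are hypothetical, and `None` stands in for null. The loop applies the heap once per field, as in Definition 2.

```python
def lookup(heap, addr, *fields):
    """Follow `fields` from `addr` in `heap`; None plays the role of null."""
    for f in fields:
        if addr is None:
            # null has no fields, so the rest of the path is undefined
            return None
        addr = heap.get((addr, f))
    return addr


# A hypothetical heap: (address, field) -> address.
HEAP = {
    ("a", "f"): "w1",
    ("w1", "g"): "w2",
    ("w2", "h"): None,
}
```

For instance, `lookup(HEAP, "a", "f", "g")` yields `"w2"`, and extending the path by `"h"` yields `None`.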


4.1 Capabilities and Accessibility 
ORCA considers three capabilities: 
κ ∈ Capability = {read, write, tag},


where read allows reading, write allows reading and writing, and tag forbids 
both read and write, but allows the use of an object’s address. To describe the 
capability at which objects are visible from actors we use the concepts of static 
and dynamic paths. 


Static paths consist of the keyword this (indicating a path starting at the current 
actor), or the name of a behaviour, b, and a variable, x, (indicating a path 
starting at local variable x from a frame of b), followed by any number of fields, f. 


sp ::= this | b.x | sp.f 


[Figure 4. Panels: Heap; Frames; Reference Count Tables; Queues.]
Frames: α₁.frame = ∅; α₂.frame = (b′, x′ ↦ ω₈); α₃.frame = ∅
Queues: α₁.qu = app(b, x ↦ ω₅); α₂.qu = ∅; α₃.qu = orca(ω₇ : −50)

Fig. 4. Configuration C₀. ω₁ is absent in the ref. counts, since it has not been shared.
The host language must assign these capabilities to static paths. Thus, we
assume it provides a static judgement of the form 


A ⊢ sp : κ where A ∈ ClassId


meaning that a static path sp has capability κ when “seen” from a class A.
We highlight static judgments, i.e., those provided by the type system in blue. 

We expect the type system to guarantee that read and write access rights are 
“deep”, meaning that all paths to a read capability must go through other read 
or write capabilities (A1), and all paths to a write capability must go through 
write capabilities (A2). 


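Axiom 1. For class identifier $A$, static path $sp$, field $f$, capability $\kappa$, we assume:

A1 $A \vdash sp.f : \kappa \;\longrightarrow\; \exists \kappa' \neq \mathsf{tag}.\ A \vdash sp : \kappa'$.

A2 $A \vdash sp.f : \mathsf{write} \;\longrightarrow\; A \vdash sp : \mathsf{write}$.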


Such requirements are satisfied by many type systems with read-only references or immutability (e.g. [7, 11, 18, 23, 29, 33, 37, 41]). An implication of A1 and A2 is that capabilities degrade with growing paths, i.e., the prefix of a path has at least as many rights as its extensions. More precisely: $A \vdash sp : \kappa$ and $A \vdash sp.f : \kappa'$ imply that $\kappa \le \kappa'$, where we define $\mathsf{write} < \mathsf{read} < \mathsf{tag}$, and $\kappa \le \kappa'$ iff $\kappa = \kappa'$ or $\kappa < \kappa'$.
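The two axioms and the resulting degradation property can be checked mechanically on a finite capability table. The following is a minimal Python sketch, not part of the formal model: it assumes that static paths are encoded as tuples such as `("this", "f1")`, and that the root `("this",)` carries the capability write (an assumption made only for this illustration). The names `satisfies_axioms` and `degrades` are hypothetical.

```python
# Capability order from the text: write < read < tag (smaller = more permissive).
RANK = {"write": 0, "read": 1, "tag": 2}

def leq(k1, k2):
    """k1 <= k2 in the order write < read < tag."""
    return RANK[k1] <= RANK[k2]

def satisfies_axioms(caps):
    """Check A1 and A2 for every path in caps that has a prefix.

    caps maps static paths (tuples of names) to capabilities."""
    for path, k in caps.items():
        if len(path) < 2:
            continue  # a root such as ("this",) has no prefix to check
        prefix = caps.get(path[:-1])
        if prefix is None or prefix == "tag":   # A1: prefix must be read or write
            return False
        if k == "write" and prefix != "write":  # A2: write only below write
            return False
    return True

def degrades(caps):
    """The capability of a prefix is <= the capability of each extension."""
    return all(leq(caps[p[:-1]], k)
               for p, k in caps.items()
               if len(p) > 1 and p[:-1] in caps)

# Paths of class A1 starting at `this`; the root entry is an assumption.
CAPS_A1 = {
    ("this",): "write",
    ("this", "f1"): "write",
    ("this", "f1", "f2"): "write",
    ("this", "f1", "f2", "f3"): "write",
    ("this", "f1", "f2", "f3", "f4"): "tag",
}
```

A table in which a path extends a tag path, or in which a write path extends a read path, is rejected by `satisfies_axioms`.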


Example: Table 1 shows capabilities for some paths from Fig. 4. Thus, $A_1 \vdash \mathsf{this}.f_1 : \mathsf{write}$, and $A_2 \vdash b'.x' : \mathsf{write}$, and $A_2 \vdash \mathsf{this}.f_8 : \mathsf{tag}$. The latter, together with A1, gives that $A_2 \nvdash \mathsf{this}.f_8.f : \kappa$ for all $\kappa$ and $f$.

As we shall see later, the existence of a path does not imply that the path may be navigated. For example, $C_0(\alpha_2.f_8.f_4) = \omega_4$, but actor $\alpha_2$ cannot access $\omega_4$ because of $A_2 \vdash \mathsf{this}.f_8 : \mathsf{tag}$.

Moreover, it is possible for a path to have a capability, while not being defined. For example, Table 1 shows $A_1 \vdash \mathsf{this}.f_1.f_2 : \mathsf{write}$ and it would be possible to have $C_j(\alpha_1.f_1) = \mathsf{null}$, for some configuration $C_j$ that derives from $C_0$.


Table 1. Capabilities for paths, where $A_1 = Class(\alpha_1)$ and $A_2 = Class(\alpha_2)$.

ClassId   Path                Capability
A1        this.f1             write
A1        this.f1.f2          write
A1        this.f1.f2.f3       write
A1        this.f1.f2.f3.f4    tag
A1        b.x                 write
A1        b.x.f5              write
A1        b.x.f5.f7           tag
A1        b.x.f5.f6           write
A2        this.f8             tag
A2        b'.x'               write
Dynamic paths (in short paths $p$) start at the actor’s fields, or frame, or at some pending message in an actor’s queue (the latter cannot be navigated yet, but can be navigated later on, once the message is taken off the queue). Dynamic paths may be local paths ($lp$) or message paths ($mp$). Local paths consist of this or a variable $x$ followed by any number of fields $f$. In such paths, this is the current actor, and $x$ is a local variable from the current frame. Message paths consist of $k.x$ followed by a sequence of fields. If $k \ge 0$, then $k.x$ indicates the local variable $x$ from the $k$-th message from the queue; $k = -1$ indicates variables from either (a) a message that has been popped from the queue, but whose frame has not yet been pushed onto the stack, or (b) a message whose frame has been created but not yet been pushed onto the queue. Thus, $k = -1$ indicates that either (a) a frame will be pushed onto the stack, during message receiving, or (b) a message will be pushed onto the queue during message sending.


$p \in Path ::= lp \mid mp \qquad lp ::= \mathsf{this} \mid x \mid lp.f \qquad mp ::= k.x \mid mp.f$


We define accessibility as the lookup of a path provided that the capability for this path is defined. The partial function $\mathcal{A}$ returns a pair: the address accessible from actor $\alpha$ following path $p$, and the capability of $\alpha$ on $p$. A path of the form $p.\mathsf{owner}$ returns the owner of the object accessible through $p$ and capability tag.


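Definition 3 (accessibility). The partial function
$\mathcal{A} : Config \times ActorAddr \times Path \rightarrow (Addr \times Capability)$
is defined as

$\mathcal{A}_C(\alpha, \mathsf{this}.\overline{f}) = (\iota, \kappa)$ iff $C(\alpha.\overline{f}) = \iota \;\wedge\; Class(\alpha) \vdash \mathsf{this}.\overline{f} : \kappa$

$\mathcal{A}_C(\alpha, x.\overline{f}) = (\iota, \kappa)$ iff $\exists b, \psi.\ [\ \alpha.frame_C = (b, \psi) \;\wedge\; C(\psi(x).\overline{f}) = \iota \;\wedge\; Class(\alpha) \vdash b.x.\overline{f} : \kappa\ ]$

$\mathcal{A}_C(\alpha, k.x.\overline{f}) = (\iota, \kappa)$ iff $k \ge 0 \;\wedge\; \exists b, \psi.\ [\ \alpha.qu_C[k] = app(b, \psi) \;\wedge\; C(\psi(x).\overline{f}) = \iota \;\wedge\; Class(\alpha) \vdash b.x.\overline{f} : \kappa\ ]$

$\mathcal{A}_C(\alpha, -1.x.\overline{f}) = (\iota, \kappa)$ iff $\alpha$ is executing Sending or Receiving, and ... continued in Definition 9.

$\mathcal{A}_C(\alpha, p.\mathsf{owner}) = (\alpha', \mathsf{tag})$ iff $\exists \iota.\ [\ \mathcal{A}_C(\alpha, p) = (\iota, \_) \;\wedge\; O(\iota) = \alpha'\ ]$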


We use $\mathcal{A}_C(\alpha, p) = \iota$ as shorthand for $\exists \kappa.\ \mathcal{A}_C(\alpha, p) = (\iota, \kappa)$. The second and third case above ensure that the capability of a message path is the same as when the message has been taken off the queue and placed on the frame.


Example: We obtain that $\mathcal{A}_{C_0}(\alpha_1, \mathsf{this}.f_1.f_2.f_3) = (\omega_3, \mathsf{write})$, from the fact that Fig. 4 says that $C_0(\alpha_1.f_1.f_2.f_3) = \omega_3$ and from the fact that Table 1 says that $A_1 \vdash \mathsf{this}.f_1.f_2.f_3 : \mathsf{write}$. Similarly, $\mathcal{A}_{C_0}(\alpha_2, \mathsf{this}.f_8) = (\omega_3, \mathsf{tag})$, and $\mathcal{A}_{C_0}(\alpha_2, x') = (\omega_8, \mathsf{write})$, and $\mathcal{A}_{C_0}(\alpha_1, 0.x.f_5.f_7) = (\omega_8, \mathsf{tag})$.

Both $\mathcal{A}_{C_0}(\alpha_1, \mathsf{this}.f_1.f_2.f_3)$ and $\mathcal{A}_{C_0}(\alpha_2, \mathsf{this}.f_8)$ describe paths from actors’ fields, while $\mathcal{A}_{C_0}(\alpha_2, x')$ describes a path from the actor’s frame, and finally $\mathcal{A}_{C_0}(\alpha_1, 0.x.f_5.f_7)$ is a path from the message queue.

Accessibility describes what may be read or written to: $\mathcal{A}_{C_0}(\alpha_1, \mathsf{this}.f_1.f_2.f_3) = (\omega_3, \mathsf{write})$, therefore actor $\alpha_1$ may mutate object $\omega_3$. However, this mutation is not visible by $\alpha_2$, even though $C_0(\alpha_2.f_8) = \omega_3$, because $\mathcal{A}_{C_0}(\alpha_2, \mathsf{this}.f_8) = (\omega_3, \mathsf{tag})$, which means that actor $\alpha_2$ has only opaque access to $\omega_3$.

Accessibility plays a role in collection: if the reference $f_3$ were to be dropped, it would be safe to collect $\omega_4$. Even though there exists a path from $\alpha_2$ to $\omega_4$, object $\omega_4$ is not accessible to $\alpha_2$: the path $\mathsf{this}.f_8.f_4$ leads to $\omega_4$ but will never be navigated ($\mathcal{A}_{C_0}(\alpha_2, \mathsf{this}.f_8.f_4)$ is undefined). Also, $\mathcal{A}_C(\alpha_2, \mathsf{this}.f_8.\mathsf{owner}) = (\alpha_3, \mathsf{tag})$; thus, as long as $\omega_3$ is accessible from some actor, e.g. through $C(\alpha_2.f_8) = \omega_3$, actor $\alpha_3$ will not be collected.


Because the class of an actor as well as the capability attached to a static path 
are constant throughout program execution, the capabilities of paths starting 
from an actor’s fields or from the same frame are also constant. 


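Lemma 1. For actor $\alpha$, fields $\overline{f}$, behaviour $b$, variable $x$, capabilities $\kappa$, $\kappa'$, configurations $C$ and $C'$, such that $C$ reduces to $C'$ in one or more steps:

- $\mathcal{A}_C(\alpha, \mathsf{this}.\overline{f}) = (\iota, \kappa) \;\wedge\; \mathcal{A}_{C'}(\alpha, \mathsf{this}.\overline{f}) = (\iota', \kappa') \;\longrightarrow\; \kappa = \kappa'$
- $\mathcal{A}_C(\alpha, x.\overline{f}) = (\iota, \kappa) \;\wedge\; \mathcal{A}_{C'}(\alpha, x.\overline{f}) = (\iota', \kappa') \;\wedge\; \alpha.frame_C = (b, \_) \;\wedge\; \alpha.frame_{C'} = (b, \_) \;\longrightarrow\; \kappa = \kappa'$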


4.2 Well-Formed Configurations 


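We characterise data-race free configurations ($\models C \diamond$):

Definition 4 (Data-race freedom). $\models C \diamond$ iff
$\forall \alpha, \alpha', p, p', \kappa, \kappa'.\ \ \alpha \neq \alpha' \;\wedge\; \mathcal{A}_C(\alpha, p) = (\iota, \kappa) \;\wedge\; \mathcal{A}_C(\alpha', p') = (\iota, \kappa') \;\longrightarrow\; \kappa \sim \kappa'$
where we define
$\kappa \sim \kappa'$ iff $[\ (\kappa = \mathsf{write} \rightarrow \kappa' = \mathsf{tag}) \;\wedge\; (\kappa' = \mathsf{write} \rightarrow \kappa = \mathsf{tag})\ ]$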
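The compatibility relation and the data-race freedom check can be illustrated on a finite summary of accessibility. This is a minimal Python sketch under the assumption that, for each actor, the set of (address, capability) pairs reachable over all of its paths has already been computed; the function names are hypothetical.

```python
def compatible(k1, k2):
    """k1 ~ k2: if either side may write, the other may only hold a tag."""
    return (k1 != "write" or k2 == "tag") and (k2 != "write" or k1 == "tag")

def data_race_free(access):
    """access maps each actor to a set of (address, capability) pairs,
    i.e. the results of the accessibility function over all of its paths."""
    actors = list(access)
    for i, a in enumerate(actors):
        for b in actors[i + 1:]:          # every pair of distinct actors
            for addr1, k1 in access[a]:
                for addr2, k2 in access[b]:
                    if addr1 == addr2 and not compatible(k1, k2):
                        return False
    return True

# Two actors sharing w3 and w8: whoever can write, the other sees only a tag.
ACCESS_OK = {
    "a1": {("w3", "write"), ("w8", "tag")},
    "a2": {("w3", "tag"), ("w8", "write")},
}
# Here a2 may additionally read w3 while a1 may write it: a data race.
ACCESS_BAD = {
    "a1": {("w3", "write")},
    "a2": {("w3", "read")},
}
```

Note that two readers of the same address are compatible; only a writer forces every other actor down to tag.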


This definition captures invariant I1. The remaining invariants depend on the four derived counters introduced in Sect. 3. Here we define LRC and FRC, and give a preliminary definition of AMC and OMC.


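Definition 5 (Derived counters, preliminary for AMC and OMC)

$LRC_C(\iota) = O(\iota).rc_C(\iota)$

$FRC_C(\iota) = \sum_{\alpha \neq O(\iota)} \alpha.rc_C(\iota)$

$OMC_C(\iota) = \sum_j \begin{cases} z & \text{if } O(\iota).qu_C[j] = orca(\iota : z) \\ 0 & \text{otherwise} \end{cases} \;+\; \ldots$ c.f. Definition 12

$AMC_C(\iota) = \#\{\, (\alpha, k) \mid k \ge 0 \;\wedge\; \exists x, \overline{f}.\ \mathcal{A}_C(\alpha, k.x.\overline{f}) = \iota \,\} \;+\; \ldots$ c.f. Definition 12

where # denotes cardinality.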
For the time being, we will read this preliminary definition as if ... stood for 0. This works under the assumption that the procedures are atomic. However, Sect. 5.3, where we consider fine-grained concurrency, will refine the definition of AMC and OMC so as to also consider whether an actor is currently in the process of sending or receiving a message from which the address is accessible. For the time being, we continue with the preliminary reading.


Example: Assuming that in $C_0$ none of the actors is sending or receiving, we have $LRC_{C_0}(\omega_3) = 160$, and $FRC_{C_0}(\omega_3) = 160$, and $OMC_{C_0}(\omega_3) = 0$, and $AMC_{C_0}(\omega_3) = 0$. Moreover, $AMC_{C_0}(\omega_6) = AMC_{C_0}(\alpha_2) = 1$: neither $\omega_6$ nor $\alpha_2$ are arguments in application messages, but they are indirectly reachable through the first message on $\alpha_1$’s queue.
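The preliminary counters, and the balance LRC + OMC = FRC + AMC that they are meant to satisfy, can be computed directly from reference count tables and queues. The following Python sketch is illustrative only. It assumes that each application message is summarised by the set of addresses reachable from it, and that ORCA messages are triples `("orca", address, z)`. The split of the foreign count for w3 into 60 and 100, and the counts for w5, are values chosen for the illustration.

```python
def lrc(rc, owner, addr):
    """Reference count held by the owner of addr."""
    return rc[owner[addr]].get(addr, 0)

def frc(rc, owner, addr):
    """Sum of the reference counts held by all non-owning actors."""
    return sum(t.get(addr, 0) for a, t in rc.items() if a != owner[addr])

def omc(queues, owner, addr):
    """Sum of the ORCA increments/decrements for addr in its owner's queue."""
    return sum(m[2] for m in queues[owner[addr]]
               if m[0] == "orca" and m[1] == addr)

def amc(queues, addr):
    """Number of (actor, queue position) pairs whose message reaches addr."""
    return sum(1 for q in queues.values() for m in q
               if m[0] == "app" and addr in m[1])

def balanced(rc, owner, queues, addr):
    """LRC + OMC = FRC + AMC, with the '...' parts read as 0."""
    return (lrc(rc, owner, addr) + omc(queues, owner, addr)
            == frc(rc, owner, addr) + amc(queues, addr))

OWNER = {"w3": "a3", "w5": "a1"}
RC = {"a1": {"w3": 60, "w5": 2}, "a2": {"w3": 100, "w5": 1}, "a3": {"w3": 160}}
QUEUES = {"a1": [("app", frozenset({"w5", "w6", "w8"}))], "a2": [], "a3": []}
```

Popping the application message from a1's queue without touching the reference counts makes the balance for w5 fail, which is the situation the refined definitions of Sect. 5.3 account for.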


A well-formed configuration requires: I1–I4: introduced in Sect. 3; I5: the RC’s are non-negative; I6: accessible paths are not dangling; I7: processing message queues will not turn RC’s negative; I8: actors’ contents are in accordance with their state. The latter two will be described in Definition 14.


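Definition 6 (Well-formed configurations, preliminary). $\models C$, iff for all $\alpha$, $\alpha_o$, $\iota$, $\iota'$, $p$, $lp$, and $mp$, such that $\alpha_o = O(\iota) \neq \alpha$:

I1 $\models C \diamond$

I2 $[\ \mathcal{A}_C(\alpha, p) = \iota \;\vee\; \mathcal{A}_C(\alpha_o, mp) = \iota\ ] \;\longrightarrow\; LRC_C(\iota) > 0$

I3 $\mathcal{A}_C(\alpha, lp) = \iota \;\longrightarrow\; \alpha.rc_C(\iota) > 0$

I4 $LRC_C(\iota) + OMC_C(\iota) = FRC_C(\iota) + AMC_C(\iota)$

I5 $\alpha.rc_C(\iota') \ge 0$

I6 $\mathcal{A}_C(\alpha, p) = \iota \;\longrightarrow\; C.heap(\iota) \neq \bot$

I7, I8 description in Definition 14.

For ease of notation, we take I5 to mean that if $\alpha.rc_C(\iota')$ is defined, then it is positive. And we take any undefined entry of $\alpha.rc_C(\iota)$ to be 0.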


4.3 Actor States 


We now complete the definition of runtime entities (Definition 1), and describe the states of an actor, the worksets, the marks, and program counters (Definition 7). We distinguish the following states: idle (IDLE), collecting (COLLECT), receiving (RECEIVE), sending a message (SEND), or executing the synchronous part of a behaviour (EXECUTE). We discuss these states in more detail next.

Except for the idle state, IDLE, all states use auxiliary data structures: worksets, denoted by ws, which store a set of addresses; marks maps, denoted by ms, from addresses to R (reachable) or U (unreachable); and program counters. Frames are relevant when in states EXECUTE or SEND, and otherwise are assumed to be empty. Worksets are used to store all addresses traced from a message or from the actor itself, and are relevant when in states SEND, RECEIVE, or COLLECT, and otherwise are empty. Marks are used to calculate reachability and are used in state COLLECT, and are ignored otherwise. The program counters record the instruction an actor will execute next; they range between 4 and 27 and are ghost state, i.e. only used in the proofs.
[Figure 5: state transition diagram over the states Idle, Receive, Execute, Send, and Collect.]

Fig. 5. State transitions diagram for an actor.


Definition 7 (Actor States, Working sets, and Marks)

$st \in State ::= \mathsf{IDLE} \mid \mathsf{EXECUTE} \mid \mathsf{SEND} \mid \mathsf{RECEIVE} \mid \mathsf{COLLECT}$
$ws \in Workset = \mathcal{P}(Addr)$
$ms \in Marks = Addr \rightarrow \{R, U\}$
$pc \in PC = [4..27]$

We write $\alpha.st_C$, or $\alpha.ws_C$, or $\alpha.ms_C$, or $\alpha.pc_C$ for the state, working set, marks, or the program counter of $\alpha$ in $C$, respectively.

Actors may transition between states. The state transitions are depicted in 
Fig. 5. For example, an actor in the idle state (IDLE) may receive an orca message 
(remaining in the same state), receive an app message (moving to the RECEIVE 
state), or start garbage collection (moving to the COLLECT state). 

In the following sections we describe the actions an actor may perform. Fol- 
lowing the style of [17,26,27] we describe actors’ actions through pseudo-code 
procedures, which have the form: 


procedure_name(α):
    condition
    →
    { instructions }

We let $\alpha$ denote the executing actor, and the left-hand side of the arrow describes the condition that must be satisfied in order to execute the instructions on the arrow’s right-hand side. Any actor may execute concurrently with other actors. To simplify notation, we assume an implicit, globally accessible configuration $C$. Thus, instruction α.state := EXECUTE is short for updating the state of $\alpha$ in $C$ to be EXECUTE. We elide configurations when obvious, e.g. $\alpha.frame = \phi$ is short for requiring that in $C$ the frame of $\alpha$ is $\phi$, but we mention them when necessary, e.g. $\models C[\iota_1, f \mapsto \iota_2] \diamond$ expresses that the configuration that results from updating field $f$ in $\iota_1$ is data-race free.


Tracing Function. Both garbage collection and application message sending/receiving need to find all objects accessible from the current actor and/or from the message arguments. We define two functions: trace_this finds all addresses which are accessible from the current actor, and trace_frame finds all addresses which are accessible through a stack frame (but not from the current actor, this).
 1  GarbageCollection(α):
 2    α.st = IDLE ∨ α.st = EXECUTE
 3    →
 4  {
 5    α.st := COLLECT
 6    α.ms := ∅
 7
 8    // marking as unreachable
 9    forall ι with α = O(ι) ∨ α.rc(ι) > 0 do α.ms := α.ms[ι ↦ U]
10
11    // tracing and marking locally accessible as reachable
12    forall ι ∈ trace_this(α) ∪ trace_frame(α, α.frame) do α.ms := α.ms[ι ↦ R]
13
14    // marking owned and globally accessible as reachable
15    forall ι with α = O(ι) ∧ α.rc(ι) > 0 do α.ms := α.ms[ι ↦ R]
16
17    // collecting
18    forall ι with α.ms(ι) = U do
19      if O(ι) = α then
20        C.heap := C.heap[ι ↦ ⊥]
21        α.rc := α.rc[ι ↦ ⊥]
22      else
23        O(ι).qu.push(orca(ι : −α.rc(ι)))
24        α.rc := α.rc[ι ↦ ⊥]
25
26    if α.frame = ∅ then α.st := IDLE else α.st := EXECUTE
27  }

Fig. 6. Pseudo-code for garbage collection.


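Definition 8 (Tracing). We define the functions
$trace\_this : Config \times ActorAddr \rightarrow \mathcal{P}(Addr)$
$trace\_frame : Config \times ActorAddr \times Frame \rightarrow \mathcal{P}(Addr)$
as follows
$trace\_this_C(\alpha) = \{\, \iota \mid \exists \overline{f}.\ \mathcal{A}_C(\alpha, \mathsf{this}.\overline{f}) = \iota \,\}$
$trace\_frame_C(\alpha, \phi) = \{\, \iota \mid \exists x \in dom(\phi), \overline{f}.\ \mathcal{A}_C(\alpha, x.\overline{f}) = \iota \,\}$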
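The two tracing functions can be sketched over a finite capability table: a path contributes an address only if the table assigns it a capability and the heap lookup is defined. The Python sketch below is an illustration under simplifying assumptions: static paths are tuples whose first element is `"this"` or a variable name (the behaviour name is dropped), owner paths are ignored, and the heap edges and capabilities are an illustrative reading of the running example rather than a transcription of it.

```python
def lookup(heap, start, fields):
    """Follow the fields from start through the heap; None if undefined."""
    addr = start
    for f in fields:
        addr = heap.get(addr, {}).get(f)
        if addr is None:
            return None
    return addr

def trace(heap, caps, roots):
    """Addresses reached by paths that have a capability in caps.

    roots maps a root name ("this" or a variable) to its address."""
    found = set()
    for path in caps:
        root, fields = path[0], path[1:]
        if root in roots:
            addr = lookup(heap, roots[root], fields)
            if addr is not None:
                found.add(addr)
    return found

def trace_this(heap, caps, actor):
    return trace(heap, caps, {"this": actor})

def trace_frame(heap, caps, frame):
    return trace(heap, caps, frame)

# Illustrative heap: each address maps its fields to addresses.
HEAP = {
    "a1": {"f1": "w1"}, "w1": {"f2": "w2"}, "w2": {"f3": "w3"},
    "w3": {"f4": "w4"}, "w5": {"f5": "w6"}, "w6": {"f6": "w5", "f7": "w8"},
}
CAPS_A1 = {
    ("this", "f1"): "write", ("this", "f1", "f2"): "write",
    ("this", "f1", "f2", "f3"): "write", ("this", "f1", "f2", "f3", "f4"): "tag",
    ("x",): "write", ("x", "f5"): "write",
    ("x", "f5", "f7"): "tag", ("x", "f5", "f6"): "write",
}
```

Because no path extends a tag path in the table, tracing stops at objects seen with capability tag: they are recorded, but their fields are never followed.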


4.4 Garbage Collection 


We describe garbage collection in Fig. 6. An idle or an executing actor (precondition on line 2) may start collecting at any time. Then, it sets its state to COLLECT (line 5), and initialises the marks, ms, to empty (line 6).

The main idea of ORCA collection is that the requirement for global unreachability of owned objects can be weakened to the local requirement of local unreachability and an LRC of 0. Therefore, the actor marks all owned objects, and all addresses with an RC > 0, as U (line 9). After that, it traces the actor’s fields, and also the actor’s frame if it happens not to be empty (as we shall see later, idle actors have empty frames), and marks all accessible addresses as R (line 12). Then, the actor marks all owned objects with RC > 0 as R (line 15). Thus we expect that: (*) Any $\iota$ with $ms(\iota) = U$ is locally unreachable, and if owned by the current actor, then its LRC is 0. For each address with $ms(\iota) = U$, if the actor owns $\iota$, then it collects it (line 20); this is sound because of I2, I3, I4 and (*). If the actor does not own $\iota$, then it asks $\iota$’s owner to decrement its reference count by the current actor’s reference count, and deletes its own reference count to it (thus becoming 0) (line 24); this preserves I2, I3 and I4.

There is no need for special provision for cycles across actor boundaries. 
Rather, the corresponding objects will be collected by each actor separately, 
when it is the particular actor’s turn to perform GC. 


Example: Look at the cycle $\omega_5$–$\omega_6$, and assume that the message $app(b, \omega_5)$ had finished execution without any heap mutation, and that $\alpha_1.rc_C(\omega_5) = \alpha_1.rc_C(\omega_6) = 1 = \alpha_2.rc_C(\omega_5) = \alpha_2.rc_C(\omega_6)$; this will be the outcome of the example in Sect. 4.5. Now, the objects $\omega_5$ and $\omega_6$ are globally unreachable. Assume that $\alpha_1$ performs GC: it will not be able to collect any of these objects, but it will send an $orca(\omega_6 : -1)$ to $\alpha_2$. Some time later, $\alpha_2$ will pop this message, and some time later it will enter a GC cycle: it will collect $\omega_6$, and send an $orca(\omega_5 : -1)$ to $\alpha_1$. When, later on, $\alpha_1$ pops this message, and later enters a GC cycle, it will collect $\omega_5$.
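The collection of a cycle that spans two actors can be replayed with a small simulation of the marking and collecting steps. This Python sketch is a simplification and not the model itself: local reachability is passed in as a precomputed set, the heap is just a set of live addresses, ORCA messages are pairs (address, z), and actor states and program counters are omitted.

```python
def gc(name, rc, queues, owner, heap, reachable):
    """One garbage collection cycle of actor `name`.

    reachable: addresses traced from the actor's fields and frame."""
    mine = rc[name]
    # mark owned objects and addresses with a positive count as unreachable
    marks = {a: "U" for a in heap if owner[a] == name}
    marks.update({a: "U" for a, n in mine.items() if n > 0})
    # mark locally accessible addresses as reachable
    for a in reachable:
        marks[a] = "R"
    # mark owned objects with a positive count as reachable
    for a in list(marks):
        if owner[a] == name and mine.get(a, 0) > 0:
            marks[a] = "R"
    # collect what is still marked unreachable
    for a, m in marks.items():
        if m != "U":
            continue
        if owner[a] == name:
            heap.discard(a)                           # owned: free the object
        else:
            queues[owner[a]].append((a, -mine[a]))    # foreign: tell the owner
        mine.pop(a, None)                             # drop our own entry

def receive_orca(name, rc, queues):
    """Pop an ORCA message and add its argument to the reference count."""
    addr, z = queues[name].pop(0)
    rc[name][addr] = rc[name].get(addr, 0) + z

def replay_cycle():
    heap = {"w5", "w6"}
    owner = {"w5": "a1", "w6": "a2"}
    rc = {"a1": {"w5": 1, "w6": 1}, "a2": {"w5": 1, "w6": 1}}
    queues = {"a1": [], "a2": []}
    log = []
    gc("a1", rc, queues, owner, heap, set()); log.append(set(heap))
    receive_orca("a2", rc, queues)
    gc("a2", rc, queues, owner, heap, set()); log.append(set(heap))
    receive_orca("a1", rc, queues)
    gc("a1", rc, queues, owner, heap, set()); log.append(set(heap))
    return log
```

In `replay_cycle`, the first cycle of a1 frees nothing, the cycle of a2 frees w6, and the second cycle of a1 frees w5, matching the order described in the example.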


At the end of the GC cycle, the actor sets its state back to what it was before (line 26). If the frame is empty, then the actor had been IDLE, otherwise it had been in state EXECUTE.


4.5 Receiving and Sending Messages 


Through message send or receive, actors share addresses with other actors. This changes accessibility. Therefore, action is needed to re-establish I3 and I4 for all the objects accessible from the message’s arguments.

Receiving application messages is described by Receiving in Fig. 7. It requires that the actor $\alpha$ is in the IDLE state and has an application message on top of its queue. The actor sets its state to RECEIVE (line 5), traces from the message arguments and stores all accessible addresses into ws (line 7). Since accessibility is not affected by other actors’ actions (c.f. the last paragraph in Sect. 4.6), it is legitimate to consider the calculation of trace_frame as one single step. It then pops the message from its queue (line 8), and thus the AMC for all the addresses in ws will decrease by 1. To preserve I4, for each $\iota$ in its ws, the actor:

- if it is $\iota$’s owner, then it decrements its reference count for $\iota$ by 1, thus decreasing $LRC_C(\iota)$ (line 12).

- if it is not $\iota$’s owner, then it increments its reference count for $\iota$ by 1, thus increasing $FRC_C(\iota)$ (line 14).


After that, the actor sets its frame to that from the message (line 17), and goes 
to the EXECUTE state (line 18). 
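The reference count updates performed while receiving an application message can be sketched as follows. This is a simplified Python illustration: the traced workset is assumed to be carried with the message instead of being computed by trace_frame, messages are triples `("app", frame, ws)`, and the initial counts are made up for the example.

```python
def receiving(name, state, queues, rc, owner, frames):
    """Receive the application message at the head of `name`'s queue."""
    if state[name] != "IDLE" or not queues[name] or queues[name][0][0] != "app":
        raise ValueError("precondition of Receiving not met")
    state[name] = "RECEIVE"
    _, frame, ws = queues[name][0]   # ws: addresses traced from the message
    queues[name].pop(0)              # AMC of every address in ws drops by 1
    for addr in ws:
        if owner[addr] == name:
            delta = -1               # owned: LRC decreases
        else:
            delta = 1                # foreign: FRC increases
        rc[name][addr] = rc[name].get(addr, 0) + delta
    frames[name] = frame
    state[name] = "EXECUTE"

def demo():
    state = {"a1": "IDLE"}
    owner = {"w5": "a1", "w6": "a2", "w8": "a2"}
    rc = {"a1": {"w5": 2, "w6": 1}}
    frames = {"a1": {}}
    queues = {"a1": [("app", {"x": "w5"}, {"w5", "w6", "w8"})]}
    receiving("a1", state, queues, rc, owner, frames)
    return state, rc, frames, queues
```

Either branch compensates for the decrease of AMC by 1: decrementing an owned count lowers the left-hand side of I4, while incrementing a foreign count raises the right-hand side back.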


Example: Actor $\alpha_1$ has an application message in its queue. Assuming that it is IDLE, it may execute Receiving: It will trace $\omega_5$ and as a result store $\{\omega_5, \omega_6, \omega_8, \alpha_1, \alpha_2\}$ in its ws. It will then decrement its reference count for $\omega_5$ and $\alpha_1$ (the owned addresses) and increment it for the others. It will then pop the message from its queue, create the appropriate frame, and go to state EXECUTE.

 1  Receiving(α):
 2    α.st = IDLE ∧ α.qu.top() = app(φ)
 3    →
 4  {
 5    α.st := RECEIVE
 6
 7    α.ws := trace_frame(α, φ)
 8    pop(α.qu)
 9
10    foreach ι ∈ α.ws do
11      if α = O(ι) then
12        α.rc(ι) -= 1
13      else
14        α.rc(ι) += 1
15      α.ws := α.ws \ {ι}
16
17    α.frame := φ
18    α.st := EXECUTE
19  }

ReceiveORCA(α):
    α.state = IDLE ∧ α.qu.top() = orca(ι : z)

Fig. 7. Receiving application and ORCA messages.


Receiving ORCA messages is described in Fig. 7. An actor in the IDLE state with an ORCA message at the top pops the message from its queue, adds the value $z$ to the reference count for $\iota$, and stays in the IDLE state.

Sending application messages is described in Fig. 8. The actor must be in the EXECUTE state for some behaviour $b$ and must have local variables which can be split into $\psi$ and $\psi'$; the latter will form part of the message to be sent. As the AMC for all the addresses reachable through the message increases by 1, in order to preserve I4 for each address $\iota$ in ws, the actor:

- increments its reference count for $\iota$ by 1, if it owns it (line 14);

- decrements its reference count for $\iota$ if it does not own it (line 16). But special care is needed if the actor’s (foreign) reference count for $\iota$ is 1, because then a simple decrement would break I5. Instead, the actor sets its reference count for $\iota$ to 256 (line 18) and sends an ORCA message to $\iota$’s owner with 256 as argument.

After this, it removes $\psi'$ from its frame (line 22), pushes the message $app(b', \psi')$ onto $\alpha'$’s queue, and transitions to the EXECUTE state.
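The three cases of the reference count update during sending can be written out as follows. This Python sketch covers only the update loop, under the assumptions that the workset has already been traced and that ORCA messages are triples `("orca", address, z)`; the function name and the sample counts are hypothetical.

```python
def sending_update(name, rc, queues, owner, ws):
    """Adjust `name`'s reference counts for every address in the workset."""
    mine = rc[name]
    for addr in ws:
        if owner[addr] == name:
            mine[addr] = mine.get(addr, 0) + 1      # owned: LRC grows by 1
        elif mine.get(addr, 0) > 1:
            mine[addr] -= 1                          # foreign: FRC shrinks by 1
        else:
            # a plain decrement would leave no count for a reachable object,
            # so top up to 256 and inform the owner
            mine[addr] = 256
            queues[owner[addr]].append(("orca", addr, 256))

def demo():
    owner = {"w5": "a1", "w6": "a2", "w8": "a2"}
    rc = {"a1": {"w5": 1, "w6": 1, "w8": 3}}
    queues = {"a1": [], "a2": []}
    sending_update("a1", rc, queues, owner, ["w5", "w6", "w8"])
    return rc, queues
```

In the third case the foreign count grows by 255 and the new application message adds 1 to AMC, a total of 256 on the right-hand side of I4, which the ORCA message with argument 256 balances on the left-hand side.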
 1  Sending(α):
 2    α.st = EXECUTE ∧ α.frame = (b, ψ·ψ′) ∧
 3    ∀x ∈ dom(ψ), x′ ∈ dom(ψ′). ∀κ, κ′. ∀f̄, f̄′. [
 4      [ A_C(α, x.f̄) = (ι, κ) ∧ A_C(α, x′.f̄′) = (ι, κ′) → κ ∼ κ′ ] ∧
 5      [ Class(α) ⊢ b.x′.f̄′ : κ′ → Class(α′) ⊢ b′.x′.f̄′ : κ′ ] ]
 6    →
 7  {
 8    α.st := SEND
 9
10    α.ws := trace_frame(α, (b, ψ′))
11
12    foreach ι ∈ α.ws do
13      if α = O(ι) then
14        α.rc(ι) += 1
15      elseif α.rc(ι) > 1 then
16        α.rc(ι) -= 1
17      else
18        α.rc(ι) := 256
19        O(ι).qu.push(orca(ι : 256))
20      α.ws := α.ws \ {ι}
21
22    α.frame := (b, ψ)
23    α′.qu.push(app(b′, ψ′))
24
25    α.st := EXECUTE
26  }

Fig. 8. Pseudo-code for message sending.


We now discuss the preconditions. These ensure that sending the message $app(b', \psi')$ will not introduce data races: Line 4 ensures that there are no data races between paths starting at $\psi$ and paths starting at $\psi'$, while Line 5 ensures that the sender, $\alpha$, and the receiver, $\alpha'$, see all the paths sent, i.e. those starting from $(b', \psi')$, at the same capability. We express our expectation that the source language compiler produces code only if it satisfies this property by adding this static requirement as a precondition. These static requirements imply that after the message has been sent, there will be no races between paths starting at the sender’s frame and those starting at the last message in the receiver’s queue. In more detail, after the sender’s frame has been reduced to $(b, \psi)$, and $app(b', \psi')$ has been added to the receiver’s queue (at location $k$), we will have a new configuration $C' = C[\alpha, frame \mapsto (b, \psi)][\alpha', queue \mapsto \alpha'.queue_C :: (b', \psi')]$. In this new configuration lines 4 and 5 ensure that $\mathcal{A}_{C'}(\alpha, x.\overline{f}) = (\iota, \kappa) \;\wedge\; \mathcal{A}_{C'}(\alpha', k.x'.\overline{f}') = (\iota, \kappa') \;\longrightarrow\; \kappa' \sim \kappa$, which means that if there were no data races in $C$, there will be no data races in $C'$ either. Formally: $\models C \diamond \;\longrightarrow\; \models C' \diamond$.

We can now complete Definition3 for the receiving and the sending cases, 
to take into account paths that do not exist yet, but which will exist when the 
message receipt or message sending has been completed. 
Definition 9 (accessibility: receiving and sending). Completing Definition 3: $\mathcal{A}_C(\alpha, -1.x.\overline{f}) = (\iota, \kappa)$ iff
$\alpha.st_C = \mathsf{Receiving} \;\wedge\; 9 < \alpha.pc_C < 18 \;\wedge\; C(\psi(x).\overline{f}) = \iota \;\wedge\; Class(\alpha) \vdash b.x.\overline{f} : \kappa$
where $(b, \psi)$ is the frame popped at line 8,
or
$\alpha.st_C = \mathsf{Sending} \;\wedge\; \alpha.pc_C = 23 \;\wedge\; C(\psi'(x).\overline{f}) = \iota \;\wedge\; Class(\alpha') \vdash b'.x.\overline{f} : \kappa$
where $\alpha'$ is the actor to receive the app-message, and $(b', \psi')$ is the frame to be sent in line 23.


Example: When actor $\alpha_1$ executes Receiving, and its program counter is between 9 and 18, then $\mathcal{A}_C(\alpha_1, -1.x.f_5) = (\omega_6, \mathsf{write})$, even though $x$ is not yet on the stack frame. As soon as the frame is pushed on the stack, and we reach program counter 20, then $\mathcal{A}_C(\alpha_1, -1.x.f_5)$ is undefined, but $\mathcal{A}_C(\alpha_1, x.f_5) = (\omega_6, \mathsf{write})$.


4.6 Actor Behaviour 


As our model is parametric in the host language, we do not aim to describe any of the actions performed while executing behaviours, such as synchronous method calls and pushing frames onto stacks, conditionals, loops etc. Instead, we concentrate on how behaviour execution may affect GC; this happens only when the heap is mutated either by object creation or by mutation of objects’ fields (since this affects accessibility). In particular, our model does not accommodate recursive calls; we claim that the result from the current model could easily be extended to a model with recursion in synchronous behaviour, but would require a considerable notational overhead.

Figure 9 shows the actions of an actor $\alpha$ while in the EXECUTE state, i.e. while it executes behaviours synchronously. The description is nondeterministic: the procedures GoIdle, or Create, or MutateHeap, may execute when the corresponding preconditions hold. Thus, we do not describe the execution of a given program, rather we describe all possible executions for any program. In GoIdle, the actor $\alpha$ simply passes from the execution state to the idle state; the only condition is that its state is EXECUTE (line 2). It deletes the frame, and sets the actor’s state to IDLE (line 4). Create creates a new object, initialises its fields to null, and stores its address into local variable $x$.

The most interesting procedure is field assignment, MutateHeap. Line 8 modifies the object at address $\iota_1$, reachable through local path $lp_1$, and stores in its field $f$ the address $\iota_2$ which was reachable through local path $lp_2$. We require that the type system makes the following two guarantees: line 2, second conjunct, requires that $lp_1$ should be writable, while line 3 requires that $lp_2$ should be accessible. Line 4 and line 5 require that capabilities of objects do not increase through heap mutation: any address that is accessible with a capability $\kappa$ after the field update was accessible with the same or more permissive capability $\kappa'$ before the field update. This requirement guarantees preservation of data race freedom, i.e. that $\models C \diamond$ implies $\models C[\iota_1, f \mapsto \iota_2] \diamond$.
 1  GoIdle(α):
 2    α.st = EXECUTE
 3    →
 4  { α.frame := ∅; α.st := IDLE; }
 5
 6  Create(α):
 7    α.st = EXECUTE ∧ fresh ω ∧ O(ω) = α
 8    →
 9  {
10    heap :=
11      heap[ω ↦ (f1 ↦ null, ..., fn ↦ null)]
12    α.frame := α.frame[x ↦ ω]
13  }

 1  MutateHeap(α):
 2    α.st = EXECUTE ∧ A_C(α, lp1) = (ι1, write)
 3    ∧ A_C(α, lp2) = ι2
 4    ∧ ∀ι, κ, lp. [ A_C[ι1, f ↦ ι2](α, lp) = (ι, κ) →
 5        ( ∃κ′, lp′. A_C(α, lp′) = (ι, κ′) ∧ κ′ ≤ κ ) ]
 6    →
 7  {
 8    heap := heap[ι1, f ↦ ι2]
 9  }

Fig. 9. Pseudo-code for synchronous operations.


Heap Mutation Does not Affect Accessibility in Other Actors. Heap mutation either creates new objects, which will not be accessible to other actors, or modifies objects to which the current actor has write access. By $\models C \diamond$ all other actors have only tag access to the modified object. Therefore, because of capabilities’ degradation with growing paths (as in A1 and A2), no other actor will be able to access objects reachable through paths that go through the modified object.


5 Soundness and Completeness 


In this section we show soundness and completeness of ORCA. 


5.1 I1 and I2 Support Safe Local GC


As we said earlier, I1 and I2 support safe local GC. Namely, I1 guarantees that as long as GC only traces objects to which the actor has read or write access, there will be no data races with other actors’ behaviour or GC. And I2 guarantees that collection can take place based on local information only:


Definition 10. For a configuration $C$, and object address $\omega$ we say that

- $\omega$ is globally inaccessible in $C$, iff $\forall \alpha, p.\ \mathcal{A}_C(\alpha, p) \neq \omega$
- $\omega$ is collectable, iff $LRC_C(\omega) = 0$, and $\forall lp.\ \mathcal{A}_C(O(\omega), lp) \neq \omega$.


Lemma 2. If I2 holds, then every collectable object is globally inaccessible.
5.2 Completeness


In [16] we show that globally inaccessible objects remain so, and that for any 
globally inaccessible object there exists a sequence of steps which will collect it. 


Theorem 1 (Inaccessibility is monotonic). For any configurations C, and 
C’, if C’ is the outcome of the execution of any single line of code from any of 
the procedures from Figs. 6, 7, 8 and 9, and w is globally inaccessible in C, then 
w is globally inaccessible in C’. 


Theorem 2 (Completeness of ORCA). For any configuration $C$, and object address $\omega$ which is globally inaccessible in $C$, there exists a finite sequence of steps which lead to $C'$ in which $\omega \notin dom(C')$.


5.3 Dealing with Fine-Grained Concurrency 


So far, we have discussed actions under an assumption of atomicity. However, ORCA needs to work under fine-grained concurrency, whereby several actors may be executing concurrently, each of them executing a behaviour, or sending or receiving a message, or collecting garbage. With fine-grained concurrency, and with the preliminary definitions of AMC and OMC, the invariants are no longer preserved. In fact, they need never hold!


Example: Consider Fig. 4, and assume that actor $\alpha_1$ was executing Receiving. Then, at line 7 and before popping the message off the queue, we have $LRC(\omega_5) = 2$, $FRC(\omega_5) = 1$, $AMC^p(\omega_5) = 1$, where $AMC^p(\_)$ stands for the preliminary definition of AMC; thus I4 holds. After popping and before updating the RC for $\omega_5$, i.e. between lines 9 and 11, we have $AMC^p(\omega_5) = 0$; thus I4 is broken. At first sight, this might not seem a big problem, because the update of RC at line 12 will set $LRC(\omega_5) = 1$, and thus restore I4. However, if there was another message containing $\omega_5$ in $\alpha_2$’s queue, and we consider a snapshot where $\alpha_2$ had just finished line 8 and $\alpha_1$ had just finished line 12, then the update of $\alpha_1$’s RC will not restore I4.


The reason for this problem is that, with the preliminary definition $AMC^p(\_)$, upon popping at line 8, the AMC is decremented in one atomic step for all objects accessible from the message, while the RC is updated later on (at line 12 or line 14), and one object at a time. In other words, the updates to AMC and LRC are not in sync. Instead, we give the full definition of AMC so that AMC is in sync with LRC; namely, it is not affected by popping the message, and is reduced one object at a time once we reach program counter line 15. Similarly, because updating the RC’s takes place in a separate step from the removal of the ORCA-message from its queue, we refine the definition of OMC:


Definition 11 (Auxiliary Counters for AMC, and OMC)

$AMC^{rcv}_C(\iota) = \#\{\, \alpha \mid \alpha.st_C = \mathsf{RECEIVE} \;\wedge\; 9 < \alpha.pc_C \;\wedge\; \iota \in \alpha.ws \setminus CurrAddrRcv_C(\alpha) \,\}$

$CurrAddrRcv_C(\alpha) = \begin{cases} \{\iota_{10}\} & \text{if } \alpha.pc_C = 15 \\ \emptyset & \text{otherwise} \end{cases}$

In the above, $\alpha.ws$ refers to the contents of the variable ws while the actor $\alpha$ is executing the pseudocode from Receiving, and $\iota_{10}$ refers to the contents of the variable $\iota$ arbitrarily chosen in line 10 of the code.

We define $AMC^{snd}_C(\iota)$, $OMC^{rcv}_C(\iota)$, and $OMC^{snd}_C(\iota)$ similarly in [16].

The counters $AMC^{rcv}$ and $AMC^{snd}$ are zero except for actors which are in the process of receiving or sending application messages. Also, the counters $OMC^{rcv}$ and $OMC^{snd}$ are zero except for actors which are in the process of receiving or sending ORCA-messages. All these counters are always $\ge 0$. We can now complete the definition of AMC and OMC:


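Definition 12 (AMC and OMC, full definition)

$OMC_C(\iota) = \sum_j \begin{cases} z & \text{if } O(\iota).qu_C[j] = orca(\iota : z) \\ 0 & \text{otherwise} \end{cases} \;+\; OMC^{snd}_C(\iota) \;-\; OMC^{rcv}_C(\iota)$

$AMC_C(\iota) = \#\{\, (\alpha, k) \mid k \ge 0 \;\wedge\; \exists x, \overline{f}.\ \mathcal{A}_C(\alpha, k.x.\overline{f}) = \iota \,\} \;+\; AMC^{snd}_C(\iota) \;+\; AMC^{rcv}_C(\iota)$

where # denotes cardinality.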


Example: Let us again consider that $\alpha_1$ was executing Receiving. Then, at line 10 we have $ws = \{\omega_5, \omega_6\}$ and $AMC(\omega_5) = 1 = AMC(\omega_6)$. Assume that at the first iteration, at line 10, we choose $\omega_5$; then right before reaching line 15 we have $AMC(\omega_5) = 0$ and $AMC(\omega_6) = 1$. At the second iteration, at line 10 we will choose $\omega_6$, and then right before reaching line 15 we have $AMC(\omega_6) = 0$.


5.4 Soundness 


To complete the definition of well-formed configurations, we need to define what 
it means for an actor or a queue to be well-formed. 


Well-Formed Queues - I7. The owner’s reference count for any live address (i.e. any address reachable from a message path, or foreign actor, or in an ORCA message) should be greater than 0 at the current configuration, as well as at all configurations which arise from receiving pending, but no new, messages from the owner’s queue. Thus, in order to ensure that ORCA decrement messages do not make the local reference count negative, I7 requires that the effect of any prefix of the message queue leaves the reference count for any object positive. To formulate I7 we use the concept of $QueueEffect_C(\alpha, \iota, n)$, which describes the contents of LRC after the actor $\alpha$ has consumed and reacted to the first $n$ messages in its queue, i.e. it is about “looking into the future”. Thus, for actor $\alpha$, address $\iota$, and number $n$ we define the effect of the $n$-prefix of the queue on the reference count as follows:

$QueueEffect_C(\alpha, \iota, n) = LRC_C(\iota) - z + \sum_{j=0}^{n} Weight_C(\alpha, \iota, j)$

where $z = k$, if $\alpha$ is in the process of executing ReceiveORCA, and $\alpha.pc_C = 6$, and $\alpha.qu.top = orca(\iota : k)$, and otherwise $z = 0$.




And where,

$Weight_C(\alpha, \iota, j) = \begin{cases} z' & \text{if } \alpha.qu_C[j] = orca(\iota : z') \\ -1 & \text{if } \exists x. \exists \overline{f}.\ \mathcal{A}_C(\alpha, j.x.\overline{f}) = \iota \;\wedge\; O(\iota) = \alpha \\ 0 & \text{otherwise} \end{cases}$

I7 makes the following four guarantees: [a] The effect of any prefix of the message queue leaves the LRC non-negative. [b] If $\iota$ is accessible from the $j$-th message in its owner’s queue, then the LRC for $\iota$ will remain >0 during execution of the current message queue up to, and including, the $j$-th message. [c] If $\iota$ is accessible from an ORCA-message, then the LRC will remain >0 during execution of the current message queue, up to and excluding execution of the ORCA-message itself. [d] If $\iota$ is globally accessible (i.e. reachable from a local path or from a message in a non-owning actor) then $LRC(\iota)$ is currently >0, and will remain so during the popping of all the entries in the current queue.


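Definition 13 (I7). $\models_{Queues} C$, iff for all $j \in \mathbb{N}$, for all addresses $\iota$, actors $\alpha$, $\alpha'$, where $O(\iota) = \alpha \neq \alpha'$, the following conditions hold:

a $\forall n.\ QueueEffect_C(\alpha, \iota, n) \ge 0$

b $\exists x. \exists \overline{f}.\ \mathcal{A}_C(\alpha, j.x.\overline{f}) = \iota \;\longrightarrow\; \forall k \le j.\ QueueEffect_C(\alpha, \iota, k) > 0$.

c $\alpha.qu_C[j] = orca(\iota : z) \;\longrightarrow\; \forall k < j.\ QueueEffect_C(\alpha, \iota, k) > 0$.

d $\exists p.\ \mathcal{A}_C(\alpha', p) = \iota \;\longrightarrow\; \forall k \in \mathbb{N}.\ QueueEffect_C(\alpha, \iota, k) > 0$.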


For example, a configuration with LRC($\iota$) = 2 and a queue
orca($\iota$ : −2) :: orca($\iota$ : −1) :: orca($\iota$ : 256) is illegal by I7.[a]. Similarly, in a configuration
with LRC($\iota$) = 2 and a queue orca($\iota$ : −2) :: orca($\iota$ : 256), the owning actor
could collect $\iota$ before popping the message orca($\iota$ : 256) from its queue. Such a
configuration is also deemed illegal by I7.[c].
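
The two illegal configurations in this example can be checked mechanically. The Python sketch below is a simplification rather than the formal model of the paper: it assumes that the queue holds only ORCA messages for a single address, that the owner is not in the middle of ReceiveORCA (so z = 0), and that the queue effect at index n includes the n-th message itself. The function names are hypothetical.

```python
from itertools import accumulate

def queue_effects(lrc, increments):
    """Queue effect for n = 0, 1, ... over a queue of orca(iota : z) messages.

    Assumptions (not from the paper): only ORCA messages for one address
    are queued, z = 0, and index n includes the n-th message itself.
    """
    return [lrc + total for total in accumulate(increments)]

def satisfies_a(lrc, increments):
    """I7.[a]: no prefix of the queue drives the count below zero."""
    return all(effect >= 0 for effect in queue_effects(lrc, increments))

def satisfies_c(lrc, increments):
    """I7.[c]: the count stays > 0 strictly before each ORCA message.

    With only ORCA messages queued, every prefix that ends before the
    last message must leave the count strictly positive.
    """
    return all(effect > 0 for effect in queue_effects(lrc, increments)[:-1])

first = [-2, -1, 256]   # orca(-2) :: orca(-1) :: orca(256)
second = [-2, 256]      # orca(-2) :: orca(256)

print(satisfies_a(2, first))                            # False
print(satisfies_a(2, second), satisfies_c(2, second))   # True False
```

In the second queue the count reaches exactly 0 after the first message, which satisfies [a] but violates [c]: the owner could collect the object while an increment is still pending.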


I8: Well-Formed Actor. In [16] we define well-formedness of an actor $\alpha$ through
the judgement $C, \alpha \vdash st$. This judgement depends on $\alpha$'s current state st, and
requires, among other things, that the contents of the local variables ws, ms are 
consistent with the contents of the pc and RC. Remember also, that because 
Receiving and Sending modify the ws or send ORCA-messages before updating 
the frame or sending the application message, in the definition of AMC and OMC 
we took into account the internal state of actors executing such procedures. 


Well-Formed Configuration. The following completes Definition 6 from
Sect. 4.2.


Definition 14 (Well-formed configurations, full). A configuration C is
well-formed, $\models C$, iff I1-I6 (Definition 6) hold for C, its queues are well-formed
($\models_{\mathit{Queues}} C$, I7), and so are all its actors ($C, \alpha \vdash \alpha.\mathit{st}_C$, I8).


In [16] we consider the execution of each line in the codes from Sect. 4, and
prove: 


Theorem 3 (Soundness of ORCA). For any configurations C and C': If $\models C$,
and C' is the outcome of the execution of any single line of code from any of the
procedures from Figs. 6, 7, 8 and 9, then $\models C'$.

This theorem together with Ig implies that ORCA never leaves accessible paths
dangling. Note that the theorem is stated so as to be applicable for a fine inter- 
leaving of the execution. Even though we expressed ORCA through procedures, 
in our proof we cater for an execution where one line of any of these procedures 
is executed interleaved with any other procedures in the other actors. 


6 Related Work 


The challenges faced when developing and debugging concurrent garbage col- 
lectors have motivated the development of formal models and proofs of correct- 
ness [6, 13, 19,30,35]. However, most work considers a global heap where mutator 
and collector threads race for objects and relies on synchronisation mechanisms 
(or atomic reduction steps), such as read or write barriers, in contrast to ORCA 
which considers many local heaps, no atomicity or synchronization, and relies on 
the properties of the type system. McCreight et al. [25] introduced a framework 
to reason about and build certified garbage collectors, verifying independently 
both mutator and collector threads. Their work focuses mainly on garbage collec- 
tors similar to those that run on Java programs, such as STW mark-and-sweep, 
STW copying and incremental copying. Vechev et al. [39] specified concurrent 
mark-and-sweep collectors with write barriers for synchronisation. The authors 
also present a parametric garbage collector from which other collectors can be 
derived. Hawblitzel and Petrank [22] mechanized proofs of two real-world collec- 
tors (copying and mark-and-sweep) and their respective allocators. The assembly 
code was instrumented with pre- and post-conditions, invariants and assertions, 
which were then verified using Z3 and Boogie. Ugawa et al. [38] extended a 
copying, on-the-fly, concurrent garbage collector to process reference types. The 
authors model-checked their algorithm using a model that limited the number 
of objects and threads. Gamie et al. [17] machine-checked a state-of-the-art, on- 
the-fly, concurrent, mark-and-sweep garbage collector [32]. They modelled one 
collector thread and many mutator threads. ORCA does not limit the number of 
actors running concurrently. 

Local heaps have been used in the context of garbage collection to reduce the 
amount of synchronisation required before [1-3, 13,15, 24,31,34], where different 
threads have their own heap and share a global heap. However, only two of these 
have been proved correct. Doligez and Gonthier [13] proved a collector [14] which 
splits the heap into many local heaps and one global heap, and uses mark-and- 
sweep for individual collection of local heaps. The algorithm imposes restrictions 
on the object graph, that is, a thread cannot access objects in other threads’ 
local heaps. ORCA allows for references across heaps. Raghunathan et al. [34] 
proved correct a hierarchical model of local heaps for functional programming 
languages. The work restricted object graphs and prevented mutation.

As for collectors that rely on message passing, Moreau et al. [26] revisited
Birrell's reference listing algorithm, which also uses message passing to update
reference counts in a distributed system, and presented its formalisation and
proofs of soundness and completeness. Moreover, Clebsch and Drossopoulou [10]
proved correct MAC, a concurrent collector for actors. 
7 Conclusions


We have shown the soundness and completeness of the ORCA actor memory 
reclamation protocol. The ORCA model is not tied to a particular programming 
language and is parametric in the host language. Instead it relies on a number 
of invariants and properties which can be met by a combination of language and 
static checks. The central property that is required is the absence of data races 
on objects shared between actors. 

We developed a formal model of ORCA and identified requirements for the 
host language, its type system, or associated tooling. We described ORCA at a 
language-agnostic level and identified eight invariants that capture how global 
consistency is obtained in the absence of synchronisation. We proved that ORCA 
will not prematurely collect objects (soundness) and that all garbage will be 
identified as such (completeness). 
Abstract. Lamport’s Paxos algorithm is a classic consensus protocol
for state machine replication in environments that admit crash failures. 
Many versions of Paxos exploit the protocol’s intrinsic properties for 
the sake of gaining better run-time performance, thus widening the gap 
between the original description of the algorithm, which was proven cor- 
rect, and its real-world implementations. In this work, we address the 
challenge of specifying and verifying complex Paxos-based systems by (a) 
devising composable specifications for implementations of Paxos’s single- 
decree version, and (b) engineering disciplines to reason about protocol- 
aware, semantics-preserving optimisations to single-decree Paxos. In a 
nutshell, our approach elaborates on the deconstruction of single-decree 
Paxos by Boichat et al. We provide novel non-deterministic specifications 
for each module in the deconstruction and prove that the implementa- 
tions refine the corresponding specifications, such that the proofs of the 
modules that remain unchanged can be reused across different implemen- 
tations. We further reuse this result and show how to obtain a verified 
implementation of Multi-Paxos from a verified implementation of single- 
decree Paxos, by a series of novel protocol-aware transformations of the 
network semantics, which we prove to be behaviour-preserving. 


1 Introduction 


Consensus algorithms are an essential component of the modern fault-tolerant 
deterministic services implemented as message-passing distributed systems. In 
such systems, each of the distributed nodes contains a replica of the system’s 
state (e.g., a database to be accessed by the system’s clients), and certain nodes 
may propose values for the next state of the system (e.g., requesting an update 
in the database). Since any node can crash at any moment, all the replicas have 
to keep copies of the state that are consistent with each other. To achieve this, 
at each update to the system, all the non-crashed nodes run an instance of a 
consensus protocol, uniformly deciding on its outcome. The safety requirements 
for consensus can be thus stated as follows: “only a single value is decided uni- 
formly by all non-crashed nodes, it never changes in the future, and the decided 
value has been proposed by some node participating in the protocol” [16]. 


© The Author(s) 2018 
A. Ahmed (Ed.): ESOP 2018, LNCS 10801, pp. 912-939, 2018. 
https://doi.org/10.1007/978-3-319-89884-1_32


Paxos Consensus, Deconstructed and Abstracted 913 


The Paxos algorithm [15,16] is the classic consensus protocol, and its single- 
decree version (SD-Paxos for short) allows a set of distributed nodes to reach an 
agreement on the outcome of a single update. Optimisations and modifications 
to SD-Paxos are common. For instance, the multi-decree version, often called 
Multi-Paxos [15,27], considers multiple slots (i.e., multiple positioned updates)
and decides upon a result for each slot, by running a slot-specific instance of
SD-Paxos. Even though it is customary to think of Multi-Paxos as a series of
independent SD-Paxos instances, in reality the implementation features multiple
protocol-aware optimisations, exploiting intrinsic dependencies between separate
single-decree consensus instances to achieve better throughput. To a great
extent, these and other optimisations to the algorithm are pervasive, and
verifying a modified version usually requires devising a new protocol definition and
a proof from scratch. New versions are constantly springing up (cf. Sect. 5 of [27]
for a comprehensive survey), widening the gap between the description of the
algorithms and their real-world implementations.

We tackle the challenge of specifying and verifying these distributed algo- 
rithms by contributing two verification techniques for consensus protocols. 

Our first contribution is a family of composable specifications for Paxos’ 
core subroutines. Our starting point is the deconstruction of SD-Paxos by 
Boichat et al. [2,3], allowing one to consider a distributed consensus instance 
as a shared-memory concurrent program. We introduce novel specifications for 
Boichat et al.’s modules, and let them be non-deterministic. This might seem
an unorthodox design choice, as it weakens the specification. To show that
our specifications are still strong enough, we restore the top-level deterministic 
abstract specification of the consensus, which is convenient for client-side rea- 
soning. The weakness introduced by the non-determinism in the specifications 
has been impelled by the need to prove that the implementations of Paxos’ 
components refine the specifications we have ascribed [9]. We prove the refine- 
ments modularly via the Rely/Guarantee reasoning with prophecy variables and 
explicit linearisation points [11,26]. On the other hand, this weakness becomes a 
virtue when better understanding the volatile nature of Boichat et al.’s abstrac- 
tions and of the Paxos algorithm, which may lead to newer modifications and 
optimisations. 

Our second contribution is a methodology for verifying composite consensus 
protocols by reusing the proofs of their constituents, targeting specifically Multi- 
Paxos. We distill protocol-aware system optimisations into a separate semantic 
layer and show how to obtain the realistic Multi-Paxos implementation from SD- 
Paxos by a series of transformations to the network semantics of the system, 
as long as these transformations preserve the behaviour observed by clients. We 
then provide a family of such transformations along with the formal conditions 
allowing one to compose them in a behaviour-preserving way. 

We validate our approach for construction of modularly verified consensus
protocols by providing an executable proof-of-concept implementation of
Multi-Paxos with a high-level shared memory-like interface, obtained via a series of
behaviour-preserving network transformations. The full proofs of lemmas and
theorems from our development, as well as some boilerplate definitions, are given
in the appendices of the supplementary extended version of this paper.¹


[Figure: message sequence chart for nodes N1, N2 and N3, showing the messages
P1A(1), P1B(ok, ⊥, 0), P2A(v1, 1), P2B(ok) of round 1 and
P1A(3), P1B(ok, v1, 1), P1B(ok, ⊥, 0), P2A(v1, 3), P2B(ok) of round 3.]

Fig. 1. A run of SD-Paxos.


2 The Single-Decree Paxos Algorithm 


We start with explaining SD-Paxos through an intuitive scenario. In SD-Paxos, 
each node in the system can adopt the roles of proposer or acceptor, or both. A 
value is decided when a quorum (i.e., a majority of acceptors) accepts the value 
proposed by some proposer. Now consider a system with three nodes N1, N2 and 
N3, where N1 and N3 are both proposers and acceptors, and N2 is an acceptor, 
and assume N1 and N3 propose values v1 and v3, respectively.

The algorithm works in two phases. In Phase 1, a proposer polls every accep- 
tor in the system and tries to convince a quorum to promise that they will later 
accept its value. If the proposer succeeds in Phase 1 then it moves to Phase 2, 
where it requests the acceptors to fulfil their promises in order to get its value 
decided. In our example, it would seem in principle possible that N1 and N3 could 
respectively convince two different quorums—one consisting of N1 and N2, and 
the other consisting of N2 and N3—to go through both phases and to respec- 
tively accept their values. This would happen if the communication between N1 
and N3 gets lost and if N2 successively grants the promise and accepts the value 
of N1, and then does the same with N3. This scenario breaks the safety
requirements for consensus because both v1 and v3 (which can be different) would get
decided. However, this cannot happen. Let us explain why. 

The way SD-Paxos enforces the safety requirements is by distinguishing each
attempt to decide a value with a unique round, where the rounds are totally
ordered. Each acceptor stores its current round, initially the least one, and only
grants a promise to proposers with a round greater than or equal to its current
round, at which moment the acceptor switches to the proposer’s round. Figure 1
depicts a possible run of the algorithm. Assume that rounds are natural numbers,
that the acceptors’ current rounds are initially 0, and that the nodes N1 and
N3 attempt to decide their values with rounds 1 and 3 respectively. In Phase 1,
N1 tries to convince a quorum to switch their current round to 1 (messages
P1A(1)). The message to N3 gets lost and the quorum consisting of N1 and
N2 switches round and promises to only accept values at a round greater than or
equal to 1. Each acceptor that switches to the proposer’s round sends back to
the proposer its stored value and the round at which this value was accepted,
or an undefined value if the acceptor never accepted any value yet (messages
P1B(ok, ⊥, 0), where ⊥ denotes a default undefined value). After Phase 1, N1
picks as a candidate value the one accepted at the greatest round from those
returned by the acceptors in the quorum, or its proposed value if all acceptors
returned an undefined value. In our case, N1 picks its value v1. In Phase 2,
N1 requests the acceptors to accept the candidate value v1 at round 1 (messages
P2A(v1, 1)). The message to N3 gets lost, and N1 and N2 accept value v1, which
gets decided (messages P2B(ok)).


1 Find the extended version online at https://arxiv.org/abs/1802.05969.


Deconstruction (left), modules stacked from top to bottom:

    Paxos
    Round-Based Consensus
    Round-Based Register

Specification of module Paxos (right):

    val vP := undef;
    proposeP(val v0) {
      ⟨ assume(!(v0 = undef));
        if (vP = undef) {
          vP := v0;
        } return vP; ⟩ }


Fig. 2. Deconstruction of SD-Paxos (left) and specification of module Paxos (right).

Now N3 goes through Phase 1 with round 3 (messages P1A(3)). Both N2
and N3 switch to round 3. N2 answers N3 with its stored value v1 and with the
round 1 at which v1 was accepted (message P1B(ok, v1, 1)), and N3 answers
itself with an undefined value, as it has never accepted any value yet (message
P1B(ok, ⊥, 0)). This way, if some value has been already decided upon, any
proposer that convinces a quorum to switch to its round would receive the decided
value from some of the acceptors in the quorum (recall that two quorums have
a non-empty intersection). That is, N3 picks the v1 returned by N2 as the
candidate value, and in Phase 2 it gets the quorum of N2 and N3 to accept
v1 at round 3 (messages P2A(v1, 3) and P2B(ok)). N3 succeeds in making a
new decision, but the decided value remains the same, and, therefore, the safety
requirements of a consensus protocol are satisfied.
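
The run described above can be replayed with a small in-memory model. The following Python sketch is only an illustration under simplifying assumptions: message passing is replaced by direct method calls, a lost message is modelled by leaving the acceptor out of the list passed to `propose`, and `None` stands for the undefined value. The names `Acceptor`, `phase1`, `phase2` and `propose` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Acceptor:
    r: int = 0                 # current round (latest promise)
    w: int = 0                 # round at which v was last accepted
    v: Optional[str] = None    # accepted value, None = undefined

    def phase1(self, k: int):
        """Handle P1A(k): promise and report (v, w), or refuse with None."""
        if k < self.r:
            return None
        self.r = k
        return (self.v, self.w)

    def phase2(self, k: int, value: str) -> bool:
        """Handle P2A(value, k): accept unless a higher round was promised."""
        if k < self.r:
            return False
        self.r = self.w = k
        self.v = value
        return True

def propose(reachable: List[Acceptor], k: int, own_value: str) -> Optional[str]:
    """Run both phases at round k against the acceptors that are reachable.

    Assumes the caller passes a quorum; returns the decided value or None.
    """
    promises = [a.phase1(k) for a in reachable]
    if any(p is None for p in promises):
        return None
    value, _ = max(promises, key=lambda p: p[1])   # greatest accepted round
    candidate = value if value is not None else own_value
    if all(a.phase2(k, candidate) for a in reachable):
        return candidate
    return None

n1, n2, n3 = Acceptor(), Acceptor(), Acceptor()
decided_by_n1 = propose([n1, n2], 1, "v1")   # messages to N3 are lost
decided_by_n3 = propose([n2, n3], 3, "v3")   # N3 reaches N2 and itself
print(decided_by_n1, decided_by_n3)          # v1 v1
```

N3 proposes v3 but ends up deciding v1, because N2 belongs to both quorums and reports the value it accepted at round 1.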


3 The Faithful Deconstruction of SD-Paxos 


We now recall the faithful deconstruction of SD-Paxos in [2,3], which we take
as the reference architecture for the implementations that we aim to verify. We 
later show how each module of the deconstruction can be verified separately. 

The deconstruction is depicted on the left of Fig. 2, which consists of modules
Paxos, Round-Based Consensus and Round-Based Register. These modules
correspond to the ones in Fig. 4 of [2], with the exception of Weak Leader Election.
We assume that a correct process that is trusted by every other correct process 
always exists, and omit the details of the leader election. Leaders take the role 
of proposers and invoke the interface of Paxos. Each module uses the interface 
provided by the module below it. 
 1 read(int k) {
 2   int j; val v; int kW; val maxV;
 3   int maxKW; set of int Q; msg m;
 4   for (j := 1, j <= n, j++)
 5     { send(j, [RE, k]); }
 6   maxKW := 0; maxV := undef; Q := {};
 7   do { (j, m) := receive();
 8     switch (m) {
 9       case [ackRE, @k, v, kW]:
10         Q := Q ∪ {j};
11         if (kW >= maxKW)
12           { maxKW := kW; maxV := v; }
13       case [nackRE, @k]:
14         return (false, _);
15     } if (|Q| = ⌈(n+1)/2⌉)
16       { return (true, maxV); } }
17   while (true); }

18 write(int k, val vW) {
19   int j; set of int Q; msg m;
20   for (j := 1, j <= n, j++)
21     { send(j, [WR, k, vW]); }
22   Q := {};
23   do { (j, m) := receive();
24     switch (m) {
25       case [ackWR, @k]:
26         Q := Q ∪ {j};
27       case [nackWR, @k]:
28         return false;
29     } if (|Q| = ⌈(n+1)/2⌉)
30       { return true; } }
31   while (true); }


Fig. 3. Implementation of Round-Based Register (read and write).


The entry module Paxos implements SD-Paxos. Its specification (right of
Fig. 2) keeps a variable vP that stores the decided value (initially undefined) and
provides the operation proposeP that takes a proposed value v0 and returns vP
if some value was already decided, or otherwise it returns v0. The code of the
operation runs atomically, which we emphasise via angle brackets ⟨...⟩. We define
this specification so it meets the safety requirements of a consensus; therefore,
any implementation whose entry point refines this specification will have to meet
the same safety requirements.
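
As a rough illustration of this atomic specification, the Python sketch below guards the abstract state with a lock so that each call to `propose_p` behaves as one indivisible step. It is an approximation under stated assumptions: `None` plays the role of undef, the `assume` is rendered as a run-time check, and the class name `PaxosSpec` is hypothetical.

```python
import threading

class PaxosSpec:
    """Abstract single-decree consensus: the first proposal wins."""

    def __init__(self):
        self._lock = threading.Lock()
        self._vP = None            # decided value, None = undef

    def propose_p(self, v0):
        # Stand-in for assume(!(v0 = undef)): reject undefined proposals.
        if v0 is None:
            raise ValueError("proposed value must be defined")
        with self._lock:           # the body runs atomically
            if self._vP is None:
                self._vP = v0
            return self._vP

spec = PaxosSpec()
print(spec.propose_p("a"))   # a
print(spec.propose_p("b"))   # a
```

Every caller receives the same value, it never changes once set, and it was proposed by some caller, which matches the three safety requirements quoted in the introduction.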

In this work we present both specifications and implementations in pseudo- 
code for an imperative WHILE-like language with basic arithmetic and primitive 
types, where val is some user-defined type for the values decided by Paxos, and 
undef is a literal that denotes an undefined value. The pseudo-code is
self-explanatory and we refrain from giving formal semantics to it, which
could be done in standard fashion if so wished [30]. At any rate, the pseudo-code
is ultimately a vehicle for illustration and we stick to this informal presentation. 

The implementation of the modules is depicted in Figs.3, 4 and 5. We 
describe the modules following a bottom-up approach, which better fits the pur- 
pose of conveying the connection between the deconstruction and SD-Paxos. 
We start with module Round-Based Register, which offers operations read and 
write (Fig.3) and implements the replicated processes that adopt the role of 
acceptors (Fig. 4). We adapt the wait-free, crash-stop implementation of Round- 
Based Register in Fig.5 of [2] by adding loops for the explicit reception of each 
individual message and by counting acknowledgement messages one by one. Pro- 
cesses are identified by integers from 1 to n, where n is the number of processes 
in the system. Proposers and acceptors exchange read and write requests, and
their corresponding acknowledgements and non-acknowledgements. We assume
a type msg for messages and let the message vocabulary be as follows.
 1 process Acceptor(int j) {
 2   val v := undef; int r := 0; int w := 0;
 3   start() {
 4     int i; msg m; int k;
 5     do { (i, m) := receive();
 6       switch (m) {
 7         case [RE, k]:
 8           if (k < r) { send(i, [nackRE, k]); }
 9           else { ⟨ r := k; send(i, [ackRE, k, v, w]); ⟩ }
10         case [WR, k, vW]:
11           if (k < r) { send(i, [nackWR, k]); }
12           else { ⟨ r := k; w := k; v := vW; send(i, [ackWR, k]); ⟩ }
13       } }
14     while (true); } }


Fig. 4. Implementation of Round-Based Register (acceptor).


Read requests [RE, k] carry the proposer’s round k. Write requests [WR, k, v]
carry the proposer’s round k and the proposed value v. Read acknowledgements
[ackRE, k, v, k'] carry the proposer’s round k, the acceptor’s value
v, and the round k' at which v was accepted. Read non-acknowledgements
[nackRE, k] carry the proposer’s round k, as do write acknowledgements
[ackWR, k] and write non-acknowledgements [nackWR, k].

In the pseudo-code, we use _ for a wildcard that could take any literal value. 
In the pattern-matching primitives, the literals specify the pattern against which 
an expression is being matched, and operator @ turns a variable into a literal 
with the variable’s value. Compare the case [ackRE, @k, v, kW]: in Fig. 3, where 
the value of k specifies the pattern and v and kW get some values assigned, with 
the case [RE, k]: in Fig. 4, where k gets some value assigned. 

We assume the network ensures that messages are neither created, modified, 
deleted, nor duplicated, and that they are always delivered but with an
arbitrarily large transmission delay.² Primitive send takes the destination j and the
message m, and its effect is to send m from the current process to the process j. 
Primitive receive takes no arguments, and its effect is to receive at the cur- 
rent process a message m from origin i, after which it delivers the pair (i, m) of 
identifier and message. We assume that send is non-blocking and that receive 
blocks and suspends the process until a message is available, in which case the 
process awakens and resumes execution. 

Each acceptor (Fig.4) keeps a value v, a current round r (called the read 
round), and the round w at which the acceptor’s value was last accepted (called 
the write round). Initially, v is undef and both r and w are 0. 

Phase 1 of SD-Paxos is implemented by operation read on the left of Fig. 3.
When a proposer issues a read, the operation requests each acceptor’s promise
to only accept values at a round greater than or equal to k by sending [RE, k]
(lines 4-5). When an acceptor receives a [RE, k] (lines 5-7 of Fig. 4) it
acknowledges the promise depending on its read round. If k is strictly less than r
then the acceptor has already made a promise to another proposer with greater
round and it sends [nackRE, k] back (line 8). Otherwise, the acceptor updates
r to k and acknowledges by sending [ackRE, k, v, w] (line 9). When the
proposer receives an acknowledgement (lines 8-10 of Fig. 3) it counts
acknowledgements up (line 10) and calculates the greatest write round at which the
acceptors acknowledging so far accepted a value, and stores this value in maxV
(lines 11-12). If a majority of acceptors acknowledged, the operation succeeds
and returns (true, maxV) (lines 15-16). Otherwise, if the proposer received some
[nackRE, k] the operation fails, returning (false, _) (lines 13-14).


2 We allow creation and duplication of [RE, k] messages in Sect. 5, where we obtain
Multi-Paxos from SD-Paxos by a series of transformations of the network semantics.


Round-Based Consensus (left):

1 proposeRC(int k, val v0) {
2   bool res; val v;
3   (res, v) := read(k);
4   if (res) {
5     if (v = undef) { v := v0; }
6     res := write(k, v);
7     if (res) { return (true, v); } }
8   return (false, _); }

Paxos (right):

1 proposeP(val v0) {
2   int k; bool res; val v;
3   k := pid();
4   do { (res, v) :=
5          proposeRC(k, v0);
6        k := k + n;
7   } while (!res);
8   return v; }


Fig. 5. Implementation of Round-Based Consensus (left) and Paxos (right)

Phase 2 of SD-Paxos is implemented by operation write on the right of
Fig. 3. After having collected promises from a majority of acceptors, the
proposer picks the candidate value vW and issues a write. The operation requests
each acceptor to accept the candidate value by sending [WR, k, vW] (lines 20-21).
When an acceptor receives [WR, k, vW] (line 10 of Fig. 4) it accepts the
value depending on its read round. If k is strictly less than r, then the acceptor
never promised to accept at such round and it sends [nackWR, k] back (line 11).
Otherwise, the acceptor fulfils its promise and updates both w and r to k and
assigns vW to its value v, and acknowledges by sending [ackWR, k] (line 12).
Finally, when the proposer receives an acknowledgement (lines 23-25 of Fig. 3)
it counts acknowledgements up (line 26) and checks whether a majority of
acceptors acknowledged, in which case vW is decided and the operation succeeds and
returns true (lines 29-30). Otherwise, if the proposer received some [nackWR, k]
the operation fails and returns false (lines 27-28).³
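
The proposer-side counting loop of read can be sketched in Python as a function over an already-received sequence of replies. This is an illustration only: the network is replaced by a list of (sender, message) pairs, messages are modelled as tuples, `None` stands for undef, a quorum is taken to be a strict majority of the n processes, and the name `read_outcome` is hypothetical. Returning `None` for the whole call models the case where the real operation would still be blocked in receive.

```python
def read_outcome(replies, k, n):
    """Process replies to [RE, k] in arrival order, as the read loop does.

    Returns (True, max_v) once a majority acknowledged, (False, None) on
    the first non-acknowledgement, or None if more replies are needed.
    """
    quorum = n // 2 + 1          # strict majority (assumed quorum size)
    acked = set()                # Q in the pseudo-code
    max_kw, max_v = 0, None      # maxKW and maxV
    for sender, msg in replies:
        if msg[0] == "ackRE" and msg[1] == k:
            _, _, v, kw = msg
            acked.add(sender)
            if kw >= max_kw:     # keep the value accepted at the greatest round
                max_kw, max_v = kw, v
        elif msg[0] == "nackRE" and msg[1] == k:
            return (False, None)
        if len(acked) == quorum:
            return (True, max_v)
    return None

# N3's read at round 3 in the run of Sect. 2: N2 reports v1 accepted at round 1.
replies = [(2, ("ackRE", 3, "v1", 1)), (3, ("ackRE", 3, None, 0))]
print(read_outcome(replies, 3, 3))                   # (True, 'v1')
print(read_outcome([(1, ("nackRE", 3))], 3, 3))      # (False, None)
```

The write loop has the same shape without the maximum computation: it only counts acknowledgements and fails on the first non-acknowledgement.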

Next, we describe module Round-Based Consensus on the left of Fig. 5. The
module offers an operation proposeRC that takes a round k and a proposed
value v0, and returns a pair (res, v) of Boolean and value, where res informs
of the success of the operation and v is the decided value in case res is true.
We have taken the implementation from Fig. 6 in [2] but adapted to our
pseudo-code conventions. Round-Based Consensus carries out Phase 1 and Phase 2 of
SD-Paxos as explained in Sect. 2. The operation proposeRC calls read (line 3)
and if it succeeds then chooses a candidate value between the proposed value
v0 or the value v returned by read (line 5). Then, the operation calls write
with the candidate value and returns (true, v) if write succeeds, or fails and
returns (false, _) (line 8) if either the read or the write fails.


3 For the implementation to be correct with our shared-memory-concurrency
approach, the update of the data in acceptors must happen atomically with the sending
of acknowledgements in lines 9 and 12 of Fig. 4.


Fig. 6. Two histories in which a failing write contaminates some acceptor.

Finally, the entry module Paxos on the right of Fig. 5 offers an operation
proposeP that takes a proposed value v0 and returns the decided value. We
assume that the system primitive pid() returns the process identifier of the
current process. We have come up with this straightforward implementation of
operation proposeP, which calls proposeRC with increasing round until the call
succeeds, starting at a round equal to the process identifier pid() and increasing
it by the number of processes n in each iteration. This guarantees that the round
used in each invocation of proposeRC is unique.
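
The uniqueness argument can be made concrete with a short Python sketch. It assumes, as in Sect. 3, that process identifiers range over 1..n; the generator name `rounds` is hypothetical. Process pid uses the rounds pid, pid + n, pid + 2n, and so on, which are exactly the rounds congruent to pid modulo n, so two different processes never share a round.

```python
from itertools import islice

def rounds(pid, n):
    """Yield the rounds used by process pid: pid, pid + n, pid + 2n, ..."""
    k = pid
    while True:
        yield k
        k += n

n = 3
used = {pid: list(islice(rounds(pid, n), 4)) for pid in range(1, n + 1)}
print(used)   # {1: [1, 4, 7, 10], 2: [2, 5, 8, 11], 3: [3, 6, 9, 12]}
```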


The Challenge of Verifying the Deconstruction of Paxos. Verifying 
each module of the deconstruction separately is cumbersome because of the 
distributed character of the algorithm and the nature of a linearisation proof. A 
process may not be aware of the information that will flow from itself to other 
processes, but this future information flow may dictate whether some operation 
has to be linearised at the present. Figure 6 illustrates this challenge. 

Let N1, N2 and N3 adopt both the roles of acceptors and proposers, which
propose values v1, v2 and v3 with rounds 1, 2 and 3 respectively. Consider the
history on the top of the figure. N2 issues a read with round 2 and gets
acknowledgements from all but one of the acceptors in a quorum. (Let us call this one
acceptor A.) None of these acceptors have accepted anything yet and they all return
⊥ as the last accepted value at round 0. In parallel, N3 issues a read with
round 3 (third line in the figure) and gets acknowledgements from a quorum in
which A does not occur. This read succeeds as well and returns (true, undef).
 1 (bool × val) ptp[1..n] := undef;
 2 val abs_vP := undef; single bool abs_resP[1..n] := undef;
 3 proposeP(val v0) {
 4   int k; bool res; val v; assume(!(v0 = undef));
 5   k := pid(); ptp[pid()] := (true, v0);
 6   do { ⟨ (res, v) := proposeRC(k, v0);
 7     if (res) {
 8       for (i := 1, i <= n, i++) {
 9         if (ptp[i] = (true, v)) { lin(i); ptp[i] := (false, v); } }
10       if (!(v = v0)) { lin(pid()); ptp[pid()] := (false, v0); } } ⟩
11     k := k + n; }
12   while (!res); return v; }


Fig. 7. Instrumented implementation of Paxos.


Then N3 issues a write with round 3 and value v3. Again, it gets
acknowledgements from a quorum in which A does not occur, and the write succeeds deciding
value v3 and returns true. Later on, and in real time order with the write by
N3 but in parallel with the read by N2, node N1 issues a write with round 1
and value v1 (first line in the figure). This write is bound to fail because the value v3
was already decided with round 3. However, the write manages to “contaminate”
acceptor A with value v1, which now acknowledges N2 and sends v1 as
its last accepted value at round 1. Now N2 has gotten acknowledgements from
a quorum, and since the other acceptors in the quorum returned 0 as the round
of their last accepted value, the read will catch value v1 accepted at round 1,
and the operation succeeds and returns (true, v1). This history linearises by
moving N2’s read after N1’s write, and by respecting the real time order for the
rest of the operations. (The linearisation ought to respect the information flow
order between N1 and N2 as well, i.e., N1 contaminates A with value v1, which
is read by N2.)

In the figure, a segment ending in an x indicates that the operation fails. The 
value returned by a successful read operation is depicted below the end of the 
segment. The linearisation points are depicted with a thick vertical line, and the 
dashed arrow indicates that two operations are in the information flow order. 

The variation of this scenario on the bottom of Fig. 6 is also possible, where 
N1’s write and N2’s read happen concurrently, but where N2’s read is shifted 
backwards to happen before in real time order with N3’s read and write. Since 
N1’s write happens before N2’s read in the information flow order, then N1’s 
write has to inexorably linearise before N3’s operations, which are the ones that 
will “steal” N1’s valid round. 

These examples give us three important hints for designing the specifications 
of the modules. First, after a decision is committed it is not enough to store only 
the decided value, since a posterior write may contaminate some acceptor with a 
value different from the decided one. Second, a read operation may succeed with
some round even if by that time another operation has already succeeded with a
higher round. And third, a write with a valid round may fail if its round will
be “stolen” by a concurrent operation. The non-deterministic specifications that
we introduce next allow one to model execution histories such as the ones in Fig. 6.
4 Modularly Verifying SD-Paxos


In this section, we provide non-deterministic specifications for Round-Based Con- 
sensus and Round-Based Register and show that each implementation refines its 
specification [9]. To do so, we instrument the implementations of all the modules 
with linearisation-point annotations and use Rely/Guarantee reasoning [26]. 
This time we follow a top-down order and start with the entry module Paxos.


Module Paxos. In order to prove that the implementation on the right of
Fig. 5 refines its specification on the right of Fig. 2, we introduce the
instrumented implementation in Fig. 7, which uses the helping mechanism for external
linearisation points of [18]. We assume that each proposer invokes proposeP with
a unique proposed value. The auxiliary pending thread pool ptp[n] is an array
of pairs of Booleans and values of length n, where n is the number of processes
in the system. A cell ptp[i] containing a pair (true, v) signals that the process
i proposed value v and the invocation proposeP(v) by process i awaits to be
linearised. Once this invocation is linearised, the cell ptp[i] is updated to the
pair (false, v). A cell ptp[i] containing undef signals that the process i has not
proposed any value yet. The array abs_resP[n] of Boolean single-assignment
variables stores the abstract result of each proposer’s invocation. A
linearisation-point annotation lin(i) takes a process identifier i, atomically performs the
abstract operation invoked by proposer i and assigns its result to abs_resP[i].
The abstract state is modelled by variable abs_vP, which corresponds to variable
vP in the specification on the right of Fig. 2. One invocation of proposeP may
help linearise other invocations as follows. The linearisation point coincides
with the invocation of proposeRC (line 6). If proposeRC committed with some
value v, the instrumented implementation traverses ptp and linearises all the
proposers which were proposing value v (the proposer may linearise itself in this
traversal) (lines 8-9). Then, the current proposer linearises itself if its proposed
value v0 is different from v (line 10), and the operation returns v (line 12). All
the annotations and code in lines 6-10 are executed inside an atomic block,
together with the invocation of proposeRC(k, v0).
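
The helping step described above can be sketched in Python. This is only an illustration under stated assumptions, not the code of Fig. 7: the abstract operation is assumed to set abs_vP the first time it is linearised and to return abs_vP, processes are indexed from 0, and the names `lin` and `help_after_commit` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

UNDEF = None  # stands for undef


def lin(i: int,
        ptp: List[Optional[Tuple[bool, str]]],
        abs_state: Dict[str, Optional[str]],
        abs_resP: Dict[int, str]) -> None:
    """Linearise the pending invocation of process i (assumed abstract operation)."""
    pending, v = ptp[i]
    assert pending and i not in abs_resP   # abs_resP[i] is single-assignment
    if abs_state["abs_vP"] is UNDEF:       # the first linearised proposal decides
        abs_state["abs_vP"] = v
    abs_resP[i] = abs_state["abs_vP"]
    ptp[i] = (False, v)                    # no longer awaiting linearisation


def help_after_commit(me, v0, v, ptp, abs_state, abs_resP):
    """Run (atomically) after proposeRC(k, v0) by process `me` committed with v."""
    for i, cell in enumerate(ptp):
        if cell is not UNDEF and cell[0] and cell[1] == v:
            lin(i, ptp, abs_state, abs_resP)   # help every pending proposer of v
    if v0 != v:
        lin(me, ptp, abs_state, abs_resP)      # then linearise the caller itself
    return v
```

Linearising the proposers of v first is what makes the abstract state agree with the committed value before any other pending invocation observes it.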


Theorem 1. The implementation of Paxos on the right of Fig. 5 linearises with
respect to its specification on the right of Fig. 2. 


Module Round-Based Consensus. The top of Fig. 8 shows the module’s
non-deterministic specification. Global variable vRC is the decided value,
initially undef. Global variable roundRC is the highest round at which some
value was decided, initially 0; a global set of values valsRC (initially empty)
contains values that may have been proposed by proposers. The specification is
non-deterministic in that local value vD and Boolean b are unspecified, which we
model by assigning random values to them. We assume that the current process
identifier is ((k - 1) mod n) + 1, which is consistent with how rounds are assigned
to each process and incremented in the code of proposeP on the right of Fig. 5.
If the unspecified value vD is neither in the set valsRC nor equal to v0 then
the operation returns (false, _) (line 11). This models that the operation fails

 1 val vRC := undef; int roundRC := 0; set of val valsRC := {};
 2 proposeRC(int k, val v0) {
 3   ⟨ val vD := random(); bool b := random();
 4     assume(!(v0 = undef)); assume(pid() = ((k - 1) mod n) + 1);
 5     if (vD ∈ (valsRC ∪ {v0})) {
 6       valsRC := valsRC ∪ {vD};
 7       if (b && (k >= roundRC)) { roundRC := k;
 8         if (vRC = undef) { vRC := vD; }
 9         return (true, vRC); }
10       else { return (false, _); } }
11     else { return (false, _); } ⟩ }

 1 val abs_vRC := undef; int abs_roundRC := 0;
 2 set of val abs_valsRC := {};
 3 proposeRC(int k, val v0) {
 4   single (bool × val) abs_resRC := undef; bool res; val v;
 5   assume(!(v0 = undef)); assume(pid() = ((k - 1) mod n) + 1);
 6   ⟨ (res, v) := read(k); if (res = false) { linRC(undef, _); } ⟩
 7   if (res) { if (v = undef) { v := v0; }
 8     ⟨ res := write(k, v); if (res) { linRC(v, true); }
 9       else { linRC(v, false); } ⟩
10     if (res) { return (true, v); } }
11   return (false, _); }

Fig. 8. Specification (top) and instrumented implementation (bottom) of Round-Based 
Consensus. 


without contaminating any acceptor. Otherwise, the operation may contaminate
some acceptor and the value vD is added to the set valsRC (line 6). Now, if the
unspecified Boolean b is false, then the operation returns (false, _) (lines 7
and 10), which models that the round will be stolen by a later operation.
Finally, the operation succeeds if k is greater than or equal to roundRC (line 7), in
which case roundRC and vRC are updated and the operation returns (true, vRC) (lines 7-9).
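
The specification at the top of Fig. 8 can be mirrored by a small executable sketch. It assumes that the two unspecified choices vD and b are passed in explicitly instead of being drawn at random, that None plays the role of undef, and it omits the pid() assumption; the class name `ConsensusSpec` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class ConsensusSpec:
    """Abstract state of Round-Based Consensus (sketch of the top of Fig. 8)."""
    vRC: Optional[str] = None                 # decided value, None models undef
    roundRC: int = 0                          # highest round with a decision
    valsRC: Set[str] = field(default_factory=set)

    def propose(self, k: int, v0: str, vD: Optional[str],
                b: bool) -> Tuple[bool, Optional[str]]:
        assert v0 is not None
        if vD not in self.valsRC | {v0}:
            return (False, None)              # fails, no acceptor contaminated
        self.valsRC.add(vD)                   # some acceptor may be contaminated
        if b and k >= self.roundRC:
            self.roundRC = k
            if self.vRC is None:
                self.vRC = vD                 # the first success fixes the decision
            return (True, self.vRC)
        return (False, None)                  # round stolen, or k is too small
```

Passing vD = None reproduces the failing case used by linRC(undef, _): undef is never a member of valsRC ∪ {v0}, so nothing is added to the abstract state.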

In order to prove that the implementation in Fig. 5 linearises with respect
to the specification on the top of Fig. 8, we use the instrumented implementation
on the bottom of the same figure, where the abstract state is modelled by
variables abs_vRC, abs_roundRC and abs_valsRC in lines 1-2, the local
single-assignment variable abs_resRC stores the result of the abstract operation, and
the linearisation-point annotations linRC(vD, b) take a value and a Boolean
as parameters, invoke the non-deterministic abstract operation and disambiguate
it by assigning the parameters to the unspecified vD and b of the specification.
There are two linearisation points together with the invocations of read
(line 6) and write (line 8). If read fails, then we linearise forcing the unspecified
vD to be undef (line 6), which ensures that the abstract operation fails without
adding any value to abs_valsRC or updating the round abs_roundRC. Otherwise,
if write succeeds with value v, then we linearise forcing the unspecified
value vD and Boolean b to be v and true respectively (line 8). This ensures that

 1 read(int k) {
 2   ⟨ val vD := random();
 3     bool b := random(); val v;
 4     assume(vD ∈ valsRR);
 5     assume(pid() =
 6       ((k - 1) mod n) + 1);
 7     if (b) {
 8       if (k >= roundRR) {
 9         roundRR := k;
10         if (!(vRR = undef)) {
11           v := vRR; }
12         else { v := vD; } }
13       else { v := vD; }
14       return (true, v); }
15     else { return (false, _); } ⟩ }

16 val vRR := undef;
17 int roundRR := 0;
18 set of val valsRR := {undef};
19
20 write(int k, val vW) {
21   ⟨ bool b := random();
22     assume(!(vW = undef));
23     assume(pid() =
24       ((k - 1) mod n) + 1);
25     valsRR := valsRR ∪ {vW};
26     if (b && (k >= roundRR)) {
27       roundRR := k;
28       vRR := vW;
29       return true; }
30     else { return false; } ⟩ }


Fig. 9. Specification of Round-Based Register. 


the abstract operation succeeds and updates the round abs_roundRC to k and 
assigns v to the decided value abs_vRC. If write fails then we linearise forcing 
the unspecified vD and b to be v and false respectively (line 9). This ensures 
that the abstract operation fails. 


Theorem 2. The implementation of Round-Based Consensus in Fig. 5 linearises
with respect to its specification on the top of Fig. 8.


Module Round-Based Register. Figure 9 shows the module’s non-deterministic
specification. Global variable vRR represents the decided value,
initially undef. Global variable roundRR represents the current round, initially
0, and global set of values valsRR, initially containing undef, stores values that
may have been proposed by some proposer. The specification is non-deterministic
in that method read has unspecified local Boolean b and local value vD (we
assume that vD is in valsRR), and method write has unspecified local Boolean b.
We assume the current process identifier is ((k - 1) mod n) + 1.

Let us explain the specification of the read operation. The operation can 
succeed regardless of the proposer’s round k, depending on the value of the 
unspecified Boolean b. If b is true and the proposer’s round k is valid (line 8), 
then the read round is updated to k (line 9) and the operation returns (true, v) 
(line 14), where v is the read value, which coincides with the decided value if some 
decision was committed already or with vD otherwise. Now to the specification of 
operation write. The value vW is always added to the set valsRR (line 25). If the 
unspecified Boolean b is false (the round will be stolen by a posterior operation) 
or if the round k is non-valid, then the operation returns false (lines 26 and 
30). Otherwise, the current round is updated to k, and the decided value vRR is 
updated to vW and the operation returns true (lines 27-29). 
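
A minimal executable sketch of the specification in Fig. 9 follows. As assumptions, the unspecified vD and b are explicit arguments, None models undef, the pid() assumption is omitted, and the class name `RegisterSpec` is hypothetical.

```python
class RegisterSpec:
    """Abstract state of the Round-Based Register (sketch of Fig. 9)."""

    def __init__(self):
        self.vRR = None          # decided value, None models undef
        self.roundRR = 0         # current round
        self.valsRR = {None}     # values that may have been proposed

    def read(self, k, vD, b):
        assert vD in self.valsRR
        if not b:
            return (False, None)
        if k >= self.roundRR:
            self.roundRR = k
            # Return the decided value if there is one, otherwise vD.
            v = self.vRR if self.vRR is not None else vD
        else:
            v = vD               # a read may succeed even with a stale round
        return (True, v)

    def write(self, k, vW, b):
        assert vW is not None
        self.valsRR.add(vW)      # the value is recorded even if the write fails
        if b and k >= self.roundRR:
            self.roundRR = k
            self.vRR = vW
            return True
        return False
```

The sketch makes two of the hints from Fig. 6 concrete: a failed write still leaves its value in valsRR, and a read with a round below roundRR can still succeed, returning a value from valsRR.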

In order to prove that the implementation in Figs. 3 and 4 linearises with
respect to the specification in Fig. 9, we use the instrumented implementation in
Figs. 10 and 11, which uses prophecy variables [1,26] that “guess” whether the
execution of the method will reach a particular program location or not. The
instrumented implementation also uses external linearisation points. In particular,
the code of the acceptors may help to linearise some of the invocations to
read and write, based on the prophecies and on auxiliary variables that count
the number of acknowledgements sent by acceptors after each invocation of a
read or a write. The next paragraphs elaborate on our use of prophecy variables
and on our helping mechanism.

Variables abs_vRR, abs_roundRR and abs_valsRR in Fig. 10 model the
abstract state. They are initially set to undef, 0 and the set containing undef
respectively. Variable abs_res_r[k] is an infinite array of single-assignment
pairs of Boolean and value that model the abstract results of the invocations
to read. (Think of an infinite array as a map from integers to some type; we
use the array notation for convenience.) Similarly, variable abs_res_w[k] is an
infinite array of single-assignment Booleans that models the abstract results of
the invocations to write. All the cells in both arrays are initially undef (i.e.,
the initial maps are empty). Variables count_r[k] and count_w[k] are infinite
arrays of integers that model the number of acknowledgements sent (but not
necessarily received yet) from acceptors in response to read or write
requests, respectively. All cells in both arrays are initially 0. The variable proph_r[k] is an
infinite array of single-assignment pairs bool × val, modelling the prophecy for
the invocations of read, and variable proph_w[k] is an infinite array of
single-assignment Booleans modelling the prophecy for the invocations of write.

The linearisation-point annotations linRE(k, vD, b) for read take the
proposer’s round k, a value vD and a Boolean b, and they invoke the abstract
operation and disambiguate it by assigning the parameters to the unspecified vD
and b of the specification on the left of Fig. 9. At the beginning of a read(k)
(lines 11-14 of Fig. 10), the prophecy proph_r[k] is set to (true, v) if the
invocation reaches PL: RE_SUCC in line 26. The v is defined to coincide with maxV at
the time when that location is reached. That is, v is the value accepted at the
greatest round by the acceptors acknowledging so far, or undef if no acceptor
ever accepted any value. If the operation reaches PL: RE_FAIL in line 24
instead, the prophecy is set to (false, _). (If the method never returns, the
prophecy is left undef since it will never linearise.) A successful read(k)
linearises in the code of the acceptor in Fig. 11, when the ⌈(n + 1)/2⌉th acceptor
sends [ackRE, k, v, w], and only if the prophecy is (true, v) and the operation
was not linearised before (lines 10-14). We force the unspecified vD and b to
be v and true respectively, which ensures that the abstract operation succeeds
and returns (true, v). A failing read(k) linearises at the return in the code
of read (lines 23-24 of Fig. 10), after the reception of [nackRE, k] from one
acceptor. We force the unspecified vD and b to be undef and false respectively,
which ensures that the abstract operation fails.
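
The condition under which an acceptor helps a successful read to linearise can be sketched as follows. This is an illustration only: the class `ReadHelping` and its method are hypothetical names, the prophecy is passed in as an argument instead of being guessed, and only the counting logic described in the paragraph above is modelled.

```python
def quorum(n: int) -> int:
    """Majority quorum size, ceil((n + 1) / 2), in integer arithmetic."""
    return (n + 2) // 2


class ReadHelping:
    def __init__(self, n: int):
        self.n = n
        self.count_r = {}      # round -> acknowledgements sent so far
        self.abs_res_r = {}    # round -> abstract result (single assignment)

    def on_ack_sent(self, k, proph):
        """Called when some acceptor sends [ackRE, k, ...].

        `proph` models proph_r[k]: None (undef), (True, v) or (False, None).
        Returns True iff this acknowledgement linearises read(k).
        """
        self.count_r[k] = self.count_r.get(k, 0) + 1
        if (self.count_r[k] == quorum(self.n)
                and proph is not None and proph[0]
                and k not in self.abs_res_r):
            self.abs_res_r[k] = proph      # effect of linRE(k, v, true)
            return True
        return False
```

With n = 3 acceptors the quorum is 2, so the second acknowledgement for a round is the one that linearises a read whose prophecy is (true, v).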

The linearisation-point annotations linWR(k, vW, b) for write take the pro- 
poser’s round k and value vW, and a Boolean b, and they invoke the abstract 
operation and disambiguate it by assigning the parameter to the unspecified b

read(int k) {
  int j; val v; set of int Q; int maxKW; val maxV; msg m;
  for (j := 1, j <= n, j++) { send(j, [RE, k]); }
  maxKW := 0; maxV := undef; Q := {};
  do { (j, m) := receive();
    switch (m) {
      case [ackRE, @k, v, kW]:
        Q := Q ∪ {j};
        if (kW >= maxKW) { maxKW := kW; maxV := v; }
      case [nackRE, @k]:
        ⟨ linRE(k, undef, false); proph_r[k] := undef;
          return (false, _); ⟩ // PL: RE_FAIL
    } if (|Q| = ⌈(n+1)/2⌉) {
      return (true, maxV); } } // PL: RE_SUCC
  while (true); }

write(int k, val vW) {
  int j; set of int Q; msg m;
  do { (j, m) := receive();
    switch (m) {
      case [ackWR, @k]:
        Q := Q ∪ {j};
      case [nackWR, @k]:
        ⟨ if (count_w[k] = 0) {
            linWR(k, vW, false); proph_w[k] := undef; }
          return false; ⟩ // PL: WR_FAIL
    } if (|Q| = ⌈(n+1)/2⌉) {
      return true; } } // PL: WR_SUCC
  while (true); }


Fig. 10. Instrumented implementation of read and write methods.

process Acceptor(int j) {
  val v := undef; int r := 0; int w := 0;
  start() {
    int i; msg m; int k;
    do { (i, m) := receive();
      switch (m) {
        case [RE, k]:
          if (k < r) { send(i, [nackRE, k]); }
          else { ⟨ r := k;
                   send(i, [ackRE, k, v, w]); ⟩ }
        case [WR, k, vW]:
          if (k < r) { send(j, i, [nackWR, k]); }
          else { ⟨ r := k; w := k; v := vW;
                   send(j, i, [ackWR, k]); ⟩ }
      } }
    while (true); } }


Fig. 11. Instrumented implementation of acceptor processes. 


of the specification on the right of Fig. 9. At the beginning of a write(k, vW)
(lines 31-33 of Fig. 10), the prophecy proph_w[k] is set to true if the invocation
reaches PL: WR_SUCC in line 45, or to false if it reaches PL: WR_FAIL in line 43
(or it is left undef if the method never returns). A successful write(k, vW)
linearises in the code of the acceptor in Fig. 11, when the ⌈(n+1)/2⌉th acceptor
sends [ackWR, k], and only if the prophecy is true and the operation was not
linearised before (lines 17-24). We force the unspecified b to be true, which
ensures that the abstract operation succeeds, deciding value vW and updating
roundRR to k. A failing write(k, vW) may linearise either at the return in its
own code (lines 41-43 of Fig. 10) if the proposer received one [nackWR, k] and no
acceptor sent any [ackWR, k] yet, or at the code of the acceptor, when the first
acceptor sends [ackWR, k], and only if the prophecy is false and the operation
was not linearised before. In both cases, we force the unspecified b to be false,
which ensures that the abstract operation fails.


Theorem 3. The implementation of Round-Based Register in Figs. 10 and 11
linearises with respect to its specification in Fig. 9.

5 Multi-Paxos via Network Transformations


We now turn to more complicated distributed protocols that build upon the idea
of Paxos consensus. Our ultimate goal is to reuse the verification result from
Sects. 3 and 4, as well as the high-level round-based register interface. In this
section, we will demonstrate how to reason about an implementation of
Multi-Paxos as an array of independent instances of the Paxos module defined
previously, despite the subtle dependencies between its sub-components, as present
in Multi-Paxos’s “canonical” implementations [5,15,27]. While an abstraction of
Multi-Paxos to an array of independent shared “single-shot” registers is almost
folklore, what appears to be inherently difficult is to verify a Multi-Paxos-based
consensus (wrt. the array-based abstraction) by reusing the proof of
SD-Paxos. All proofs of Multi-Paxos we are aware of are, thus, non-modular
with respect to underlying SD-Paxos instances [5,22,24], i.e., they require one
to redesign the invariants of the entire consensus protocol.

This proof modularity challenge stems from the optimised nature of a classical
Multi-Paxos protocol, as well as its real-world implementations [6]. Our aim in this part
of the work is to distil such protocol-aware optimisations into a separate network
semantics layer, and show that each of them refines the semantics of a Cartesian
product-based view, i.e., exhibits the very same client-observable behaviours. To
do so, we will establish the refinement between the optimised implementations
of Multi-Paxos and a simple Cartesian product abstraction, which will allow us to
extend the register-based abstraction, explored earlier in this paper, to what is
considered to be a canonical amortised Multi-Paxos implementation.


5.1 Abstract Distributed Protocols 


We start by presenting the formal definitions for encoding distributed protocols
(including Paxos), their message vocabularies, protocol-based network semantics,
and the notion of observable behaviours.


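Protocols and Messages. Figure 12 provides basic definitions of the distributed
protocols and their components. Each protocol $p$ is a tuple
$\langle \Delta, \mathcal{M}, \mathcal{S}_{int}, \mathcal{S}_{rcv}, \mathcal{S}_{snd} \rangle$.
$\Delta$ is a set of local states, which can be assigned to
each of the participating nodes, also determining the node’s role via an
additional tag,⁴ if necessary (e.g., the states of an acceptor and of a proposer in Paxos are
different). $\mathcal{M}$ is a “message vocabulary”, determining the set of messages that
can be used for communication between the nodes.

Protocols       $\mathcal{P} \ni p \triangleq \langle \Delta, \mathcal{M}, \mathcal{S}_{int}, \mathcal{S}_{rcv}, \mathcal{S}_{snd} \rangle$
Configurations  $\Sigma \ni \sigma \triangleq \mathit{Nodes} \rightharpoonup \Delta$
Internal steps  $\mathcal{S}_{int} \in \Delta \times \Delta$
Receive-steps   $\mathcal{S}_{rcv} \in \Delta \times \mathcal{M} \times \Delta$
Send-steps      $\mathcal{S}_{snd} \in \Delta \times \Delta \times \wp(\mathcal{M})$

Fig. 12. States and transitions.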


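⁴ We leave implicit the consistency laws for the state, which are protocol-specific.

STEPINT
$$\frac{n \in \mathrm{dom}(\sigma) \quad \delta = \sigma(n) \quad \langle \delta, \delta' \rangle \in p.\mathcal{S}_{int} \quad \sigma' = \sigma[n \mapsto \delta']}{\langle \sigma, M \rangle \xrightarrow[int]{p} \langle \sigma', M \rangle}$$

STEPSEND
$$\frac{n \in \mathrm{dom}(\sigma) \quad \delta = \sigma(n) \quad \langle \delta, \delta', ms \rangle \in p.\mathcal{S}_{snd} \quad \sigma' = \sigma[n \mapsto \delta'] \quad M' = M \cup ms}{\langle \sigma, M \rangle \xrightarrow[snd]{p} \langle \sigma', M' \rangle}$$

STEPRECEIVE
$$\frac{m \in M \quad m.\mathit{active} \quad m.\mathit{to} \in \mathrm{dom}(\sigma) \quad \delta = \sigma(m.\mathit{to}) \quad \langle \delta, m, \delta' \rangle \in p.\mathcal{S}_{rcv} \quad m' = m[\mathit{active} \mapsto \mathit{False}] \quad \sigma' = \sigma[n \mapsto \delta'] \quad M' = M \setminus \{m\} \cup \{m'\}}{\langle \sigma, M \rangle \xrightarrow[rcv]{p} \langle \sigma', M' \rangle}$$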


Fig. 13. Transition rules of the simple protocol-aware network semantics 


Messages can be thought of as JavaScript-like dictionaries, pairing unique
fields (isomorphic to strings) with their values. For the sake of a uniform
treatment, we assume that each message $m \in \mathcal{M}$ has at least two fields, from and to,
that point to the source and the destination node of a message, respectively.
In addition to that, for simplicity we will assume that each message carries a
Boolean field active, which is set to True when the message is sent and is set to
False when the message is received by its destination node. This flag is required
to keep history information about messages sent in the past, which is customary
in frameworks for reasoning about distributed protocols [10,23,28]. We assume
that a “message soup” $M$ is a multiset of messages (i.e. a set with zero or more
copies of each message) and we consider that each copy of the same message in
the multiset has its own “identity”, and we write $m \neq m'$ to represent that $m$
and $m'$ are not the same copy of a particular message.
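
The message representation just described can be mirrored directly with Python dictionaries. This is a sketch under assumptions: the soup is modelled as a list so that equal messages keep separate identities, and the helper names `send` and `mark_received` are hypothetical.

```python
from typing import Any, Dict, List

Msg = Dict[str, Any]


def send(soup: List[Msg], frm: int, to: int, **fields: Any) -> Msg:
    """Add a fresh, active copy of a message to the soup and return it."""
    m = {"from": frm, "to": to, "active": True, **fields}
    soup.append(m)
    return m


def mark_received(soup: List[Msg], m: Msg) -> None:
    """Replace this particular copy of m by an inactive copy (history is kept)."""
    idx = next(i for i, x in enumerate(soup) if x is m)   # identity, not equality
    soup[idx] = {**m, "active": False}
```

Two copies of the same message compare equal with `==` but are distinct under `is`, which plays the role of the identity of copies in the multiset.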

Finally, $\mathcal{S}_{\{int, rcv, snd\}}$ are step-relations that correspond to the internal
changes in the local state of a node ($\mathcal{S}_{int}$), as well as changes associated with
sending ($\mathcal{S}_{snd}$) and receiving ($\mathcal{S}_{rcv}$) messages by a node, as allowed by the
protocol. Specifically, $\mathcal{S}_{int}$ relates a local node state before and after the allowed
internal change; $\mathcal{S}_{rcv}$ relates the initial state and an incoming message $m \in \mathcal{M}$
with the resulting state; $\mathcal{S}_{snd}$ relates the internal state, the output state and the
set of atomically sent messages. For simplicity we will assume that $id \subseteq \mathcal{S}_{int}$.

In addition, we consider $\Delta_0 \subseteq \Delta$, the set of the allowed initial states, in
which the system can be present at the very beginning of its execution. The
global state of the network $\sigma \in \Sigma$ is a map from node identifiers ($n \in \mathit{Nodes}$) to
local states from the set of states $\Delta$, defined by the protocol.


Simple Network Semantics. The simple initial operational semantics of the
network ($\Rightarrow_p \subseteq (\Sigma \times \wp(\mathcal{M})) \times (\Sigma \times \wp(\mathcal{M}))$) is parametrised by a protocol $p$ and
relates the initial configuration (i.e., the global state and the set of messages)
with the resulting configuration. It is defined as the reflexive closure of the
union of three relations $\xrightarrow[int]{p} \cup \xrightarrow[rcv]{p} \cup \xrightarrow[snd]{p}$, whose rules are given in Fig. 13.




The rule STEPINT corresponds to a node $n$ picked non-deterministically from
the domain of a global state $\sigma$, executing an internal transition, thus
changing its local state from $\delta$ to $\delta'$. The rule STEPRECEIVE non-deterministically
picks a message $m$ from a message soup $M \subseteq \mathcal{M}$, changes the state using the
protocol’s receive-step relation $p.\mathcal{S}_{rcv}$ at the corresponding host node to, and
updates its local state accordingly in the common mapping ($\sigma[\mathit{to} \mapsto \delta']$). Finally,
the rule STEPSEND non-deterministically picks a node $n$ and executes a send-step,
which results in updating its local state and emitting a set of messages $ms$, which
is added to the resulting soup. In order to “bootstrap” the execution, the initial
states from the set $\Delta_0 \subseteq \Delta$ are assigned to the nodes.
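
The three rules can be read as functions on configurations. The sketch below is illustrative only: it assumes that each step relation is given as a Python function returning the list of allowed successors, that non-determinism is resolved by the caller (through the node, the message index and `choice`), and all names (`Protocol`, `step_int`, `step_send`, `step_receive`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Msg = Dict[str, Any]
State = Any
Config = Tuple[Dict[int, State], List[Msg]]   # (global state, message soup)


@dataclass
class Protocol:
    s_int: Callable[[State], List[State]]                   # internal steps
    s_rcv: Callable[[State, Msg], List[State]]              # receive-steps
    s_snd: Callable[[State], List[Tuple[State, List[Msg]]]]  # send-steps


def step_int(p: Protocol, sigma, soup, n, choice=0) -> Config:
    d2 = p.s_int(sigma[n])[choice]
    return {**sigma, n: d2}, soup                 # the soup is unchanged


def step_send(p: Protocol, sigma, soup, n, choice=0) -> Config:
    d2, ms = p.s_snd(sigma[n])[choice]
    return {**sigma, n: d2}, soup + ms            # emitted messages join the soup


def step_receive(p: Protocol, sigma, soup, idx, choice=0) -> Config:
    m = soup[idx]
    assert m["active"] and m["to"] in sigma
    d2 = p.s_rcv(sigma[m["to"]], m)[choice]
    m2 = {**m, "active": False}                   # keep the message as history
    return {**sigma, m["to"]: d2}, soup[:idx] + [m2] + soup[idx + 1:]
```

Each function returns a new configuration and leaves its arguments untouched, so a trace of configurations can be collected by simply appending the results.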

We next define the observable protocol behaviours wrt. the simple network 
semantics as the prefix-closed set of all system’s configuration traces. 


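Definition 1 (Protocol behaviours).

$$\mathcal{B}_p = \bigcup_{m \in \mathbb{N}} \left\{ \langle \langle \sigma_0, M_0 \rangle, \ldots, \langle \sigma_m, M_m \rangle \rangle \;\middle|\; \exists \delta_0^{n \in N} \in \Delta_0,\ \sigma_0 = \biguplus_{n \in N} [n \mapsto \delta_0^n] \wedge \langle \sigma_0, M_0 \rangle \Rightarrow_p \ldots \Rightarrow_p \langle \sigma_m, M_m \rangle \right\}$$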


That is, the set of behaviours captures all possible configurations of initial states
for a fixed set of nodes $N \subseteq \mathit{Nodes}$. In this case, the set of nodes $N$ is an implicit
parameter of the definition, which we fix in the remainder of this section.


Example 1 (Encoding SD-Paxos). An abstract distributed protocol for SD-Paxos
can be extracted from the pseudo-code of Sect. 3 by providing a suitable small-step
operational semantics à la Winskel [30]. We refrain from giving such a
formal semantics, but in Appendix D of the extended version of the paper we
outline how the distributed protocol would be obtained from the given operational
semantics and from the code in Figs. 3, 4 and 5.


5.2 Out-of-Thin-Air Semantics 


We now introduce an intermediate version of a simple protocol-aware semantics
that generates messages “out of thin air” according to a certain predicate
$P \subseteq \Delta \times \mathcal{M}$, which determines whether the network generates a certain message
without exercising the corresponding send-transition. The rule is as follows:


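OTASEND
$$\frac{n \in \mathrm{dom}(\sigma) \quad \delta = \sigma(n) \quad P(\delta, m) \quad M' = M \cup \{m\}}{\langle \sigma, M \rangle \xrightarrow[ota]{p} \langle \sigma, M' \rangle}$$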


That is, a random message $m$ can be sent at any moment in the semantics
described by $\Rightarrow_p \cup \xrightarrow[ota]{p}$, given that the node $n$, “on behalf of which” the
message is sent, is in a state $\delta$ such that $P(\delta, m)$ holds.

Example 2. In the context of Single-Decree Paxos, we can define $P$ as follows:

$$P(\delta, m) \triangleq m.\mathit{content} = [\mathtt{RE}, k] \wedge \delta.\mathit{pid} = n \wedge \delta.\mathit{role} = \mathit{Proposer} \wedge k \leq \delta.\mathit{kP}$$

In other words, if a node $n$ is a Proposer currently operating with a round
$\delta.\mathit{kP}$, the network semantics can always send another request “on its behalf”,
thus generating the message “out-of-thin-air”. Importantly, the last conjunct in
the definition of $P$ is in terms of $\leq$, rather than equality. This means that the
predicate is intentionally loose, allowing for sending even “stale” messages, with
expired rounds that are smaller than what $n$ currently holds (no harm in that!).
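
A small executable reading of this predicate and of the OTASEND rule is sketched below. It rests on assumptions that are not fixed by the text: local states and messages are dictionaries, the node n in the predicate is taken to be the from field of the message, the round comparison is taken to be non-strict so that the current round is also allowed, and the function names are hypothetical.

```python
def P(delta, m):
    """Does the sender state `delta` justify generating message `m`?"""
    content = m["content"]
    return (len(content) == 2 and content[0] == "RE"
            and delta["pid"] == m["from"]
            and delta["role"] == "Proposer"
            and content[1] <= delta["kP"])      # stale rounds are allowed too


def ota_send(sigma, soup, n, m):
    """OTASEND: add m to the soup without changing any local state."""
    if not P(sigma[n], m):
        raise ValueError("message is not justified by P")
    return sigma, soup + [m]
```

For a proposer at round 5, a read request for round 3 is accepted by `P`, while a request for round 6 is rejected because the proposer has not reached that round.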


By definition of the single-decree Paxos protocol, the following lemma holds:

Lemma 1 (OTA refinement). $\mathcal{B}_{\Rightarrow_p \cup \xrightarrow[ota]{p}} \subseteq \mathcal{B}_p$, where $p$ is an instance of
the module Paxos, as defined in Sect. 3 and in Example 1.


5.3 Slot-Replicating Network Semantics 


With the basic definitions at hand, we now proceed to describing alternative
network behaviours that make use of a specific protocol
$p = \langle \Delta, \mathcal{M}, \mathcal{S}_{int}, \mathcal{S}_{rcv}, \mathcal{S}_{snd} \rangle$,
which we will consider to be fixed for the remainder of this section, so we will
at times refer to its components (e.g., $\mathcal{S}_{int}$, $\mathcal{S}_{rcv}$, etc.) without a qualifier.


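SRSTEPINT
$$\frac{i \in I \quad n \in \mathrm{dom}(\sigma) \quad \delta = \sigma(n)[i] \quad \langle \delta, \delta' \rangle \in p.\mathcal{S}_{int} \quad \sigma' = \sigma[n[i] \mapsto \delta']}{\langle \sigma, M \rangle \Rightarrow_{\times} \langle \sigma', M \rangle}$$

SRSTEPSEND
$$\frac{i \in I \quad n \in \mathrm{dom}(\sigma) \quad \delta = \sigma(n)[i] \quad \langle \delta, \delta', ms \rangle \in p.\mathcal{S}_{snd} \quad \sigma' = \sigma[n[i] \mapsto \delta'] \quad M' = M \cup ms[\mathit{slot} \mapsto i]}{\langle \sigma, M \rangle \Rightarrow_{\times} \langle \sigma', M' \rangle}$$

SRSTEPRECEIVE
$$\frac{m \in M \quad m.\mathit{active} \quad m.\mathit{to} \in \mathrm{dom}(\sigma) \quad \delta = \sigma(m.\mathit{to})[m.\mathit{slot}] \quad \langle \delta, m, \delta' \rangle \in p.\mathcal{S}_{rcv} \quad m' = m[\mathit{active} \mapsto \mathit{False}] \quad \sigma' = \sigma(n)[m.\mathit{slot} \mapsto \delta'] \quad M' = M \setminus \{m\} \cup \{m'\}}{\langle \sigma, M \rangle \Rightarrow_{\times} \langle \sigma', M' \rangle}$$

Fig. 14. Transition rules of the slot-replicating network semantics.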


Figure 14 describes a semantics of a slot-replicating (SR) network that
exercises multiple copies of the same protocol instance $p_i$ for $i \in I$, some, possibly
infinite, set of indices, to which we will also refer as slots. Multiple copies
of the protocol are incorporated by enhancing the messages from $p$’s vocabulary
$\mathcal{M}$ with the corresponding indices, and implementing the on-site dispatch of the
indexed messages to corresponding protocol instances at each node. The local
protocol state of each node is, thus, no longer a single element being updated,
but rather an array, mapping $i \in I$ into $\delta_i$, the corresponding local state
component. The small-step relation for SR semantics is denoted by $\Rightarrow_{\times}$. The rule
SRSTEPINT is similar to STEPINT of the simple semantics, with the difference
that it picks not only a node but also an index $i$, thus referring to a specific
component $\sigma(n)[i]$ as $\delta$ and updating it correspondingly ($\sigma(n)[i] \mapsto \delta'$). For the
remaining transitions, we postulate that the messages from $p$’s vocabulary $p.\mathcal{M}$
are enhanced to have a dedicated field slot, which indicates a protocol copy at
a node, to which the message is directed. The receive-rule SRSTEPRECEIVE is
similar to STEPRECEIVE but takes into account the value of $m.\mathit{slot}$ in the
received message $m$, thus redirecting it to the corresponding protocol instance
and updating the local state appropriately. Finally, the rule SRSTEPSEND can
now be executed for any slot $i \in I$, reusing most of the logic of the initial protocol
and otherwise mimicking its simple network semantic counterpart STEPSEND.
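
The slot-replicating rules can be sketched by letting each node map slots to local states. As assumptions, the step relations are given as deterministic Python functions for brevity, and the names `sr_step_send`, `sr_step_receive` and `project` are hypothetical.

```python
def sr_step_send(s_snd, sigma, soup, n, i):
    """SRSTEPSEND: node n takes a send-step in slot i; messages get tagged."""
    d2, ms = s_snd(sigma[n][i])
    tagged = [{**m, "slot": i} for m in ms]
    return {**sigma, n: {**sigma[n], i: d2}}, soup + tagged


def sr_step_receive(s_rcv, sigma, soup, idx):
    """SRSTEPRECEIVE: dispatch soup[idx] to the slot named in the message."""
    m = soup[idx]
    assert m["active"] and m["to"] in sigma
    n, i = m["to"], m["slot"]
    d2 = s_rcv(sigma[n][i], m)
    sigma2 = {**sigma, n: {**sigma[n], i: d2}}
    return sigma2, soup[:idx] + [{**m, "active": False}] + soup[idx + 1:]


def project(sigma, i):
    """Restrict every node's local state to its i-th component."""
    return {n: slots[i] for n, slots in sigma.items()}
```

Running a send and a receive in slot 0 changes only `project(sigma, 0)`; the projection on any other slot stays as it was, which is the independence exploited in the remainder of this section.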

Importantly, in this semantics, for two different slots $i, j$, such that $i \neq j$,
the corresponding “projections” of the state behave independently from each
other. Therefore, transitions and messages in the protocol instances indexed by
$i$ at different nodes do not interfere with those indexed by $j$. This observation
can be stated formally. In order to do so we first define the behaviours of
slot-replicating networks and their projections as follows:


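Definition 2 (Slot-replicating protocol behaviours).

$$\mathcal{B}_{\times} = \bigcup_{m \in \mathbb{N}} \left\{ \langle \langle \sigma_0, M_0 \rangle, \ldots, \langle \sigma_m, M_m \rangle \rangle \;\middle|\; \exists \delta_0^{n \in N} \in \Delta_0,\ \sigma_0 = \biguplus_{n \in N} [n \mapsto \{ i \mapsto \delta_0^n \mid i \in I \}] \wedge \langle \sigma_0, M_0 \rangle \Rightarrow_{\times} \ldots \Rightarrow_{\times} \langle \sigma_m, M_m \rangle \right\}$$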


That is, the slot-replicated behaviours are merely behaviours with respect to
networks, whose nodes hold multiple instances of the same protocol, indexed by
slots $i \in I$. For a slot $i \in I$, we define the projection $\mathcal{B}_{\times}|_i$ as a set of global state
traces, where each node’s local state is restricted only to its $i$th component.
The following simulation lemma holds naturally, connecting the slot-replicating
network semantics and simple network semantics.


Lemma 2 (Slot-replicating simulation). For all $I$ and $i \in I$, $\mathcal{B}_{\times}|_i = \mathcal{B}_p$.


Example 3 (Slot-replicating semantics and Paxos). Given our representation of
Paxos using roles (acceptors/proposers) encoded via the corresponding parts of
the local state $\delta$, we can construct a “naive” version of Multi-Paxos by using the
SR semantics for the protocol. In such a version, every slot will correspond to an SD-Paxos
instance, not interacting with any other slots. From the practical perspective,
such an implementation is rather non-optimal, as it does not exploit
dependencies between rounds accepted at different slots.

5.4 Widening Network Semantics


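We next consider a version of the SR semantics, extended with a new rule for
handling received messages. In the new semantics, dubbed widening, a node,
upon receiving a message $m \in T$, where $T \subseteq p.\mathcal{M}$, for a slot $i$, replicates it for
all slots from the index set $I$, for the very same node. The new rule is as follows:

WSTEPRECEIVET
$$\frac{\begin{array}{c} m \in M \quad m.\mathit{active} \quad m.\mathit{to} \in \mathrm{dom}(\sigma) \quad \delta = \sigma(m.\mathit{to})[m.\mathit{slot}] \\ \langle \delta, m, \delta' \rangle \in p.\mathcal{S}_{rcv} \quad m' = m[\mathit{active} \mapsto \mathit{False}] \quad \sigma' = \sigma(n)[m.\mathit{slot} \mapsto \delta'] \\ ms = \text{if } (m \in T) \text{ then } \{ m' \mid m' = m[\mathit{slot} \mapsto j],\ j \in I \} \text{ else } \emptyset \end{array}}{\langle \sigma, M \rangle \Rightarrow_{\nabla} \langle \sigma', (M \setminus \{m\}) \cup \{m'\} \cup ms \rangle}$$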


At first, this semantics seems rather unreasonable: it might create more messages
than the system can “consume”. However, it is possible to prove that, under
certain conditions on the protocol $p$, the set of behaviours observed under this
semantics (i.e., with SRSTEPRECEIVE replaced by WSTEPRECEIVET) is not
larger than $\mathcal{B}_{\times}$ as given by Definition 2. To state this formally we first relate the
set of “triggering” messages $T$ from WSTEPRECEIVET to a specific predicate $P$.
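
The widening receive-step can be sketched as an ordinary slot-directed receive followed by replication of the message for every slot. As assumptions, the receive relation is a deterministic function, membership in T is given by a Python predicate, the index set is a finite list, and the name `w_step_receive` is hypothetical.

```python
def w_step_receive(s_rcv, in_T, slots, sigma, soup, idx):
    """Receive soup[idx] in its own slot and, if it is in T, replicate it."""
    m = soup[idx]
    assert m["active"] and m["to"] in sigma
    n, i = m["to"], m["slot"]
    d2 = s_rcv(sigma[n][i], m)
    sigma2 = {**sigma, n: {**sigma[n], i: d2}}
    received = {**m, "active": False}
    # One active replica per slot, all addressed to the same node.
    replicas = [{**m, "slot": j} for j in slots] if in_T(m) else []
    return sigma2, soup[:idx] + [received] + soup[idx + 1:] + replicas
```

After one such step on a triggering message the soup holds the inactive original plus one active replica per slot, which is why the semantics may produce more messages than are ever consumed.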


Definition 3 (OTA-compliant message sets). The set of messages
$T \subseteq p.\mathcal{M}$ is OTA-compliant with the predicate $P$ iff for any $b \in \mathcal{B}_p$ and $\langle \sigma, M \rangle \in b$,
if $m \in M$, then $P(\sigma(m.\mathit{from}), m)$.


In other words, the protocol $p$ is relaxed enough to “justify” the presence of $m$ in
the soup at any execution, by providing the predicate $P$, relating the message to
the corresponding sender’s state. Next, we connect this definition to the slot-replicating
and widening semantics via the following definition.


Definition 4 (P-monotone protocols). A protocol $p$ is P-monotone iff for
any $b \in \mathcal{B}_{\times}$, $\langle \sigma, M \rangle \in b$, $m$, $i = m.\mathit{slot}$, and $j \neq i$, if $P(\sigma(m.\mathit{from})[i], \natural m)$ then
we have that $P(\sigma(m.\mathit{from})[j], \natural m)$, where $\natural m$ “removes” the slot field from $m$.

Less formally, Definition 4 ensures that in a slot-replicated product $\times$ of a
protocol $p$, different components cannot perform “out of sync” wrt. $P$. Specifically,
if a node in the $i$th projection is related to a certain message $\natural m$ via $P$, then any
other projection $j$ of the same node will be P-related to this message as well.


Example 4. This is a “non-example”. A version of slot-replicated SD-Paxos,
where we allow for arbitrary increments of the round per-slot at the same
proposer node (i.e., out of sync), would not be monotone wrt. $P$ from Example 2.
In contrast, a slot-replicated product of SD-Paxos instances with fixed rounds is
monotone wrt. the same $P$.


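Lemma 3. If $T$ from WSTEPRECEIVET is OTA-compliant with predicate $P$,
such that $\mathcal{B}_{\Rightarrow_p \cup \xrightarrow[ota]{p}} \subseteq \mathcal{B}_p$, and $p$ is P-monotone, then $\mathcal{B}_{\nabla} \subseteq \mathcal{B}_{\times}$.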

Example 5 (Widening semantics and Paxos). The SD-Paxos instance as
described in Sect. 3 satisfies the refinement condition from Lemma 3. By
taking $T = \{ m \mid m = \{ \mathit{content} = [\mathtt{RE}, k]; \ldots \} \}$ and using Lemma 3, we obtain the
refinement between widened semantics and SR semantics of Paxos.

5.5 Optimised Widening Semantics


Our next step towards a realistic implementation of Multi-Paxos out of
SD-Paxos instances is enabled by an observation that in the widening semantics,
the replicated messages are always targeting the same node, to which the initial
message $m \in T$ was addressed. This means that we can optimise the receive-step,
making it possible to execute multiple receive-transitions of the core protocol in
batch. The following rule OWSTEPRECEIVET captures this intuition formally:


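OWSTEPRECEIVET
$$\frac{m \in M \quad m.\mathit{active} \quad m.\mathit{to} \in \mathrm{dom}(\sigma) \quad \langle \sigma', ms \rangle = \mathit{receiveAndAct}(\sigma, n, m)}{\langle \sigma, M \rangle \Rightarrow \langle \sigma', M \setminus \{m\} \cup \{m[\mathit{active} \mapsto \mathit{False}]\} \cup ms \rangle}$$

where $\mathit{receiveAndAct}(\sigma, n, m) \triangleq \langle \sigma', ms \rangle$, such that $ms = \bigcup_j \{ m[\mathit{slot} \mapsto j] \mid m \in ms_j \}$,
$\forall j \in I,\ \delta_j = \sigma(m.\mathit{to})[j] \wedge \langle \delta_j, \natural m, \delta_j^1 \rangle \in p.\mathcal{S}_{rcv} \wedge \langle \delta_j^1, \delta_j^2 \rangle \in p.\mathcal{S}_{int} \wedge \langle \delta_j^2, \delta_j^3, ms_j \rangle \in p.\mathcal{S}_{snd}$,
$\forall j \in I,\ \sigma'(m.\mathit{to})[j] = \delta_j^3$.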

In essence, the rule OWSTEPRECEIVET blends several steps of the widening 
semantics together for a single message: (a) it first receives the message and 
replicates it for all slots at a destination node; (b) performs receive-steps for 
the message’s replicas at each slot; (c) takes a number of internal steps, allowed 
by the protocol’s Sint; and (d) takes a send-transition, eventually sending all
emitted messages, instrumented with the corresponding slots.
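
The four blended steps can be illustrated with a small executable sketch. It is only an approximation under stated assumptions: the `Msg` record, the `toy_acceptor` step and the reply tags `ackRE`/`nackRE` are hypothetical names introduced here, and the per-slot step fuses the receive, internal and send transitions of the core protocol into one function.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Msg:
    content: tuple          # e.g. ("RE", k): a read request for round k
    to: str
    frm: str
    slot: int
    active: bool = True

# Hypothetical core-protocol step for ONE slot: it fuses the receive, internal
# and send transitions and returns the new local state plus the replies.
SlotStep = Callable[[dict, Msg], Tuple[dict, List[Msg]]]

def receive_and_act(sigma, msg, step):
    """Run `step` for every slot at msg.to and tag each reply with its slot."""
    new_slots, replies = {}, []
    for slot, state in sigma[msg.to].items():
        new_state, sent = step(state, replace(msg, slot=slot))
        new_slots[slot] = new_state
        replies.extend(replace(m, slot=slot) for m in sent)
    return {**sigma, msg.to: new_slots}, replies

def ow_step_receive(sigma, soup, msg, step):
    """One OWStepReceiveT-like transition on the configuration (sigma, soup)."""
    assert msg in soup and msg.active and msg.to in sigma
    new_sigma, replies = receive_and_act(sigma, msg, step)
    rest = [m for m in soup if m != msg]
    return new_sigma, rest + [replace(msg, active=False)] + replies

def toy_acceptor(state, msg):
    """Toy acceptor (an assumption): acknowledge unless a higher round was seen."""
    _, k = msg.content
    if k >= state["rnd"]:
        return {"rnd": k}, [Msg(("ackRE", k), msg.frm, msg.to, msg.slot)]
    return state, [Msg(("nackRE", k), msg.frm, msg.to, msg.slot)]

# Acceptor a1 has two slots; slot 2 has already seen the higher round 5.
sigma = {"a1": {1: {"rnd": 0}, 2: {"rnd": 5}}}
request = Msg(("RE", 3), to="a1", frm="p1", slot=1)
sigma2, soup2 = ow_step_receive(sigma, [request], request, toy_acceptor)
```

A single request addressed to slot 1 thus produces one reply per slot in a single transition: slot 1 acknowledges, slot 2 rejects, and the original request stays in the soup as inactive.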


Example 6. Continuing Example 5, with the same parameters, the optimising 
semantics will execute the transitions of an acceptor, for all slots, triggered by 
receiving a single [RE, k] message for a particular slot, sending back all the 
results for all the slots, which might either agree to accept the value or reject it. 


The following lemma relates the optimising and the widening semantics.

Lemma 4 (Refinement for OW semantics). For any behaviour b of the optimised
widening semantics there exists a behaviour b' of the widening semantics, such
that b can be obtained from b' by replacing sequences of configurations
[⟨σ_k, M_k⟩, ..., ⟨σ_{k+m}, M_{k+m}⟩] that have just a single node n, whose local
state is affected in σ_k, ..., σ_{k+m}, by [⟨σ_k, M_k⟩, ⟨σ_{k+m}, M_{k+m}⟩].


That is, behaviours in the optimised semantics are the same as in the widening 
semantics, modulo some sequences of locally taken steps that are being “com- 
pressed” to just the initial and the final configurations. 


5.6 Bunching Semantics 


As the last step towards Multi-Paxos, we introduce the final network semantics,
which optimises executions of the optimised widening semantics described in the
previous section even further by making a simple addition to the message
vocabulary of a slot-replicated SD-Paxos: bunched messages. A bunched message simply packages
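BSTEPRECVB
\[
\frac{\begin{array}{c}
m \in M \qquad m.\mathit{active} \qquad m.\mathit{to} \in \mathrm{dom}(\sigma) \\
(\sigma', ms) = \mathsf{receiveAndAct}(\sigma, n, m) \\
M' = M \setminus \{m\} \cup \{m[\mathit{active} \mapsto \mathsf{False}]\} \qquad m' = \mathsf{bunch}(ms, m.\mathit{to}, m.\mathit{from})
\end{array}}
{\langle \sigma, M \rangle \Longrightarrow \langle \sigma', M' \cup \{m'\} \rangle}
\]

BSTEPRECVU
\[
\frac{m \in M \qquad m.\mathit{active} \qquad m.\mathit{to} \in \mathrm{dom}(\sigma) \qquad m.\mathit{msgs} = ms \qquad M' = M \setminus \{m\} \cup ms}
{\langle \sigma, M \rangle \Longrightarrow \langle \sigma, M' \rangle}
\]

where $\mathsf{bunch}(ms, n_1, n_2) \triangleq \{\mathit{msgs} = ms;\ \mathit{from} = n_1;\ \mathit{to} = n_2;\ \mathit{active} = \mathsf{True}\}$.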


Fig. 15. Added rules of the Bunching Semantics 


together several messages, obtained typically as a result of a “compressed” exe- 
cution via the optimised semantics from Sect. 5.5. We define two new rules for 
packaging and “unpackaging” certain messages in Fig. 15. The two new rules 
can be added to enhance either of the versions of the slot-replicating semantics 
shown before. In essence, the only effect they have is to combine the messages 
resulting in the execution of the corresponding steps of an optimised widen- 
ing (via BSTEPRECVB), and to unpackage the messages ms from a bunching 
message, adding them back to the soup (BSTEPRECVU). The following natural 
refinement result holds: 


Lemma 5. For any behaviour b of the bunching semantics there exists a behaviour
b' of the optimised widening semantics, such that b' can be obtained
from b by replacing all bunched messages in b by their msgs-component.


The rule BSTEPRECVU enables effective local caching of the bunched messages, 
so they are processed on demand on the recipient side (i.e., by the per-slot 
proposers), allowing the implementation to skip an entire round of Phase 1. 
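
As a rough illustration of the two bunching rules, the sketch below packages a batch of per-slot replies into one bunched message and later unpackages it. The `Msg` and `Bunch` records, the `flatten` helper and the reply tags are assumptions made for this example; only `bunch` mirrors the definition given with the rules.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Msg:
    content: tuple
    to: str
    frm: str
    slot: int
    active: bool = True

@dataclass(frozen=True)
class Bunch:
    msgs: Tuple[Msg, ...]
    frm: str
    to: str
    active: bool = True

def bunch(ms, n1, n2):
    """bunch(ms, n1, n2) = {msgs = ms; from = n1; to = n2; active = True}."""
    return Bunch(msgs=tuple(ms), frm=n1, to=n2)

def step_recv_unbunch(nodes, soup, b):
    """BStepRecvU-like step: replace an active bunch by its msgs-component."""
    assert b in soup and b.active and b.to in nodes
    return [m for m in soup if m != b] + list(b.msgs)

def flatten(soup):
    """Erase bunching from a soup, in the spirit of the statement of Lemma 5."""
    out = []
    for m in soup:
        out.extend(m.msgs if isinstance(m, Bunch) else [m])
    return out

# Replies produced by acceptor a1 for proposer p1, one per slot.
replies = [Msg(("ackRE", 3), "p1", "a1", 1), Msg(("nackRE", 3), "p1", "a1", 2)]
packed = bunch(replies, "a1", "p1")      # sent from a1 back to p1
soup = [packed]
unpacked = step_recv_unbunch({"p1", "a1"}, soup, packed)
```

Keeping the bunch in the soup until the recipient unpackages it is what models the on-demand processing on the recipient side.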


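[Diagram: the bunching semantics refines the optimised widening semantics (via Lemma 5), which refines the widening semantics (via Lemma 4), which in turn refines the slot-replicated semantics (via Lemma 3). The diagram also records a refinement via Lemma 1 and two simulations via Lemma 2.]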


Fig. 16. Refinement between different network semantics. 


1 proposeM(val^ v, val v0) {
2   ⟨ assume(!(v0 = undef));
3     if (*v = undef) { *v := v0; }
4     return *v; ⟩ }

5 val vM[1..∞] := undef;
6 getR(int s) { return &(vM[s]); }
7 proposeM(getR(1), v);
8 proposeM(getR(2), v);


Fig. 17. Specification of Multi-Paxos and interaction via a register provider.
5.7 The Big Picture


What exactly have we achieved by introducing the family of semantics described
above? As illustrated in Fig. 16, all behaviours of the leftmost-topmost,
bunching semantics, which corresponds precisely to an implementation of Multi-
Paxos with an “amortised” Phase 1, can be transitively related to the corresponding
behaviours in the rightmost, vanilla slot-replicated version of a simple
semantics (via the correspondence from Lemma 1) by constructing the corresponding
refinement mappings [1], delivered by the proofs of Lemmas 3-5.

From the perspective of Rely/Guarantee reasoning, which was employed in 
Sect. 4, the refinement result from Fig. 16 justifies the replacement of a semantics 
on the right of the diagram by one to the left of it, as all program-level assertions 
will remain substantiated by the corresponding system configurations, as long 
as they are stable (i.e., resilient wrt. transitions taken by nodes different from 
the one being verified), which they are in our case. 


6 Putting It All Together 


We culminate our story of faithfully deconstructing and abstracting Paxos via
a round-based register, as well as recasting Multi-Paxos via a series of network
transformations, by showing how to implement the register-based abstraction
from Sect. 3 in tandem with the network semantics from Sect. 5 in order to
deliver a provably correct, yet efficient, implementation of Multi-Paxos.

The crux of the composition of the two results—a register-based abstraction 
of SD-Paxos and a family of semantics-preserving network transformations—is 
a convenient interface for the end client, so she could interact with a consensus 
instance via the proposeM method in lines 1-4 of Fig. 17, no matter with which 
particular slot of a Multi-Paxos implementation she is interacting. To do so, 
we propose to introduce a register provider—a service that would give a client a 
“reference” to the consensus object to interact with. Lines 6-7 of Fig. 17 illustrate 
the interaction with the service provider, where the client requests two specific 
slots, 1 and 2, of Multi-Paxos by invoking getR and providing a slot parameter. 
In both cases the client proposes the very same value v in the two instances that 
run the same machinery. (Notice that, except for the reference to the consensus 
object, proposeM is identical to the proposeP on the right of Fig.2, which we 
have verified wrt. linearisability in Sect. 3.) 
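
The client-facing interface can be mimicked in a few lines. This is a minimal sketch of the specification only, not of the Paxos machinery behind it: the class names `Cell` and `RegisterProvider` are hypothetical, and a lock stands in for the atomic block of proposeM.

```python
import threading

UNDEF = None  # stands for the `undef` value of the specification

class Cell:
    """A reference to one consensus object (one slot)."""
    def __init__(self):
        self.value = UNDEF
        self.lock = threading.Lock()

class RegisterProvider:
    """Hands out one cell per slot, creating cells lazily."""
    def __init__(self):
        self._cells = {}
        self._lock = threading.Lock()

    def get_r(self, s):
        with self._lock:
            return self._cells.setdefault(s, Cell())

def propose_m(ref, v0):
    """Atomically decide v0 if nothing is decided yet; return the decision."""
    assert v0 is not UNDEF
    with ref.lock:  # models the atomic block of the specification
        if ref.value is UNDEF:
            ref.value = v0
        return ref.value

provider = RegisterProvider()
first = propose_m(provider.get_r(1), "v")   # slot 1 decides "v"
second = propose_m(provider.get_r(2), "v")  # slot 2 decides independently
late = propose_m(provider.get_r(1), "w")    # slot 1 is already decided
```

A later proposal to an already decided slot returns the earlier decision, which is the behaviour a client would rely on regardless of the slot it talks to.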

The implementation of Multi-Paxos that we have in mind resembles the one 
in Figs.3, 4 and 5 of Sect. 3, but where all the global data is provided by the 
register provider and passed by reference. What differs in this implementation 
with respect to the one in Sect. 3 and is hidden from the client is the semantics of 
the network layer used by the bottom layer (cf. left part of Fig. 2) of the register- 
based implementation. The Multi-Paxos instances run (without changing the 
register’s code) over this network layer, which “overloads” the meaning of the 
send/receive primitives from Figs.3 and 4 to follow the bunching network 
semantics, described in Sect. 5.6. 
Theorem 4. The implementation of Multi-Paxos that uses a register provider
and bunching network semantics refines the specification in Fig. 17. 


We implemented the register/network semantics in a proof-of-concept prototype
written in Scala/Akka.⁵ We relied on the abstraction mechanisms of
Scala, allowing us to implement the register logic, verified in Sect. 4, separately
from the network middleware, which provides the family of semantics from
Sect. 5. Together, they provide a family of provably correct, modularly verified
distributed implementations, coming with a simple shared memory-like interface.


7 Related Work 


Proofs of Linearisability via Rely/Guarantee. Our work builds on the
results of Boichat et al. [3], who were the first to propose a systematic
deconstruction of Paxos into read/write operations of a round-based register abstraction.
We extend and harness those abstractions, by intentionally introducing more
non-determinism into them, which allows us to provide the first modular (i.e.,
mutually independent) proofs of Proposer and Acceptor using Rely/Guarantee
with linearisation points and prophecies. While several logics have been proposed
recently to prove linearisability of concurrent implementations using
Rely/Guarantee reasoning [14,18,19,26], none of them considers message-passing
distributed systems or consensus protocols.


Verification of Paxos-Family Algorithms. Formal verification of different 
versions of Paxos-family protocols wrt. inductive invariants and liveness has been 
a focus of multiple verification efforts in the past fifteen years. To name just a 
few, Lamport has specified and verified Fast Paxos [17] using TLA+ and its 
accompanying model checker [32]. Chand et al. used TLA+ to specify and verify
a Multi-Paxos implementation, similar to the one we considered in this work [5].
A version of SD-Paxos has been verified by Kellomäki using the PVS theorem
prover [13]. Jaskelioff and Merz have verified Disk Paxos in Isabelle/HOL [12].
More recently, Rahli et al. formalised an executable version of Multi-Paxos in
EventML [24], a dialect of NuPRL. Drăgoi et al. [8] implemented and verified
SD-Paxos in the PSYNC framework, which implements a partially synchronised 
model [7], supporting automated proofs of system invariants. Padon et al. have 
proved the system invariants and the consensus property of both simple Paxos 
and Multi-Paxos using the verification tool Ivy [22,23]. 

Unlike all those verification efforts that consider (Multi-/Disk/Fast/...)Paxos
as a single monolithic protocol, our approach provides the first modular
verification of single-decree Paxos using the Rely/Guarantee framework, as well as the first
verification of Multi-Paxos that directly reuses the proof of SD-Paxos.


5 The code is available at https://github.com/certichain/protocol-combinators. 
Compositional Reasoning about Distributed Systems. Several recent
works have partially addressed modular formal verification of distributed systems.
The IronFleet framework by Hawblitzel et al. has been used to verify both
safety and liveness of a real-world implementation of a Paxos-based replicated
state machine library and a lease-based shared key-value store [10]. While the
proof is structured in a modular way by composing specifications in a way similar
to our decomposition in Sects. 3 and 4, that work does not address linearisability
and does not provide composition of proofs about complex protocols (e.g.,
Multi-Paxos) from proofs about their subparts.

The Verdi framework for deductive verification of distributed systems [29,31]
suggests the idea of Verified System Transformers (VSTs), as a way to provide
vertical composition of distributed system implementations. While Verdi’s VSTs
are similar in purpose and idea to our network transformations, they do not
exploit the properties of the protocol, which was crucial for us to verify the
Multi-Paxos implementation.

The DISEL framework [25,28] addresses the problem of horizontal composition
of distributed protocols and their client applications. While we do not compose
Paxos with any clients in this work, we believe its register-based specification
could be directly employed for verifying applications that use Paxos as a
subcomponent, which is what is demonstrated by our prototype implementation.


8 Conclusion and Future Work 


We have proposed and explored two complementary mechanisms for modu- 
lar verification of Paxos-family consensus protocols [15]: (a) non-deterministic 
register-based specifications in the style of Boichat et al. [3], which allow one to 
decompose the proof of protocol’s linearisability into separate independent “lay- 
ers”, and (b) a family of protocol-aware transformations of network semantics, 
making it possible to reuse the verification efforts. We believe that the applica- 
bility of these mechanisms spreads beyond reasoning about Paxos and its vari- 
ants and that they can be used for verifying other consensus protocols, such as 
Raft [21] and PBFT [4]. We are also going to employ network transformations to 
verify implementations of Mencius [20], and accommodate more protocol-specific 
optimisations, such as implementation of master leases and epoch numbering [6]. 


Acknowledgements. We thank the ESOP 2018 reviewers for their feedback. This 
Abstract. Parallel snapshot isolation (PSI) is a standard transactional
consistency model used in databases and distributed systems. We argue 
that PSI is also a useful formal model for software transactional mem- 
ory (STM) as it has certain advantages over other consistency models. 
However, the formal PSI definition is given declaratively by acyclicity 
axioms, which most programmers find hard to understand and reason 
about. 

To address this, we develop a simple lock-based reference implemen- 
tation for PSI built on top of the release-acquire memory model, a well- 
behaved subset of the C/C++11 memory model. We prove that our 
implementation is sound and complete against its higher-level declara- 
tive specification. 

We further consider an extension of PSI allowing transactional and 
non-transactional code to interact, and provide a sound and complete 
reference implementation for the more general setting. Supporting this 
interaction is necessary for adopting a transactional model in program- 
ming languages. 


1 Introduction 


Following the widespread use of transactions in databases, software transactional 
memory (STM) [19,35] has been proposed as a programming language abstrac- 
tion that can radically simplify the task of writing correct and efficient concurrent 
programs. It provides the illusion of blocks of code, called transactions, executing 
atomically and in isolation from any other such concurrent blocks. 

In theory, STM is great for programmers as it allows them to concentrate 
on the high-level algorithmic steps of solving a problem and relieves them of 
such concerns as the low-level details of enforcing mutual exclusion. In practice, 
however, the situation is far from ideal as the semantics of transactions in the 
context of non-transactional code is not at all settled. Recent years have seen 
a plethora of different STM implementations [1-3,6,17,20], each providing a 
slightly different—and often unspecified—semantics to the programmer. 

Simple models in the literature are lock-based, such as global lock atomicity 
(GLA) [28] (where a transaction must acquire a global lock prior to execution and 
release it afterwards) and disjoint lock atomicity (DLA) [28] (where a transaction
must acquire all locks associated with the locations it accesses prior to execution 
and release them afterwards), which provide serialisable transactions. That is, 
all transactions appear to have executed atomically one after another in some 
total order. The problem with these models is largely their implementation cost, 
as they impose too much synchronisation between transactions. 

The database community has long recognised this performance problem and 
has developed weaker transactional models that do not guarantee serialisability. 
The most widely used such model is snapshot isolation (SI) [10], implemented 
by major databases, both centralised (e.g. Oracle and MS SQL Server) and 
distributed [16,30,33], as well as in STM [1,11,25,26]. In this article, we focus 
on a closely related model, parallel snapshot isolation (PSI) [36], which is known 
to provide better scalability and availability in large-scale geo-replicated systems. 
SI and PSI allow conflicting transactions to execute concurrently and to commit 
successfully, so long as they do not have a write-write conflict. This in effect 
allows reads of SI/PSI transactions to read from an earlier memory snapshot 
than the one affected by their writes, and permits outcomes such as the following: 


Initially, x = y = 0
T1: [ x := 1;                  T2: [ y := 1;
      a := y; // reads 0 ]           b := x; // reads 0 ]        (SB+txs)


The above is also known as the write skew anomaly in the database 
literature [14]. Such outcomes are analogous to those allowed by weak memory 
models, such as x86-TSO [29,34] and C11 [9], for non-transactional programs. 
In this article, we consider—to the best of our knowledge for the first time—PSI 
as a possible model for STM, especially in the context of a concurrent language 
such as C/C++ with a weak memory model. In such contexts, programmers are 
already familiar with weak behaviours such as that exhibited by SB+txs above. 

A key reason why PSI is more suitable for a programming language than 
SI (or other stronger models) is performance. This is analogous to why C/C++ 
adopted non-multi-copy-atomicity (allowing two different threads to observe a 
write by a third thread at different times) as part of their concurrency model. 
Consider the following “IRIW” (independent reads of independent writes) litmus 
test: 


Initially, x = y = 0
T1: [ x := 1; ]    T2: [ a := x; // reads 1      T3: [ c := y; // reads 1      T4: [ y := 1; ]    (IRIW+txs)
                         b := y; // reads 0 ]          d := x; // reads 0 ]


In the annotated behaviour, transactions T2 and T3 disagree on the relative 
order of transactions T1 and T4. Under PSI, this behaviour (called the long fork 
anomaly) is allowed, as T1 and T4 are not ordered—they commit in parallel — 
but it is disallowed under SI. This intuitively means that SI must impose ordering 
guarantees even on transactions that do not access a common location, and can 
be rather costly in the context of a weakly consistent system. 


942 A. Raad et al. 


A second reason why PSI is much more suitable than SI is that it has better 
properties. A key intuitive property a programmer might expect of transactions 
is monotonicity. Suppose, in the (SB+txs) program we split the two transactions 
into four smaller ones as follows: 


Initially, x = y = 0
T1: [ x := 1; ]                T2: [ y := 1; ]                (SB+txs+chop)
T3: [ a := y; // reads 0 ]     T4: [ b := x; // reads 0 ]


One might expect that if the annotated behaviour is allowed in (SB+txs), it 
should also be allowed in (SB+txs+chop). This indeed is the case for PSI, but 
not for SI! In fact, in the extreme case where every transaction contains a single 
access, SI provides serialisability. Nevertheless, PSI currently has two significant 
drawbacks, preventing its widespread adoption. We aim to address these here. 

The first PSI drawback is that its formal semantics can be rather daunting 
for the uninitiated as it is defined declaratively in terms of acyclicity constraints. 
What is missing is perhaps a simple lock-based reference implementation of PSI, 
similar to the lock-based implementations of GLA and DLA, that the program- 
mers can readily understand and reason about. As an added benefit, such an 
implementation can be viewed as an operational model, forming the basis for 
developing program logics for reasoning about PSI programs. 

Although Cerone et al. [15] proved their declarative PSI specification equiva- 
lent to an implementation strategy of PSI in a distributed system with replicated 
storage over causal consistency, their implementation is not suitable for reasoning 
about shared-memory programs. In particular, it cannot help the programmers 
determine how transactional and non-transactional accesses may interact. 

As our first contribution, in Sect. 4 we address this PSI drawback by providing 
a simple lock-based reference implementation that we prove equivalent to its 
declarative specification. Typically, one proves that an implementation is sound 
with respect to a declarative specification—i.e. every behaviour observable in the 
implementation is accounted for in the declarative specification. Here, we also 
want the other direction, known as completeness, namely that every behaviour 
allowed by the specification is actually possible in the implementation. Having 
a (simple) complete implementation is very useful for programmers, as it may 
be easier to understand and experiment with than the declarative specification. 

Our reference implementation is built in the release-acquire fragment of the 
C/C++ memory model [8,9,21], using sequence locks [13,18, 23,32] to achieve 
the correct transactional semantics. 

The second PSI drawback is that its study so far has not accounted for the 
subtle effects of non-transactional accesses and how they interact with trans- 
actional accesses. While this scenario does not arise in ‘closed world’ systems 
such as databases, it is crucially important in languages such as C/C++ and 
Java, where one cannot afford the implementation cost of making every access 
transactional so that it is “strongly isolated” from other concurrent transactions. 

Therefore, as our second contribution, in Sect. 5 we extend our basic reference
implementation to make it robust under uninstrumented non-transactional
accesses, and characterise declaratively the semantics we obtain. We call this
extended model RPSI (for “robust PSI”) and show that it gives reason- 
able semantics even under scenarios where transactional and non-transactional 
accesses are mixed. 


Outline. The remainder of this article is organised as follows. In Sect. 2 we
present an overview of our contributions and the necessary background information.
In Sect. 3 we provide the formal model of the C11 release/acquire fragment
and describe how we extend it to specify the behaviour of STM programs. In 
Sect. 4 we present our PSI reference implementation (without non-transactional 
accesses), demonstrating its soundness and completeness against the declarative 
PSI specification. In Sect. 5 we formulate a declarative specification for RPSI as 
an extension of PSI accounting for non-transactional accesses. We then present 
our RPSI reference implementation, demonstrating its soundness and complete- 
ness against our proposed declarative specification. We conclude and discuss 
future work in Sect. 6. 


2 Background and Main Ideas 


One of the main differences between the specification of database transactions 
and those of STM is that STM specifications must additionally account for 
the interactions between mixed-mode (both transactional and non-transactional) 
accesses to the same locations. To characterise such interactions, Blundell 
et al. [12,27] proposed the notions of weak and strong atomicity, often referred to 
as weak and strong isolation. Weak isolation guarantees isolation only amongst 
transactions: the intermediate state of a transaction cannot affect or be affected 
by other transactions, but no such isolation is guaranteed with respect to non-
transactional code (e.g. the accesses of a transaction may be interleaved with those
of non-transactional code). By contrast, strong isolation additionally guarantees
full isolation from non-transactional code. Informally, each non-transactional 
access is considered as a transaction with a single access. In what follows, we 
explore the design choices for implementing STMs under each isolation model 
(Sect. 2.1), provide an intuitive account of the PSI model (Sect. 2.2), and describe 
the key requirements for implementing PSI and how we meet them (Sect. 2.3). 


2.1 Implementing Software Transactional Memory 


Implementing STMs under either strong or weak isolation models comes with a 
number of challenges. Implementing strongly isolated STMs requires a conflict 
detection/avoidance mechanism between transactional and non-transactional 
code. That is, unless non-transactional accesses are instrumented to adhere to 
the same access policies, conflicts involving non-transactional code cannot be 
detected. For instance, in order to guarantee strong isolation under the GLA 
model [28] discussed earlier, non-transactional code must be modified to acquire 
the global lock prior to each shared access and release it afterwards. 
Implementing weakly-isolated STMs requires a careful handling of aborting
transactions as their intermediate state may be observed by non-transactional 
code. Ideally, the STM implementation must ensure that the intermediate state 
of aborting transactions is not leaked to non-transactional code. A transaction 
may abort either because it failed to commit (e.g. due to a conflict), or because 
it encountered an explicit abort instruction in the transactional code. In the 
former case, leaks to non-transactional code can be avoided by pessimistic con- 
currency control (e.g. locks), pre-empting conflicts. In the latter case, leaks can 
be prevented either by lazy version management (where transactional updates 
are stored locally and propagated to memory only upon committing), or by disal- 
lowing explicit abort instructions altogether — an approach taken by the (weakly 
isolated) relaxed transactions of the C++ memory model [6]. 

As mentioned earlier, our aim in this work is to build an STM with PSI 
guarantees in the RA fragment of C11. As such, instrumenting non-transactional 
accesses is not feasible and thus our STM guarantees weak isolation. For sim- 
plicity, throughout our development we make a few simplifying assumptions: (i) 
transactions are not nested; (ii) the transactional code is without explicit abort 
instructions (as with the weakly-isolated transactions of C++ [6]); and (iii) the 
locations accessed by a transaction can be statically determined. For the latter, 
of course, a static over-approximation of the locations accessed suffices for the 
soundness of our implementations. 


2.2 Parallel Snapshot Isolation (PSI) 


The initial model of PSI introduced in [36] is described informally in terms of 
a multi-version concurrent algorithm as follows. A transaction T at a replica r 
proceeds by taking an initial snapshot S of the shared objects in r. The execution 
of T is then carried out locally: read operations query S and write operations 
similarly update S. Once the execution of T is completed, it attempts to commit 
its changes to r and it succeeds only if it is not write-conflicted. Transaction T is 
write-conflicted if another committed transaction T’ has written to a location in 
r also written to by T, since it recorded its snapshot S. If T fails the conflict check 
it aborts and may restart the transaction; otherwise, it commits its changes to 
r, at which point its changes become visible to all other transactions that take a 
snapshot of replica r thereafter. These committed changes are later propagated 
to other replicas asynchronously. 
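
The single-replica part of this algorithm can be sketched as follows. This is a simplified model under stated assumptions: the `Replica` and `Txn` classes and the per-location commit counters are hypothetical devices for detecting write conflicts, and the asynchronous propagation to other replicas is left out entirely.

```python
class Replica:
    def __init__(self, init):
        self.store = dict(init)
        # Number of committed writes per location, used for the conflict check.
        self.version = {loc: 0 for loc in init}

class Txn:
    def __init__(self, replica):
        self.replica = replica
        self.snapshot = dict(replica.store)   # initial snapshot S
        self.seen = dict(replica.version)     # versions when S was recorded
        self.writes = {}

    def read(self, loc):
        return self.snapshot[loc]             # reads query S

    def write(self, loc, val):
        self.snapshot[loc] = val              # writes update S locally
        self.writes[loc] = val

    def commit(self):
        r = self.replica
        # Write-conflicted: a location we wrote was committed to since S.
        if any(r.version[loc] != self.seen[loc] for loc in self.writes):
            return False
        for loc, val in self.writes.items():
            r.store[loc] = val
            r.version[loc] += 1
        return True

# (SB+txs): both transactions commit although each reads a stale 0.
r = Replica({"x": 0, "y": 0})
t1, t2 = Txn(r), Txn(r)
t1.write("x", 1); a = t1.read("y")
t2.write("y", 1); b = t2.read("x")
ok1, ok2 = t1.commit(), t2.commit()

# Two transactions writing the same location: the second one aborts.
t3, t4 = Txn(r), Txn(r)
t3.write("x", 2); t4.write("x", 3)
ok3, ok4 = t3.commit(), t4.commit()
```

The first pair reproduces the write skew outcome of (SB+txs), since the two transactions write disjoint locations; the second pair shows the only kind of conflict that forces an abort.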

The main difference between SI and PSI is in the way the committed changes 
at a replica r are propagated to other sites in the system. Under the SI model, 
committed transactions are globally ordered and the changes at each replica 
are propagated to others in this global order. This ensures that all concurrent 
transactions are observed in the same order by all replicas. By contrast, PSI 
does not enforce a global order on committed transactions: transactional effects 
are propagated between replicas in causal order. This ensures that, if replica r1
commits a message m which is later read at replica r2, and r2 posts a response
m', no replica can see m' without having seen the original message m. However,
causal propagation allows two replicas to observe concurrent events as if occurring
in different orders: if r1 and r2 concurrently commit messages m and m',
then replica r3 may initially see m but not m', and r4 may see m' but not m.
This is best illustrated by the (IRIW+txs) example in Sect. 1.


2.3 Towards a Lock-Based Reference Implementation for PSI 


While the description of PSI above is suitable for understanding PSI, it is not 
very useful for integrating the PSI model in languages such as C, C++ or Java. 
From a programmer’s perspective, in such languages the various threads directly 
access the shared memory; they do not access their own replicas, which are 
loosely related to the replicas of other threads. What we would therefore like 
is an equivalent description of PSI in terms of unreplicated accesses to shared 
memory and a synchronisation mechanism such as locks. 

In effect, we want a definition similar in spirit to global lock atomicity 
(GLA) [28], which is arguably the simplest TM model, and models commit- 
ted transactions as acquiring a global mutual exclusion lock, then accessing and 
updating the data in place, and finally releasing the global lock. Naturally, how- 
ever, the implementation of PSI cannot be that simple. 

A first observation is that PSI cannot be simply implemented over sequentially
consistent (SC) shared memory.¹ To see this, consider the (IRIW+txs) program
from the introduction. Although PSI allows the annotated behaviour, SC
forbids it for the corresponding program without transactions. The point is that
under SC, either the x := 1 or the y := 1 write first reaches memory. Suppose,
without loss of generality, that x := 1 is written to memory before y := 1. Then,
the possible atomic snapshots of memory are x = y = 0, x = 1 ∧ y = 0, and
x = y = 1. In particular, the snapshot read by T3 (x = 0 ∧ y = 1) is impossible.
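
The SC argument can also be checked mechanically by brute force. The sketch below enumerates every interleaving of the IRIW program without transactions and collects the reachable outcomes; the encoding of accesses as tuples and the register names a, b, c, d are choices made for this illustration.

```python
# Each access is ("w", location, value) or ("r", location, register).
THREADS = {
    "T1": [("w", "x", 1)],
    "T2": [("r", "x", "a"), ("r", "y", "b")],
    "T3": [("r", "y", "c"), ("r", "x", "d")],
    "T4": [("w", "y", 1)],
}

def interleavings(threads):
    """Yield every merge of the threads that preserves program order."""
    names = list(threads)

    def go(pos):
        if all(pos[n] == len(threads[n]) for n in names):
            yield []
            return
        for n in names:
            if pos[n] < len(threads[n]):
                nxt = dict(pos)
                nxt[n] += 1
                for rest in go(nxt):
                    yield [threads[n][pos[n]]] + rest

    return go({n: 0 for n in names})

def run(schedule):
    """Execute one interleaving against a single shared memory (SC)."""
    mem, regs = {"x": 0, "y": 0}, {}
    for kind, loc, arg in schedule:
        if kind == "w":
            mem[loc] = arg
        else:
            regs[arg] = mem[loc]
    return regs["a"], regs["b"], regs["c"], regs["d"]

schedules = list(interleavings(THREADS))
outcomes = {run(s) for s in schedules}
weak = (1, 0, 1, 0)   # T2 sees x but not y, T3 sees y but not x
```

The weak outcome `(1, 0, 1, 0)` does not occur in `outcomes`, which matches the informal argument: under SC one of the two writes reaches memory first, so the two readers cannot disagree on their order.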

To implement PSI we therefore resort to a weaker memory model. Among 
weak memory models, the “multi-copy-atomic” ones, such as x86-TSO [29,34], 
SPARC PSO [37,38] and ARMv8-Flat [31], also forbid the weak outcome of
(IRIW+txs) in the same way as SC, and so are unsuitable for our purpose. 
We thus consider release-acquire consistency (RA) [8,9,21], a simple and well- 
behaved non-multi-copy-atomic model. It is readily available as a subset of the 
C/C++11 memory model [9] with verified compilation schemes to all major 
architectures. 

RA provides a crucial property that is relied upon in the earlier description 
of PSI, namely causality. In terms of RA, this means that if thread A observes 
a write w of thread B, then it also observes all the previous writes of thread B 
as well as any other writes B observed before performing w. 

A second observation is that using a single lock to enforce mutual exclusion 
does not work as we need to allow transactions that access disjoint sets of loca- 
tions to complete in parallel. An obvious solution is to use multiple locks—one 


1 Sequential consistency (SC) [24] is the standard model for shared memory concur- 
rency and defines the behaviours of a multi-threaded program as those arising by 
executing sequentially some interleaving of the accesses of its constituent threads. 
per location, as in the disjoint lock atomicity (DLA) model [28]. The question
remaining is how to implement taking a snapshot at the beginning of a
transaction.

A naive attempt is to use reader/writer locks, which allow multiple readers 
(taking the snapshots) to run in parallel, as long as no writer has acquired the 
lock. In more detail, the idea is to acquire reader locks for all locations read 
by a transaction, read the locations and store their values locally, and then 
release the reader locks. However, as we describe shortly, this approach does not 
work. Consider the (IRIW+txs) example in Sect. 1. For T2 to get the annotated 
outcome, it must release its reader lock for y before T4 acquires it. Likewise, 
since T3 observes y = 1, it must acquire its reader lock for y after T4 releases 
it. By this point, however, it is transitively after the release of the y lock by T2, 
and so, because of causality, it must have observed all the writes observed by T2 
by that point—namely, the x := 1 write. In essence, the problem is that reader- 
writer locks over-synchronise. When two threads acquire the same reader lock, 
they synchronise, whereas two read-only transactions should never synchronise 
in PSI. 

To resolve this problem, we use sequence locks [13,18,23,32]. Under the
sequence locking protocol, each location x is associated with a sequence (ver- 
sion) number vx, initialised to zero. Each write to x increments vx before 
and after its update, provided that vx is even upon the first increment. 
Each read from x checks vx before and after reading x. If both values are 
the same and even, then there cannot have been any concurrent increments, 
and the reader must have seen a consistent value. That is, read(x) £ 
do{v:=vx; s:=x} while(is-odd(v) || vx!=v). Under SC, sequence locks are 
equivalent to reader-writer locks; however, under RA, they are weaker exactly 
because readers do not synchronise. 
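The sequence-lock protocol described above can be sketched as follows. This is a minimal, single-threaded illustration in Python, not the paper's implementation: the class name `SeqLockCell` and its methods are hypothetical, and Python offers none of the RA ordering guarantees discussed here, so the sketch only shows the version-number bookkeeping.

```python
class SeqLockCell:
    """A location x paired with its sequence (version) number vx (hypothetical helper)."""

    def __init__(self, value=0):
        self.x = value
        self.vx = 0  # even: no write in progress

    def begin_write(self):
        # A writer may only start from an even version number.
        if self.vx % 2 != 0:
            raise RuntimeError("a write is already in progress")
        self.vx += 1  # odd: write in progress

    def end_write(self):
        self.vx += 1  # even again: write finished

    def write(self, value):
        self.begin_write()
        self.x = value
        self.end_write()

    def try_read(self):
        """One iteration of do{v:=vx; s:=x} while(is-odd(v) || vx!=v).

        Returns (True, value) if the read was consistent, (False, None) otherwise.
        """
        v = self.vx
        s = self.x
        if v % 2 == 1 or self.vx != v:
            return (False, None)
        return (True, s)

    def read(self):
        # Retry until a consistent value is observed.
        while True:
            ok, s = self.try_read()
            if ok:
                return s
```

A reader that observes an odd version, or a version that changed while it was reading, simply retries; it never modifies vx, which is why readers do not synchronise with each other.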


Handling Non-transactional Accesses. Let us consider what happens if 
some of the data accessed by a transaction is modified concurrently by an 
atomic non-transactional write. Since non-transactional accesses do not acquire 
any locks, the snapshots taken can include values written by non-transactional 
accesses. The result of the snapshot then depends on the order in which the 
variables are read. Consider for example the following litmus test: 


    x := 1;  ||  T: [ a := y;  // reads 1
    y := 1;  ||       b := x;  // reads 0 ]

In our implementation, if the transaction’s snapshot reads y before x, then the 
annotated weak behaviour is not possible, because the underlying model (RA) 
disallows the weak “message passing” behaviour. If, however, x is read before 
y by the snapshot, then the weak behaviour is possible. In essence, this means 
that the PSI implementation described so far is of little use, when there are races 
between transactional and non-transactional code. 

Another problem is the lack of monotonicity. A programmer might expect 
that wrapping some code in a transaction block will never yield additional 
behaviours not possible in the program without transactions. Yet, in this
example, removing the T block and unwrapping its code gets rid of the annotated
weak behaviour!

To get monotonicity, it seems that snapshots must read the variables in the 
same order they are accessed by the transactions. How can this be achieved 
for transactions that say read x, then y, and then x again? Or transactions 
that depending on some complex condition, access first x and then y or vice 
versa? The key to solving this conundrum is surprisingly simple: read each vari- 
able twice. In more detail, one takes two snapshots of the locations read by the 
transaction, and checks that both snapshots return the same values for each 
location. This ensures that every location is read both before and after every 
other location in the transaction, and hence all the high-level happens-before 
orderings in executions of the transactional program are also respected by its 
implementation. 

There is however one caveat: since equality of values is used to determine 
whether the two snapshots are the same, we will miss cases where different 
non-transactional writes to a variable write the same value. In our formal devel- 
opment (see Sect.5), we thus assume that if multiple non-transactional writes 
write the same value to the same location, they cannot race with the same trans- 
action. This assumption is necessary for the soundness of our implementation 
and cannot be lifted without instrumenting non-transactional accesses. 


3 The Release-Acquire Memory Model for STM 


We present the notational conventions used in the remainder of this article and 
proceed with the declarative model of the release-acquire (RA) fragment [21] 
of the C11 memory model [9], in which we implement our STM. In Sect. 3.1 
we describe how we extend this formal model to specify the behaviour of STM 
programs. 


Notation. Given a relation $r$ on a set $A$, we write $r^?$, $r^+$ and $r^*$ for
the reflexive, transitive and reflexive-transitive closure of $r$, respectively.
We write $r^{-1}$ for the inverse of $r$; $r|_A$ for $r \cap A^2$; $[A]$ for the
identity relation on $A$, i.e. $\{(a,a) \mid a \in A\}$; irreflexive($r$) for
$\nexists a.\ (a,a) \in r$; and acyclic($r$) for irreflexive($r^+$). Given two
relations $r_1$ and $r_2$, we write $r_1; r_2$ for their (left) relational
composition, i.e. $\{(a,b) \mid \exists c.\ (a,c) \in r_1 \wedge (c,b) \in r_2\}$.
Lastly, when $r$ is a strict partial order, we write $r|_{imm}$ for the immediate
edges in $r$: $\{(a,b) \in r \mid \nexists c.\ (a,c) \in r \wedge (c,b) \in r\}$.

The RA model is given by the fragment of the C11 memory model, where 
all read accesses are acquire (acq) reads, all writes are release (rel) writes, 
and all atomic updates (i.e. RMWs) are acquire-release (acqrel) updates. The 
semantics of a program under RA is defined as a set of consistent executions. 
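
The relational notation introduced above can be made concrete with a small sketch. The Python helpers below are illustrative only (all function names are assumptions, not part of the paper); relations are represented as finite sets of pairs.

```python
def compose(r1, r2):
    """Relational composition r1; r2."""
    return {(a, d) for (a, b) in r1 for (c, d) in r2 if b == c}

def inverse(r):
    """The inverse relation r^{-1}."""
    return {(b, a) for (a, b) in r}

def identity(A):
    """The identity relation [A]."""
    return {(a, a) for a in A}

def transitive_closure(r):
    """The transitive closure r^+, computed by iterating composition."""
    result = set(r)
    while True:
        new = compose(result, result) - result
        if not new:
            return result
        result |= new

def irreflexive(r):
    return all(a != b for (a, b) in r)

def acyclic(r):
    return irreflexive(transitive_closure(r))

def immediate(r):
    """The immediate edges r|imm of a strict partial order r."""
    nodes = {x for pair in r for x in pair}
    return {(a, b) for (a, b) in r
            if not any((a, c) in r and (c, b) in r for c in nodes)}
```

For example, the transitive closure of {(1,2), (2,3)} adds the edge (1,3), and taking the immediate edges of that closure recovers the original two edges.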


Definition 1 (Executions in RA). Assume a finite set of locations Loc; a
finite set of values VAL; and a finite set of thread identifiers TID. Let $x, y, z$
range over locations, $v$ over values and $\tau$ over thread identifiers. An RA
execution graph of an STM implementation, $G$, is a tuple of the form
$(E, \mathit{po}, \mathit{rf}, \mathit{mo})$ with its nodes given by $E$ and its
edges given by the po, rf and mo relations such that:

• $E \subseteq \mathbb{N}$ is a finite set of events, and is accompanied with the
  functions tid(.) : $E \to$ TID and lab(.) : $E \to$ LABEL, returning the thread
  identifier and the label of an event, respectively. We typically use $a$, $b$,
  and $e$ to range over events. The label of an event is a tuple of one of the
  following three forms: (i) R(x,v) for read events; (ii) W(x,v) for write
  events; or (iii) U(x,v,v') for update events. The lab(.) function induces the
  functions typ(.), loc(.), $\mathit{val}_r$(.) and $\mathit{val}_w$(.) that
  respectively project the type (R, W or U), location, and read/written values
  of an event, where applicable. The set of read events is denoted by
  $\mathcal{R} \triangleq \{e \in E \mid \mathrm{typ}(e) \in \{\mathtt{R}, \mathtt{U}\}\}$;
  similarly, the set of write events is denoted by
  $\mathcal{W} \triangleq \{e \in E \mid \mathrm{typ}(e) \in \{\mathtt{W}, \mathtt{U}\}\}$
  and the set of update events is denoted by
  $\mathcal{U} \triangleq \mathcal{R} \cap \mathcal{W}$.

  We further assume that $E$ always contains a set $E_0$ of initialisation events
  consisting of a write event with label W(x,0) for every $x \in$ Loc.

• $\mathit{po} \subseteq E \times E$ denotes the ‘program-order’ relation, defined
  as a disjoint union of strict total orders, each of which orders the events of
  one thread, together with $E_0 \times (E \setminus E_0)$ that places the
  initialisation events before any other event.

• $\mathit{rf} \subseteq \mathcal{W} \times \mathcal{R}$ denotes the ‘reads-from’
  relation, defined as a relation between write and read events of the same
  location and value; it is total and functional on reads, i.e. every read event
  is related to exactly one write event;

• $\mathit{mo} \subseteq \mathcal{W} \times \mathcal{W}$ denotes the
  ‘modification-order’ relation, defined as a disjoint union of strict orders,
  each of which totally orders the write events to one location.


We often use “G.” as a prefix to project the various components of G (e.g. G.E).
Given a relation $r \subseteq E \times E$, we write $r_{loc}$ for
$r \cap \{(a,b) \mid \mathrm{loc}(a) = \mathrm{loc}(b)\}$. Analogously, given a
set $A \subseteq E$, we write $A_x$ for $A \cap \{a \mid \mathrm{loc}(a) = x\}$.
Lastly, given the rf and mo relations, we define the ‘reads-before’ relation
$\mathit{rb} \triangleq \mathit{rf}^{-1}; \mathit{mo} \setminus [E]$.


Executions of a given program represent traces of shared memory accesses
generated by the program. We only consider “partitioned” programs of the
form $\|_{\tau \in \mathrm{TID}}\ c_\tau$, where $\|$ denotes parallel
composition, and each $c_\tau$ is a sequential program. The set of executions
associated with a given program is then defined by induction over the structure
of sequential programs. We do not define this construction formally as it
depends on the syntax of the implementation programming language. Each
execution of a program P has a particular program outcome, prescribing the
final values of local variables in each thread (see example in Fig. 1).

Fig. 1. An RA-consistent execution of a transaction-free variant of (IRIW+txs)
in Sect. 1, with program outcome a = c = 1 and b = d = 0.

In this initial stage, the execution outcomes are unrestricted in that there
are no constraints on the rf and mo relations. These restrictions and thus
the permitted outcomes of a program are determined by the set of consistent
executions:


Definition 2 (RA-consistency). A program execution $G$ is RA-consistent,
written RA-consistent($G$), if
acyclic($\mathit{hb}_{loc} \cup \mathit{mo} \cup \mathit{rb}$) holds, where
$\mathit{hb} \triangleq (\mathit{po} \cup \mathit{rf})^+$ denotes the
‘RA-happens-before’ relation.


Among all executions of a given program P, only the RA-consistent ones define 
the allowed outcomes of P. 
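
As an illustration of Definition 2, the sketch below checks RA-consistency for two candidate executions of the message-passing program (x := 1; y := 1 in one thread, a := y; b := x in the other). It is a hedged Python sketch: the event names, the dictionary encoding and the function `ra_consistent` are assumptions made for this example only.

```python
def compose(r1, r2):
    return {(a, d) for (a, b) in r1 for (c, d) in r2 if b == c}

def transitive_closure(r):
    result = set(r)
    while True:
        new = compose(result, result) - result
        if not new:
            return result
        result |= new

def ra_consistent(loc, po, rf, mo):
    """acyclic(hb_loc U mo U rb) with hb = (po U rf)^+ and rb = rf^-1; mo minus identity."""
    hb = transitive_closure(po | rf)
    hb_loc = {(a, b) for (a, b) in hb if loc[a] == loc[b]}
    rb = {(r, w) for (w0, r) in rf for (w1, w) in mo if w0 == w1 and r != w}
    cycle_free = transitive_closure(hb_loc | mo | rb)
    return all(a != b for (a, b) in cycle_free)

# Events: initial writes ix, iy; thread 1 writes wx then wy; thread 2 reads ry then rx.
loc = {"ix": "x", "iy": "y", "wx": "x", "wy": "y", "ry": "y", "rx": "x"}
init = {"ix", "iy"}
others = {"wx", "wy", "ry", "rx"}
po = {(i, e) for i in init for e in others} | {("wx", "wy"), ("ry", "rx")}
mo = {("ix", "wx"), ("iy", "wy")}

# Weak outcome: ry reads 1 from wy, but rx still reads 0 from the initial write.
weak_rf = {("wy", "ry"), ("ix", "rx")}
# Strong outcome: both reads observe the new values.
strong_rf = {("wy", "ry"), ("wx", "rx")}

weak_allowed = ra_consistent(loc, po, weak_rf, mo)
strong_allowed = ra_consistent(loc, po, strong_rf, mo)
```

In the weak candidate, wx happens-before rx (via po, rf and po) on the same location, while rx reads-before wx, which closes a cycle; the check therefore rejects it, matching the earlier remark that RA disallows the weak message-passing behaviour.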


3.1 Software Transactional Memory in RA: Specification 


Our goal in this section is to develop a declarative framework that allows us to 
specify the behaviour of mixed-mode STM programs under weak isolation guar- 
antees. Whilst the behaviour of transactional code is dictated by the particular 
isolation model considered (e.g. PSI), the behaviour of non-transactional code 
and its interaction with transactions is guided by the underlying memory model. 
As we build our STM in the RA fragment of C11, we assume the behaviour of 
non-transactional code to conform to the RA memory model. More concretely, 
we build our specification of a program P such that (i) in the absence of transac- 
tional code, the behaviour of P is as defined by the RA model; (ii) in the absence 
of non-transactional code, the behaviour of P is as defined by the PSI model. 


Definition 3 (Specification Executions). Assume a finite set of transaction
identifiers TXID. An execution graph of an STM specification, $\Gamma$, is a
tuple of the form $(E, \mathit{po}, \mathit{rf}, \mathit{mo}, \mathcal{T})$ where:

• $E \triangleq \mathcal{R} \cup \mathcal{W} \cup \mathcal{B} \cup \mathcal{E}$
  denotes the set of events with $\mathcal{R}$ and $\mathcal{W}$ defined as the
  sets of read and write events as described above; and $\mathcal{B}$ and
  $\mathcal{E}$ respectively denote the set of events marking the beginning and
  end of transactions. For each event $a \in \mathcal{B} \cup \mathcal{E}$, the
  lab(.) function is extended to return B when $a \in \mathcal{B}$, and E when
  $a \in \mathcal{E}$. The typ(.) function is accordingly extended to return a
  type in {R, W, U, B, E}, whilst the remaining functions are extended to return
  default (dummy) values for events in $\mathcal{B} \cup \mathcal{E}$.

• po, rf and mo denote the ‘program-order’, ‘reads-from’ and
  ‘modification-order’ relations as described above;

• $\mathcal{T} \subseteq E$ denotes the set of transactional events with
  $\mathcal{B} \cup \mathcal{E} \subseteq \mathcal{T}$. For transactional events
  in $\mathcal{T}$, event labels are extended to carry an additional component,
  namely the associated transaction identifier. As such, a specification graph
  is additionally accompanied with the function tx(.) : $\mathcal{T} \to$ TXID,
  returning the transaction identifier of transactional events. The derived
  ‘same-transaction’ relation, $\mathit{st} \subseteq \mathcal{T} \times \mathcal{T}$,
  is the equivalence relation given by
  $\mathit{st} \triangleq \{(a,b) \in \mathcal{T} \times \mathcal{T} \mid \mathrm{tx}(a) = \mathrm{tx}(b)\}$.


We write $\mathcal{T}/\mathit{st}$ for the set of equivalence classes of
$\mathcal{T}$ induced by st; $[a]_{st}$ for the equivalence class that contains
$a$; and $\mathcal{T}_\xi$ for the equivalence class of transaction
$\xi \in$ TXID: $\mathcal{T}_\xi \triangleq \{a \mid \mathrm{tx}(a) = \xi\}$. We
write $\mathcal{NT}$ for non-transactional events:
$\mathcal{NT} \triangleq E \setminus \mathcal{T}$. We often use “$\Gamma$.” as a
prefix to project the $\Gamma$ components.


Specification Consistency. The consistency of specification graphs is
model-specific in that it is dictated by the guarantees provided by the
underlying model. In the upcoming sections, we present two consistency
definitions of PSI in terms of our specification graphs that lack cycles of
certain shapes. In doing so, we often write $r_{\mathrm{T}}$ for lifting a
relation $r \subseteq E \times E$ to transaction classes:
$r_{\mathrm{T}} \triangleq \mathit{st}; (r \setminus \mathit{st}); \mathit{st}$.
Analogously, we write $r_{\mathrm{I}}$ to restrict $r$ to the internal events of
a transaction: $r_{\mathrm{I}} \triangleq r \cap \mathit{st}$.
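
The transactional lift and the internal restriction can be illustrated with a short sketch. The Python code below is an assumption-laden example (the function names `lift` and `internal`, and the event names, are hypothetical); `tx` maps each transactional event to its transaction identifier.

```python
def compose(r1, r2):
    return {(a, d) for (a, b) in r1 for (c, d) in r2 if b == c}

def same_transaction(tx):
    """The st relation: pairs of events with equal transaction identifiers."""
    return {(a, b) for a in tx for b in tx if tx[a] == tx[b]}

def lift(r, tx):
    """r_T = st; (r minus st); st, relating whole transaction classes."""
    st = same_transaction(tx)
    return compose(compose(st, r - st), st)

def internal(r, tx):
    """r_I = r intersected with st, keeping only intra-transaction edges."""
    return r & same_transaction(tx)

# Two transactions T1 = {a1, a2} and T2 = {b1, b2}.
tx = {"a1": "T1", "a2": "T1", "b1": "T2", "b2": "T2"}
r = {("a1", "b1"), ("a1", "a2")}

lifted = lift(r, tx)
inner = internal(r, tx)
```

A single cross-transaction edge (a1, b1) is enough to relate every event of T1 to every event of T2 in the lifted relation, whereas the intra-transaction edge (a1, a2) only survives in the internal restriction.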


Comparison to Dependency Graphs. Adya et al. proposed dependency 
graphs for declarative specification of transactional consistency models [5,7]. 
Dependency graphs are similar to our specification graphs in that they are con- 
structed from a set of nodes and a set of edges (relations) capturing certain 
dependencies. However, unlike our specification graphs, the nodes in dependency 
graphs denote entire transactions and not individual events. In particular, Adya 
et al. propose three types of dependency edges: (i) a read dependency edge,
$T_1 \xrightarrow{\mathrm{WR}} T_2$, denotes that transaction $T_2$ reads a value
written by $T_1$; (ii) a write dependency edge $T_1 \xrightarrow{\mathrm{WW}} T_2$
denotes that $T_2$ overwrites a value written by $T_1$; and (iii) an
anti-dependency edge $T_1 \xrightarrow{\mathrm{RW}} T_2$ denotes that $T_2$
overwrites a value read by $T_1$. Adya’s formalism does not allow for
non-transactional accesses and it thus suffices to define the dependencies of an
execution as edges between transactional classes. In our specification graphs
however, we account for both transactional and non-transactional accesses and
thus define our relational dependencies between individual events of an
execution. However, when we need to relate an entire transaction to another with
relation $r$, we use the transactional lift ($r_{\mathrm{T}}$) defined above. In
particular, Adya’s dependency edges correspond to ours as follows. Informally,
the WR corresponds to our $\mathit{rf}_{\mathrm{T}}$; the WW corresponds to our
$\mathit{mo}_{\mathrm{T}}$; and the RW corresponds to our
$\mathit{rb}_{\mathrm{T}}$. Adya’s dependency graphs have been
used to develop declarative specifications of the PSI consistency model [14]. In 
Sect. 4, we revisit this model, redefine it as specification graphs in our setting, 
and develop a reference lock-based implementation that is sound and complete 
with respect to this abstract specification. The model in [14] does not account for 
non-transactional accesses. To remedy this, later in Sect. 5, we develop a declara- 
tive specification of PSI that allows for both transactional and non-transactional 
accesses. We then develop a reference lock-based implementation that is sound 
and complete with respect to our proposed model. 


4 Parallel Snapshot Isolation (PSI) 


We present a declarative specification of PSI (Sect.4.1), and develop a lock- 
based reference implementation of PSI in the RA fragment (Sect. 4.2). We then 
demonstrate that our implementation is both sound (Sect.4.3) and complete 
(Sect. 4.4) with respect to the PSI specification. Note that the PSI model in this 
section accounts for transactional code only; that is, throughout this section we 
assume that $\Gamma.E = \Gamma.\mathcal{T}$. We lift this assumption later in Sect. 5.
4.1 A Declarative Specification of PSI STMs in RA


In order to formally characterise the weak behaviour and anomalies admitted by 
PSI, Cerone and Gotsman [14,15] formulated a declarative PSI specification. (In 
fact, they provide two equivalent specifications: one using dependency graphs 
proposed by Adya et al. [5,7]; and the other using abstract executions.) As is 
standard, they characterise the set of executions admitted under PSI as graphs 
that lack certain cycles. We present an equivalent declarative formulation of PSI, 
adapted to use our notation as discussed in Sect. 3. It is straightforward to verify 
that our definition coincides with the dependency graph specification in [15]. As 
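with [14,15], throughout this section, we take PSI execution graphs to be those
in which $E = \mathcal{T} \subseteq (\mathcal{R} \cup \mathcal{W}) \setminus \mathcal{U}$.
That is, the PSI model handles transactional code only, consisting solely of
read and write events (excluding updates).

PSI Consistency. A PSI execution graph
$\Gamma = (E, \mathit{po}, \mathit{rf}, \mathit{mo}, \mathcal{T})$ is consistent,
written psi-consistent($\Gamma$), if the following hold:

• $\mathit{rf}_{\mathrm{I}} \cup \mathit{mo}_{\mathrm{I}} \cup \mathit{rb}_{\mathrm{I}} \subseteq \mathit{po}$    (INT)

• irreflexive$\big((\mathit{po}_{\mathrm{T}} \cup \mathit{rf}_{\mathrm{T}} \cup \mathit{mo}_{\mathrm{T}})^+ ; \mathit{rb}_{\mathrm{T}}^?\big)$    (EXT)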

Informally, INT ensures the consistency of each transaction internally, while 
EXT provides the synchronisation guarantees among transactions. In particu- 
lar, we note that the two conditions together ensure that if two read events in 
the same transaction read from the same location x, and no write to x is po- 
between them, then they must read from the same write (known as ‘internal read 
consistency’). 

Next, we provide an alternative formulation of PSI-consistency that is closer 
in form to RA-consistency. This formulation is the basis of our extension in 
Sect.5 with non-transactional accesses. 


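Lemma 1. A PSI execution graph
$\Gamma = (E, \mathit{po}, \mathit{rf}, \mathit{mo}, \mathcal{T})$ is consistent
if and only if
acyclic($\text{psi-hb}_{loc} \cup \mathit{mo} \cup \mathit{rb}$) holds, where
psi-hb denotes the ‘PSI-happens-before’ relation, defined as
$\text{psi-hb} \triangleq (\mathit{po} \cup \mathit{rf} \cup \mathit{rf}_{\mathrm{T}} \cup \mathit{mo}_{\mathrm{T}})^+$.


Proof. The full proof is provided in the technical appendix [4].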


Note that this acyclicity condition is rather close to that of the
RA-consistency definition presented in Sect. 3, with the sole difference being
the definition of the ‘happens-before’ relation, where hb is replaced with
psi-hb. The relation psi-hb is a strict extension of hb with
$\mathit{rf}_{\mathrm{T}} \cup \mathit{mo}_{\mathrm{T}}$, which captures
additional synchronisation guarantees resulting from transaction orderings, as
described shortly. As in RA-consistency, po and rf are included in the
‘PSI-happens-before’ relation psi-hb. Additionally, $\mathit{rf}_{\mathrm{T}}$
and $\mathit{mo}_{\mathrm{T}}$ also contribute to psi-hb.
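
To see the acyclicity condition of Lemma 1 at work, consider the sketch below. It is a hedged Python illustration rather than the paper's formal development: the encoding, the function `psi_consistent` and the event names are assumptions, transaction begin and end events are omitted, and the initial write is placed in its own transaction T0 so that every event is transactional. Two transactions each read x and then write x.

```python
def compose(r1, r2):
    return {(a, d) for (a, b) in r1 for (c, d) in r2 if b == c}

def transitive_closure(r):
    result = set(r)
    while True:
        new = compose(result, result) - result
        if not new:
            return result
        result |= new

def lift(r, tx):
    """r_T = st; (r minus st); st."""
    st = {(a, b) for a in tx for b in tx if tx[a] == tx[b]}
    return compose(compose(st, r - st), st)

def psi_consistent(loc, tx, po, rf, mo):
    """acyclic(psi-hb_loc U mo U rb), psi-hb = (po U rf U rf_T U mo_T)^+."""
    psi_hb = transitive_closure(po | rf | lift(rf, tx) | lift(mo, tx))
    psi_hb_loc = {(a, b) for (a, b) in psi_hb if loc[a] == loc[b]}
    rb = {(r, w) for (w0, r) in rf for (w1, w) in mo if w0 == w1 and r != w}
    return all(a != b for (a, b) in transitive_closure(psi_hb_loc | mo | rb))

# i: initial write to x; T1 = r1; w1 and T2 = r2; w2, all on location x.
loc = {e: "x" for e in ("i", "r1", "w1", "r2", "w2")}
tx = {"i": "T0", "r1": "T1", "w1": "T1", "r2": "T2", "w2": "T2"}
po = {("i", e) for e in ("r1", "w1", "r2", "w2")} | {("r1", "w1"), ("r2", "w2")}
mo = {("i", "w1"), ("i", "w2"), ("w1", "w2")}

# Both transactions read the initial value: a write conflict (lost update).
conflict_ok = psi_consistent(loc, tx, po, {("i", "r1"), ("i", "r2")}, mo)
# T2 instead reads the value written by T1: the transactions are serialised.
serial_ok = psi_consistent(loc, tx, po, {("i", "r1"), ("w1", "r2")}, mo)
```

In the first candidate, the lifted mo edge orders w1 before r2 in psi-hb, while r2 reads-before w1, giving a cycle; this is how the condition rules out conflicting writers that both start from the same snapshot.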

Intuitively, $\mathit{rf}_{\mathrm{T}}$ corresponds to synchronisation due to
causality between transactions. A transaction $T_1$ is causally-ordered before
transaction $T_2$, if $T_1$ writes to x and $T_2$ later (in ‘happens-before’
order) reads x. The inclusion of $\mathit{rf}_{\mathrm{T}}$ ensures that $T_2$
cannot read from $T_1$ without observing its entire effect. This in turn ensures
that transactions exhibit an atomic ‘all-or-nothing’ behaviour. In particular,
transactions cannot mix-and-match the values they read.
    0. for (x ∈ WS) lock vx;
    1. for (x ∈ RS) {
    2.   a := vx;
    3.   if (is-odd(a) && x ∉ WS) continue;
    4.   if (x ∉ WS) v[x] := a;
    5.   s[x] := x; }
    6. for (x ∈ RS)
    7.   if (¬valid(x)) goto line 1;
    8. ⟦T⟧;
    9. for (x ∈ WS) unlock vx;

    lock vx ≜
      retry: v[x] := vx;
             if (is-odd(v[x])) goto retry;
             if (!CAS(vx, v[x], v[x]+1)) goto retry;
    unlock vx ≜ vx := v[x]+2
    valid(x) ≜ vx == v[x]
    valid_RPSI(x) ≜ vx == v[x] && x == s[x]

    ⟦a := x⟧ ≜ a := s[x]
    ⟦x := a⟧ ≜ x := a; s[x] := a
    ⟦S1; S2⟧ ≜ ⟦S1⟧; ⟦S2⟧
    ⟦while(e) S⟧ ≜ while(e) ⟦S⟧
    ... and so on ...


Fig. 2. PSI implementation of transaction T given RS, WS; the RPSI implementation
(Sect. 5) is obtained by replacing valid on line 7 with valid_RPSI.


For instance, if $T_1$ writes to both x and y, transaction $T_2$ may not read the
value of x from $T_1$ but read the value of y from an earlier (in
‘happens-before’ order) transaction $T_0$.

$\mathit{mo}_{\mathrm{T}}$ corresponds to synchronisation due to conflicts
between transactions. Its inclusion enforces the write-conflict-freedom of PSI
transactions. In other words, if two transactions $T_1$ and $T_2$ both write to
the same location x via events $w_1$ and $w_2$ such that
$w_1 \xrightarrow{\mathit{mo}} w_2$, then $T_1$ must commit before $T_2$, and
thus the entire effect of $T_1$ must be visible to $T_2$.


4.2 A Lock-Based PSI Implementation in RA 


We present an operational model of PSI that is both sound and complete 
with respect to the declarative semantics in Sect.4.1. To this end, in Fig. 2 
we develop a pessimistic (lock-based) reference implementation of PSI using 
sequence locks [13,18, 23,32], referred to as version locks in our implementation. 
In order to avoid taking a snapshot of the entire memory and thus decrease the 
locking overhead, we assume that a transaction T is supplied with its read set, 
RS, containing those locations that are read by T. Similarly, we assume T to be 
supplied with its write set, WS, containing the locations updated by T.²

The implementation of T proceeds by exclusively acquiring the version locks
on all locations in its write set (line 0). It then obtains a snapshot of the
locations in its read set by inspecting their version locks, as described
shortly, and subsequently recording their values in a thread-local array s
(lines 1-7). Once a snapshot is recorded, the execution of T proceeds locally
(via ⟦T⟧ on line 8) as follows. Each read operation consults the local snapshot
in s; each write operation updates the memory eagerly (in-place) and
subsequently updates its local snapshot to ensure correct lookup for future
reads. Once the execution of T is concluded, the version locks on the write set
are released (line 9). Observe that as the writer locks are acquired
pessimistically, we do not need to check for write-conflicts in the
implementation.

2 A conservative estimate of RS and WS can be obtained by simple syntactic analysis.

To facilitate our locking implementation, we assume that each location x is 
associated with a version lock at address x+1, written vx. The value held by a 
version lock vx may be in one of two categories: (i) an even number, denoting 
that the lock is free; or (ii) an odd number, denoting that the lock is exclusively 
held by a writer. For a transaction to write to a location x in its write set 
WS, the x version lock (vx) must be acquired exclusively by calling lock vx. 
Each call to lock vx reads the value of vx and stores it in v[x], where v is 
a thread-local array. It then checks if the value read is even (vx is free) and 
if so it atomically increments it by 1 (with a ‘compare-and-swap’ operation), 
thus changing the value of vx to an odd number and acquiring it exclusively; 
otherwise it repeats this process until the version lock is successfully acquired. 
Conversely, each call to unlock vx updates the value of vx to v[x]+2, restoring 
the value of vx to an even number and thus releasing it. Note that deadlocks 
can be avoided by imposing an ordering on locks and ensuring their in-order 
acquisition by all transactions. For simplicity however, we have elided this step 
as we are not concerned with progress or performance issues here and our main 
objective is a reference implementation of PSI in RA. 
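
The lock and unlock operations on version locks described above can be sketched as follows. This Python sketch is single-threaded and purely illustrative: the class `Tx`, the helper `cas` and the dictionary `vlocks` are hypothetical names, and the compare-and-swap is atomic here only because nothing runs concurrently.

```python
def cas(mem, key, expected, new):
    """Compare-and-swap on a dictionary cell (atomic only in this sequential sketch)."""
    if mem[key] == expected:
        mem[key] = new
        return True
    return False

class Tx:
    """A transaction with its thread-local version array v (hypothetical helper)."""

    def __init__(self, vlocks):
        self.vlocks = vlocks  # shared: location name -> version lock value
        self.v = {}           # thread-local: last version read per location

    def try_lock(self, x):
        """One iteration of the retry loop of 'lock vx'."""
        self.v[x] = self.vlocks[x]
        if self.v[x] % 2 == 1:        # odd: held by another writer
            return False
        return cas(self.vlocks, x, self.v[x], self.v[x] + 1)

    def lock(self, x):
        # Spin until the version lock is acquired exclusively.
        while not self.try_lock(x):
            pass

    def unlock(self, x):
        # Restores an even value, two above the value seen before locking.
        self.vlocks[x] = self.v[x] + 2
```

A usage example: with `vlocks = {"x": 0}`, a first transaction calling `lock("x")` sets the version to 1, a second transaction's `try_lock("x")` then fails, and after the first calls `unlock("x")` the version is 2 and the lock is free again.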

Analogously, for a transaction to read from the locations in its read set RS, 
it must record a snapshot of their values (lines 1-7). To obtain a snapshot of 
location x, the transaction must ensure that x is not currently being written to 
by another transaction. It thus proceeds by reading the value of vx and recording 
it in v[x]. If vx is free (the value read is even) or x is in its write set WS, the value
of x can be freely read and tentatively stored in s[x]. In the latter case, the 
transaction has already acquired the exclusive lock on vx and is thus safe in the 
knowledge that no other transaction is currently updating x. Once a tentative 
snapshot of all locations is obtained (lines 1-5), the transaction must validate it 
by ensuring that it reflects the values of the read set at a single point in time 
(lines 6-7). To do this, it revisits the version locks, inspecting whether their 
values have changed (by checking them against v) since it recorded its snapshot. 
If so, then an intermediate update has intervened, potentially invalidating the 
obtained snapshot; the transaction thus restarts the snapshot process. Otherwise, 
the snapshot is successfully validated and returned in s. 
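
The snapshot and validation steps (lines 1-7 of Fig. 2) can be sketched as below. This is a sequential Python illustration under stated assumptions: the function `take_snapshot` and the `between` hook (used to simulate an intervening writer) are hypothetical, and, as an assumption of this sketch only, the version comparison is skipped for locations that are also in WS, since the transaction itself holds those locks.

```python
def take_snapshot(mem, vlocks, RS, WS, v, between=None):
    """Record and validate a snapshot of the locations in RS.

    mem:     shared memory, location -> value
    vlocks:  shared version locks, location -> version number
    v:       thread-local versions (pre-filled by locking for x in WS)
    between: optional hook called after each tentative snapshot, before validation
    Returns (s, attempts), where s maps each location in RS to its snapshot value.
    """
    attempts = 0
    while True:
        attempts += 1
        s = {}
        for x in RS:                          # lines 1-5: tentative snapshot
            a = vlocks[x]
            if a % 2 == 1 and x not in WS:    # locked by another writer: skip,
                continue                      # validation below will then fail
            if x not in WS:
                v[x] = a
            s[x] = mem[x]
        if between is not None:
            between(attempts)
        # lines 6-7: validate that no version changed since it was recorded
        if all(x in WS or (x in s and v.get(x) == vlocks[x]) for x in RS):
            return s, attempts

# Simulate a writer that updates y right after the first tentative snapshot.
mem = {"x": 1, "y": 2}
vlocks = {"x": 0, "y": 0}

def writer(attempt):
    if attempt == 1:
        mem["y"] = 9
        vlocks["y"] += 2   # a complete lock/unlock cycle by another transaction

snapshot, attempts = take_snapshot(mem, vlocks, ["x", "y"], set(), {}, writer)
```

The first attempt records y = 2 at version 0, but the simulated writer bumps the version to 2, so validation fails and the second attempt returns the updated value.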


4.3 Implementation Soundness 


The PSI implementation in Fig. 2 is sound: for each RA-consistent
implementation graph $G$, a corresponding specification graph $\Gamma$ can be
constructed such that psi-consistent($\Gamma$) holds. In what follows we state
our soundness theorem and briefly describe our construction of consistent
specification graphs. We refer the reader to the technical appendix [4] for the
full soundness proof.

Theorem 1 (Soundness). For all RA-consistent implementation graphs $G$ of
the implementation in Fig. 2, there exists a PSI-consistent specification graph
$\Gamma$ of the corresponding transactional program that has the same program
outcome.


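Constructing Consistent Specification Graphs. Observe that given an execution
of our implementation with $t$ transactions, the trace of each transaction
$i \in \{1 \cdots t\}$ is of the form
$\theta_i = Ls_i \xrightarrow{\mathit{po}} FS_i \xrightarrow{\mathit{po}} S_i \xrightarrow{\mathit{po}} Ts_i \xrightarrow{\mathit{po}} Us_i$,
where $Ls_i$, $FS_i$, $S_i$, $Ts_i$ and $Us_i$ respectively denote the sequence
of events acquiring the version locks, attempting but failing to obtain a valid
snapshot, recording a valid snapshot, performing the transactional operations,
and releasing the version locks. For each transactional trace $\theta_i$ of our
implementation, we thus construct a corresponding trace of the specification as
$\theta'_i \triangleq B_i \xrightarrow{\mathit{po}} Ts'_i \xrightarrow{\mathit{po}} E_i$,
where $B_i$ and $E_i$ denote the transaction begin and end events
(lab($B_i$) = B and lab($E_i$) = E). When $Ts_i$ is of the form
$t_1 \xrightarrow{\mathit{po}} \cdots \xrightarrow{\mathit{po}} t_n$, we
construct $Ts'_i$ as
$t'_1 \xrightarrow{\mathit{po}} \cdots \xrightarrow{\mathit{po}} t'_n$ with each
$t'_j$ defined either as $t'_j \triangleq \mathtt{R}(x, v)$ when
$t_j = \mathtt{R}(s[x], v)$ (i.e. the corresponding implementation event is a
read event); or as $t'_j \triangleq \mathtt{W}(x, v)$ when
$t_j = \mathtt{W}(x, v) \xrightarrow{\mathit{po}} \mathtt{W}(s[x], v)$.
For each specification trace $\theta'_i$ we construct the ‘reads-from’ relation as:

$$
RF_i \triangleq \left\{ (w, t'_j) \;\middle|\;
\begin{array}{l}
t'_j \in Ts'_i \wedge \exists x, v.\ t'_j = \mathtt{R}(x, v) \wedge w = \mathtt{W}(x, v) \\
\wedge\ \big(w \in Ts'_i \Rightarrow w \xrightarrow{\mathit{po}} t'_j\ \wedge \\
\qquad (\forall e \in Ts'_i.\ w \xrightarrow{\mathit{po}} e \xrightarrow{\mathit{po}} t'_j \Rightarrow (\mathrm{loc}(e) \neq x \vee e \notin \mathcal{W}))\big) \\
\wedge\ \big(w \notin Ts'_i \Rightarrow (\forall e \in Ts'_i.\ (e \xrightarrow{\mathit{po}} t'_j \Rightarrow (\mathrm{loc}(e) \neq x \vee e \notin \mathcal{W}))) \\
\qquad \wedge\ \exists r' \in S_i.\ \mathrm{loc}(r') = x \wedge (w, r') \in G.\mathit{rf}\big)
\end{array}
\right\}
$$

That is, we construct our graph such that each read event $t'_j$ from location
x in $Ts'_i$ either (i) is preceded by a write event $w$ to x in $Ts'_i$ without
an intermediate write in between them and thus ‘reads-from’ $w$ (lines two and
three); or (ii) is not preceded by a write event in $Ts'_i$ and thus
‘reads-from’ the write event $w$ from which the initial snapshot read $r'$ in
$S_i$ obtained the value of x (last two lines).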

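Given a consistent implementation graph
$G = (E, \mathit{po}, \mathit{rf}, \mathit{mo})$, we construct a consistent
specification graph
$\Gamma = (E, \mathit{po}, \mathit{rf}, \mathit{mo}, \mathcal{T})$ such that:

• $\Gamma.E \triangleq \bigcup_{1 \le i \le t} \theta'_i.E$: the events of
  $\Gamma.E$ are the union of the events in each transaction trace $\theta'_i$
  of the specification constructed as above;
• $\Gamma.\mathit{po} \triangleq G.\mathit{po}|_{\Gamma.E}$: $\Gamma.\mathit{po}$
  is that of $G.\mathit{po}$ limited to the events in $\Gamma.E$;
• $\Gamma.\mathit{rf} \triangleq \bigcup_{1 \le i \le t} RF_i$:
  $\Gamma.\mathit{rf}$ is the union of the $RF_i$ relations defined above;
• $\Gamma.\mathit{mo} \triangleq G.\mathit{mo}|_{\Gamma.E}$: $\Gamma.\mathit{mo}$
  is that of $G.\mathit{mo}$ limited to the events in $\Gamma.E$;
• $\Gamma.\mathcal{T} \triangleq \Gamma.E$, where for each
  $e \in \Gamma.\mathcal{T}$, we define tx($e$) = $i$ when $e \in \theta'_i$.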


4.4 Implementation Completeness 


The PSI implementation in Fig. 2 is complete: for each consistent specification
graph $\Gamma$ a corresponding implementation graph $G$ can be constructed such
that RA-consistent($G$) holds. We next state our completeness theorem and
describe our construction of consistent implementation graphs. We refer the
reader to the technical appendix [4] for the full completeness proof.


Theorem 2 (Completeness). For all PSI-consistent specification graphs $\Gamma$
of a transactional program, there exists an RA-consistent execution graph $G$ of
the implementation in Fig. 2 that has the same program outcome.


Constructing Consistent Implementation Graphs. In order to construct
an execution graph of the implementation $G$ from the specification $\Gamma$, we
follow similar steps as those in the soundness construction, in reverse order.
More concretely, given each trace $\theta'_i$ of the specification, we construct
an analogous trace of the implementation by inserting the appropriate events for
acquiring and inspecting the version locks, as well as obtaining a snapshot. For
each transaction class $\mathcal{T}_i \in \Gamma.\mathcal{T}/\mathit{st}$, we
must first determine its read and write sets and subsequently decide the order
in which the version locks are acquired (for locations in the write set) and
inspected (for locations in the read set). This then enables us to construct the
‘reads-from’ and ‘modification-order’ relations for the events associated with
version locks.

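Given a consistent execution graph of the specification
$\Gamma = (E, \mathit{po}, \mathit{rf}, \mathit{mo}, \mathcal{T})$, and a
transaction class $\mathcal{T}_i \in \Gamma.\mathcal{T}/\mathit{st}$, we write
$WS_{\mathcal{T}_i}$ for the set of locations written to by $\mathcal{T}_i$.
That is,
$WS_{\mathcal{T}_i} \triangleq \bigcup_{e \in \mathcal{T}_i \cap \mathcal{W}} \mathrm{loc}(e)$.
Similarly, we write $RS_{\mathcal{T}_i}$ for the set of locations read from by
$\mathcal{T}_i$, prior to being written to by $\mathcal{T}_i$. For each location
x read from by $\mathcal{T}_i$, we additionally record the first read event in
$\mathcal{T}_i$ that retrieved the value of x. That is,

$$
RS_{\mathcal{T}_i} \triangleq \{(x, r) \mid r \in \mathcal{T}_i \cap \mathcal{R}_x \wedge \neg\exists e \in \mathcal{T}_i \cap E_x.\ e \xrightarrow{\mathit{po}} r\}
$$

Note that transaction $\mathcal{T}_i$ may contain several read events reading
from x, prior to subsequently updating it. However, the internal-read-consistency
property ensures that all such read events read from the same write event. As
such, as part of the read set of $\mathcal{T}_i$ we record the first such read
event (in program-order).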
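
The read and write sets of a transaction class can be computed from its trace as sketched below. The Python code is illustrative only: the trace encoding (a program-ordered list of (kind, location) pairs) and the function `read_write_sets` are assumptions, and read events are identified by their position in the trace.

```python
def read_write_sets(trace):
    """Compute (RS, WS) for one transaction.

    trace: list of (kind, location) pairs in program order, kind in {"R", "W"}.
    RS maps each location whose first access is a read to the index of that read;
    WS is the set of locations written by the transaction.
    """
    ws = {loc for kind, loc in trace if kind == "W"}
    rs = {}
    seen = set()
    for index, (kind, loc) in enumerate(trace):
        # Record a read only if no earlier event of the transaction accessed loc.
        if loc not in seen and kind == "R":
            rs[loc] = index
        seen.add(loc)
    return rs, ws

trace = [("R", "x"), ("W", "x"), ("R", "x"),
         ("W", "y"), ("R", "y"),
         ("R", "z"), ("R", "z")]
rs, ws = read_write_sets(trace)
```

Here x is read before it is written, so its first read (index 0) is recorded; y is written before being read, so it does not appear in the read set; z is only read, and only its first read (index 5) is recorded.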

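Determining the ordering of lock events hinges on the following observation.
Given a consistent execution graph of the specification
$\Gamma = (E, \mathit{po}, \mathit{rf}, \mathit{mo}, \mathcal{T})$, let for each
location x the total order $\mathit{mo}_x$ be given as:
$w_1 \xrightarrow{\mathit{mo}|_{imm}} \cdots \xrightarrow{\mathit{mo}|_{imm}} w_n$.
Observe that this order can be broken into adjacent segments where the events
of each segment belong to the same transaction. That is, given the transaction
classes $\Gamma.\mathcal{T}/\mathit{st}$, the order above is of the following
form where $\mathcal{T}_1, \cdots, \mathcal{T}_m \in \Gamma.\mathcal{T}/\mathit{st}$
and for each such $\mathcal{T}_i$ we have $x \in WS_{\mathcal{T}_i}$ and
$w_{(i,1)} \cdots w_{(i,n_i)} \in \mathcal{T}_i$:

$$
\underbrace{w_{(1,1)} \xrightarrow{\mathit{mo}|_{imm}} \cdots \xrightarrow{\mathit{mo}|_{imm}} w_{(1,n_1)}}_{\mathcal{T}_1}
\xrightarrow{\mathit{mo}|_{imm}} \cdots \xrightarrow{\mathit{mo}|_{imm}}
\underbrace{w_{(m,1)} \xrightarrow{\mathit{mo}|_{imm}} \cdots \xrightarrow{\mathit{mo}|_{imm}} w_{(m,n_m)}}_{\mathcal{T}_m}
$$

Were this not the case and we had
$w_1 \xrightarrow{\mathit{mo}} w \xrightarrow{\mathit{mo}} w_2$ such that
$w_1, w_2 \in \mathcal{T}_i$ and $w \in \mathcal{T}_j \neq \mathcal{T}_i$, we
would consequently have
$w_1 \xrightarrow{\mathit{mo}_{\mathrm{T}}} w \xrightarrow{\mathit{mo}_{\mathrm{T}}} w_1$,
contradicting the assumption that $\Gamma$ is consistent. Given the above order,
let us then define
$\Gamma.\mathit{MO}_x \triangleq [\mathcal{T}_1 \cdots \mathcal{T}_m]$. We write
$\Gamma.\mathit{MO}_x|_i$ for the $i$th item of $\Gamma.\mathit{MO}_x$. As we
describe shortly, we use $\Gamma.\mathit{MO}_x$ to determine the order of lock
events.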
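
The segmentation of the modification order into per-transaction blocks can be sketched as follows. This Python code is a hedged illustration: the function `mo_segments` and its input encoding are hypothetical, and the sketch takes only the transactional writes to one location, already listed in mo order.

```python
def mo_segments(writes_in_mo_order, tx):
    """Return the list of transactions in the order their writes appear in mo.

    writes_in_mo_order: write events to one location, in modification order.
    tx: maps each write event to its transaction identifier.
    Raises ValueError if the writes of one transaction are not adjacent,
    which cannot happen in a consistent specification graph.
    """
    segments = []
    for w in writes_in_mo_order:
        t = tx[w]
        if segments and segments[-1] == t:
            continue  # same segment as the previous write
        if t in segments:
            raise ValueError(f"writes of {t} are interleaved with another transaction")
        segments.append(t)
    return segments

tx = {"w1": "T1", "w2": "T1", "w3": "T2"}
order = mo_segments(["w1", "w2", "w3"], tx)
```

With the writes w1 and w2 of T1 followed by the write w3 of T2, the resulting list is [T1, T2]; an interleaved order such as w1, w3, w2 is rejected, mirroring the contradiction argument above.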

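Note that the execution trace for each transaction
$\mathcal{T}_i \in \Gamma.\mathcal{T}/\mathit{st}$ is of the form
$\theta'_i = B_i \xrightarrow{\mathit{po}} Ts'_i \xrightarrow{\mathit{po}} E_i$,
where $B_i$ is a transaction-begin (B) event, $E_i$ is a transaction-end (E)
event, and
$Ts'_i = t'_1 \xrightarrow{\mathit{po}} \cdots \xrightarrow{\mathit{po}} t'_n$
for some $n$, where each $t'_j$ is either a read or a write event. As such, we
have
$\Gamma.E = \Gamma.\mathcal{T} = \bigcup_{\mathcal{T}_i \in \Gamma.\mathcal{T}/\mathit{st}} \mathcal{T}_i$,
where $\mathcal{T}_i = \theta'_i.E$.

For each trace $\theta'_i$ of the specification, we construct a corresponding
trace of our implementation $\theta_i$ as follows. Let
$RS_{\mathcal{T}_i} = \{(x_1, r_1) \cdots (x_p, r_p)\}$ and
$WS_{\mathcal{T}_i} = \{y_1 \cdots y_q\}$. We then construct
$\theta_i = Ls_i \xrightarrow{\mathit{po}} S_i \xrightarrow{\mathit{po}} Ts_i \xrightarrow{\mathit{po}} Us_i$,
where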


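• $Ls_i = L_i^{y_1} \xrightarrow{\mathit{po}} \cdots \xrightarrow{\mathit{po}} L_i^{y_q}$
  and
  $Us_i = U_i^{y_1} \xrightarrow{\mathit{po}} \cdots \xrightarrow{\mathit{po}} U_i^{y_q}$
  denote the sequence of events acquiring and releasing the version locks,
  respectively. Each $L_i^{y_j}$ and $U_i^{y_j}$ are defined as follows; the
  first event $L_i^{y_1}$ has the same identifier as that of $B_i$, the last
  event $U_i^{y_q}$ has the same identifier as that of $E_i$, and the
  identifiers of the remaining events are picked fresh:

  $$
  L_i^{y_j} = \mathtt{U}(vy_j, 2a, 2a{+}1) \qquad
  U_i^{y_j} = \mathtt{W}(vy_j, 2a{+}2) \qquad
  \text{where } \mathit{MO}_{y_j}|_a = \mathcal{T}_i
  $$

  We then define the mo relation for version locks such that if transaction
  $\mathcal{T}_i$ writes to y immediately after $\mathcal{T}_j$ (i.e.
  $\mathcal{T}_i$ is $\mathit{MO}_y$-ordered immediately after $\mathcal{T}_j$),
  then $\mathcal{T}_i$ acquires the vy version lock immediately after
  $\mathcal{T}_j$ has released it. On the other hand, if $\mathcal{T}_i$ is the
  first transaction to write to y, then it acquires vy immediately after the
  event initialising the value of vy, written $\mathit{init}_{vy}$. Moreover,
  each vy release event of $\mathcal{T}_i$ is mo-ordered immediately after the
  corresponding vy acquisition event in $\mathcal{T}_i$:

  $$
  \mathit{IMO}_i \triangleq \bigcup_{y \in WS_{\mathcal{T}_i}}
  \left\{ (L_i^y, U_i^y),\ (w, L_i^y) \;\middle|\;
  \begin{array}{l}
  (\Gamma.\mathit{MO}_y|_0 = \mathcal{T}_i \Rightarrow w = \mathit{init}_{vy})\ \wedge \\
  (\exists \mathcal{T}_j, a > 0.\ \Gamma.\mathit{MO}_y|_a = \mathcal{T}_i \wedge \Gamma.\mathit{MO}_y|_{a-1} = \mathcal{T}_j \Rightarrow w = U_j^y)
  \end{array}
  \right\}
  $$

  This partial mo order on lock events of $\mathcal{T}_i$ also determines the rf
  relation for its lock acquisition events:
  $\mathit{IRF}_i^1 \triangleq \bigcup_{y \in WS_{\mathcal{T}_i}} \{(w, L_i^y) \mid (w, L_i^y) \in \mathit{IMO}_i\}$.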


[e] o [e] o [e] 
e Si = trž PS... PS trše PS ort PS... PS yr? denotes the sequence of 


events obtaining a tentative snapshot (tr;’) and subsequently validating 
it (ur;’). Each tr% sequence is defined as ir,’ * rj? 2S s* (reading the 
version lock vx;, reading x; and recoding it in s), with ir’, rj’, sy! and 
ur,’ events defined as follows (with fresh identifiers). We then define the 
rf relation for each of these read events in $;. For each (x,r) € RSz,, when 
r (i.e. the read event in the specification class J; that reads the value of 
x) reads from an event w in the specification graph ((w,r) € T.rf), we add 
(w,r*) to the rf relation of G (the first line of IRF? below). For version 
locks, if transaction J; also writes to x;, then ir} and vr,’ events (reading 
and validating the value of version lock vx;), read from the lock event in T; 


that acquired vxj, namely i; On the other hand, if transaction 7; does 
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not write to x; and it reads the value of x; written by 7}, then ir! and 
ur,’ read the value written to vx; by J; when releasing it (Uj). Lastly, if 
T; does not write to x; and it reads the value of x; written by the initial 
write, init,, then ir;’ and vr,’ read the value written to vx; by the initial 
write to vx, initys. 


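$$
\mathit{IRF}_i^2 \triangleq \bigcup_{(x,r) \in RS_{T_i}}
\left\{ (w, r_i^x),\ (w', ir_i^x),\ (w', vr_i^x) \;\middle|\;
\begin{array}{l}
(w, r) \in \Gamma.rf \land (x \in WS_{T_i} \Rightarrow w' = L_i^x) \\
\land\ (x \notin WS_{T_i} \land \exists T_j.\ w \in T_j \Rightarrow w' = U_j^x) \\
\land\ (x \notin WS_{T_i} \land w = \mathit{init}_x \Rightarrow w' = \mathit{init}_{vx})
\end{array}
\right\}
$$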


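$$
r_i^{x_j} = R(x_j, v) \qquad s_i^{x_j} = W(s[x_j], v) \qquad \text{s.t. } \exists w.\ (w, r_i^{x_j}) \in \mathit{IRF}_i^2 \land val_w(w) = v
$$

$$
ir_i^{x_j} = vr_i^{x_j} = R(vx_j, v) \qquad \text{s.t. } \exists w.\ (w, ir_i^{x_j}) \in \mathit{IRF}_i^2 \land val_w(w) = v
$$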
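• $Ts_i = t_1 \xrightarrow{po} \cdots \xrightarrow{po} t_k$ (when $Ts'_i = t'_1 \xrightarrow{po} \cdots \xrightarrow{po} t'_k$), with $t_j$ defined as
follows:

$$
t_j \triangleq
\begin{cases}
R(s[x], v) & \text{when } t'_j = R(x, v) \\
W(x, v) \xrightarrow{po} W(s[x], v) & \text{when } t'_j = W(x, v)
\end{cases}
$$

When $t'_j$ is a read event, $t_j$ has the same identifier as that of $t'_j$. When
$t'_j$ is a write event, the first event in $t_j$ has the same identifier as that of $t'_j$
and the identifier of the second event is picked fresh.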

We are now in a position to construct our implementation graph. Given a
consistent execution graph $\Gamma$ of the specification, we construct an execution
graph $G = (E, po, rf, mo)$ of the implementation as follows.

• $G.E = \bigcup_{T_i \in \Gamma.T/st} \theta_i.E$; note that $G.E$ is an extension of $\Gamma.E$: $\Gamma.E \subseteq G.E$.

• $G.po$ is defined as $\Gamma.po$ extended by the po for the additional events of $G$,
given by the $\theta_i$ traces defined above.

• $G.rf = \bigcup_{T_i \in \Gamma.T/st} (\mathit{IRF}_i^1 \cup \mathit{IRF}_i^2)$

• $G.mo = \Gamma.mo \cup \left( \bigcup_{T_i \in \Gamma.T/st} \mathit{IMO}_i \right)^+$


5 Robust Parallel Snapshot Isolation (RPSI) 


In the previous section we adapted the PSI semantics in [14] to STM settings, 
in the absence of non-transactional code. However, a reasonable STM should 
account for mixed-mode code where shared data is accessed by both transactional 
and non-transactional code. To remedy this, we explore the semantics of PSI 
STMs in the presence of non-transactional code with weak isolation guarantees 
(see Sect.2.1). We refer to the weakly isolated behaviour of such PSI STMs 
as robust parallel snapshot isolation (RPSI), due to its ability to provide PSI 
guarantees between transactions even in the presence of non-transactional code. 

Fig. 3. RPSI-inconsistent executions due to NT-RF (a); and T-RF (b)


In Sect. 5.1 we propose the first declarative specification of RPSI STM pro- 
grams. Later in Sect. 5.2 we develop a lock-based reference implementation of our 
RPSI specification in the RA fragment. We then demonstrate that our imple- 
mentation is both sound (Sect.5.3) and complete (Sect. 5.4) with respect to our 
proposed specification. 


5.1 A Declarative Specification of RPSI STMs in RA 


We formulate a declarative specification of RPSI semantics by adapting the PSI
semantics presented in Sect. 4.1 to account for non-transactional accesses. As
with the PSI specification in Sect. 4.1, throughout this section, we take RPSI
execution graphs to be those in which $T \subseteq (R \cup W) \setminus U$. That is, RPSI transactions
consist solely of read and write events (excluding updates). As before, we
characterise the set of executions admitted by RPSI as graphs that lack cycles
of certain shapes. More concretely, as with the PSI specification, we consider
an RPSI execution graph to be consistent if $acyclic(\textsf{rpsi-hb}_{loc} \cup mo \cup rb)$ holds,
where rpsi-hb denotes the ‘RPSI-happens-before’ relation, extended from that of
PSI (psi-hb).


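Definition 4 (RPSI consistency). An RPSI execution graph $\Gamma = (E, po, rf, mo, T)$
is consistent, written rpsi-consistent($\Gamma$), if $acyclic(\textsf{rpsi-hb}_{loc} \cup mo \cup rb)$
holds, where rpsi-hb denotes the ‘RPSI-happens-before’ relation, defined
as the smallest relation that satisfies the following conditions:

$$
\begin{array}{lr}
\textsf{rpsi-hb}; \textsf{rpsi-hb} \subseteq \textsf{rpsi-hb} & \text{(TRANS)} \\
po \cup rf \cup mo_T \subseteq \textsf{rpsi-hb} & \text{(PSI-HB)} \\
{[E \setminus T]}; rf; st \subseteq \textsf{rpsi-hb} & \text{(NT-RF)} \\
st; ([W]; st; (\textsf{rpsi-hb} \setminus st); st; [R])_{loc}; st \subseteq \textsf{rpsi-hb} & \text{(T-RF)}
\end{array}
$$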
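
The four conditions of Definition 4 can be read as a least-fixpoint computation. The following is a minimal sketch of that reading, not the authors' tooling: the event encoding (an id mapped to a kind, a location and a transaction tag), the helper names, and the concrete two-read example are all assumptions made for illustration, and mo is assumed to be supplied already transitively closed. The lifting $R_T = st; (R \setminus st); st$ and $rb = rf^{-1}; mo$ are the readings used here.

```python
from itertools import product

def compose(r1, r2):
    """Relational composition r1; r2 over sets of pairs."""
    return {(a, d) for (a, b) in r1 for (c, d) in r2 if b == c}

def acyclic(rel):
    """True iff the transitive closure of rel is irreflexive."""
    closure = set(rel)
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            return all(a != b for (a, b) in closure)
        closure |= extra

def rpsi_consistent(events, po, rf, mo):
    """events maps an id to (kind, loc, tx); tx is None outside transactions."""
    kind = lambda e: events[e][0]
    loc = lambda e: events[e][1]
    tx = lambda e: events[e][2]
    # st relates events of the same transaction (reflexive on transactional events).
    st = {(a, b) for a, b in product(events, events)
          if tx(a) is not None and tx(a) == tx(b)}
    same_loc = lambda rel: {(a, b) for (a, b) in rel if loc(a) == loc(b)}
    lift = lambda rel: compose(compose(st, set(rel) - st), st)  # st; (rel \ st); st

    nt_rf = compose({(w, r) for (w, r) in rf if tx(w) is None}, st)  # NT-RF
    hb = set(po) | set(rf) | lift(mo) | nt_rf                         # PSI-HB
    while True:
        inner = {(w, r) for (w, r) in lift(hb)
                 if kind(w) == "W" and kind(r) == "R"}
        t_rf = compose(compose(st, same_loc(inner)), st)              # T-RF
        grown = hb | compose(hb, hb) | t_rf                           # TRANS
        if grown == hb:
            break
        hb = grown
    rb = {(r, w2) for (w1, r) in rf for (w, w2) in mo if w == w1}     # rf^-1; mo
    return acyclic(same_loc(hb) | set(mo) | rb)

def example(r2_source):
    """Hypothetical execution: w1 writes y, then w2 writes x, both outside
    transactions; transaction T1 reads y (from init) and then x."""
    events = {
        "init_x": ("W", "x", None), "init_y": ("W", "y", None),
        "w1": ("W", "y", None), "w2": ("W", "x", None),
        "r1": ("R", "y", "T1"), "r2": ("R", "x", "T1"),
    }
    inits, rest = ["init_x", "init_y"], ["w1", "w2", "r1", "r2"]
    po = {(i, e) for i in inits for e in rest} | {("w1", "w2"), ("r1", "r2")}
    rf = {("init_y", "r1"), (r2_source, "r2")}
    mo = {("init_y", "w1"), ("init_x", "w2")}
    return rpsi_consistent(events, po, rf, mo)
```

In this sketch, `example("w2")` is rejected: NT-RF orders w2 before every event of T1, so w1 happens-before r1 while r1 reads-before w1. By contrast, `example("init_x")` is accepted, since the transaction then observes neither non-transactional write.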


The TRANS and PSI-HB ensure that rpsi-hb is transitive and that it includes
po, rf and $mo_T$ as with its PSI counterpart. The NT-RF ensures that if a value
written by a non-transactional write $w$ is observed (read from) by a read event
$r$ in a transaction $T$, then its effect is observed by all events in $T$. That is, the $w$
happens-before all events in $T$ and not just $r$. This allows us to rule out executions
such as the one depicted in Fig. 3a, which we argue must be disallowed by RPSI.
Consider the execution graph of Fig. 3a, where transaction $T_1$ is denoted by
the dashed box labelled $T_1$, comprising the read events $r_1$ and $r_2$. Note that
as $r_1$ and $r_2$ are transactional reads without prior writes by the transaction,
they constitute a snapshot of the memory at the time $T_1$ started. That is, the
values read by $r_1$ and $r_2$ must reflect a valid snapshot of the memory at the
time it was taken. As such, since we have $(w_2, r_2) \in rf$, any event preceding $w_2$
by the ‘happens-before’ relation must also be observed by (synchronise with)
$T_1$. In particular, as $w_1$ happens-before $w_2$ ($(w_1, w_2) \in po$), the $w_1$ write must
also be observed by $T_1$. The NT-RF thus ensures that a non-transactional write
read from by a transaction (i.e. a snapshot read) synchronises with the entire
transaction.

Recall from Sect. 4.1 that the PSI psi-hb relation includes $rf_T$, which has not
yet been included in rpsi-hb through the first three conditions described. As
we describe shortly, the T-RF is indeed a strengthening of $rf_T$ to account for
the presence of non-transactional events. In particular, note that $rf_T$ is included
in the left-hand side of T-RF: when rpsi-hb in $([W]; st; (\textsf{rpsi-hb} \setminus st); st; [R])$ is
replaced with $rf \subseteq \textsf{rpsi-hb}$, the left-hand side yields $rf_T$. As such, in the absence
of non-transactional events, the definitions of psi-hb and rpsi-hb coincide.

Recall that inclusion of $rf_T$ in psi-hb ensured transactional synchronisation
due to causal ordering: if $T_1$ writes to $x$ and $T_2$ later (in psi-hb order) reads
$x$, then $T_1$ must synchronise with $T_2$. This was achieved in PSI because either
(i) $T_2$ reads $x$ directly from $T_1$, in which case $T_1$ synchronises with $T_2$ via $rf_T$;
or (ii) $T_2$ reads $x$ from another later (mo-ordered) transactional write in $T_3$, in
which case $T_1$ synchronises with $T_3$ via $mo_T$, $T_3$ synchronises with $T_2$ via $rf_T$, and
thus $T_1$ synchronises with $T_2$ via $mo_T; rf_T$. How are we then to extend rpsi-hb to
guarantee transactional synchronisation due to causal ordering in the presence
of non-transactional events?

To justify T-RF, we present an execution graph that does not guarantee
synchronisation between causally ordered transactions and is nonetheless deemed
RPSI-consistent without the T-RF condition on rpsi-hb. We thus argue that this
execution must be precluded by RPSI, justifying the need for T-RF. Consider
the execution in Fig. 3b. Observe that as transaction $T_1$ writes to $x$ via $w_1$,
transaction $T_2$ reads $x$ via $r_2$, and $(w_1, r_2) \in \textsf{rpsi-hb}$ ($w_1 \xrightarrow{rf} r_1 \xrightarrow{po} w_3 \xrightarrow{rf} r_2$),
$T_1$ is causally ordered before $T_2$ and hence $T_1$ must synchronise with $T_2$. As
such, the $r_3$ in $T_2$ must observe $w_2$ in $T_1$: we must have $(w_2, r_3) \in \textsf{rpsi-hb}$,
rendering the above execution RPSI-inconsistent. To enforce the rpsi-hb relation
between such causally ordered transactions with intermediate non-transactional
events, T-RF stipulates that if a transaction $T_1$ writes to a location (e.g. $x$ via $w_1$
above), another transaction $T_2$ reads from the same location ($r_2$), and the two
events are related by ‘RPSI-happens-before’ ($(w_1, r_2) \in \textsf{rpsi-hb}$), then $T_1$ must
synchronise with $T_2$. That is, all events in $T_1$ must ‘RPSI-happen-before’ those
in $T_2$. Effectively, this allows us to transitively close the causal ordering between
transactions, spanning transactional and non-transactional events in between.

Fig. 4. A mixed-mode program with its annotated behaviour disallowed by RPSI (left);
an RA-consistent execution graph of its RPSI implementation (right)


5.2 A Lock-Based RPSI Implementation in RA 


We present a lock-based reference implementation of RPSI in the RA fragment 
(Fig. 2) by using sequence locks [13, 18, 23, 32]. Our implementation is both sound
and complete with respect to our declarative RPSI specification in Sect. 5.1. 

The RPSI implementation in Fig. 2 is rather similar to its PSI counterpart. 
The main difference between the two is in how they validate the tentative snap- 
shot recorded in s. As before, in order to ensure that no intermediate transac- 
tional writes have intervened since s was recorded, for each location x in RS, 
the validation phase revisits vx, inspecting whether its value has changed from 
that recorded in v[x]. If this is the case, the snapshot is deemed invalid and the
process is restarted. However, checking against intermediate transactional writes 
alone is not sufficient as it does not preclude the intervention of non-transactional 
writes. This is because unlike transactional writes, non-transactional writes do 
not update the version locks and as such their updates may go unnoticed. In 
order to rule out the possibility of intermediate non-transactional writes, for 
each location x the implementation checks the value of x against that recorded 
in s[x]. If the values do not agree, an intermediate non-transactional write 
has been detected: the snapshot fails validation and the process is restarted. 
Otherwise, the snapshot is successfully validated and returned in s. Observe 
that checking the value of x against s[x] does not entirely preclude the pres- 
ence of non-transactional writes, in cases where the same value is written (non- 
transactionally) to x twice. 
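
The two validation checks described above can be sketched as follows. This is a single-threaded simulation under stated assumptions, not the implementation of Fig. 2: the dictionaries `mem` and `ver` stand in for shared locations and their version locks, and the function names are hypothetical.

```python
def take_snapshot(read_set, mem, ver):
    """Record version locks in v and location values in s (tentative snapshot)."""
    v = {x: ver[x] for x in read_set}
    s = {x: mem[x] for x in read_set}
    return s, v

def validate(read_set, mem, ver, s, v):
    """Return True iff the tentative snapshot passes both checks."""
    for x in read_set:
        if ver[x] != v[x]:   # an intermediate transactional write changed the version lock
            return False
        if mem[x] != s[x]:   # an intermediate non-transactional write changed the value
            return False
    return True

mem = {"x": 0, "z": 0}
ver = {"x": 0, "z": 0}
rs = ["x", "z"]

s, v = take_snapshot(rs, mem, ver)
mem["z"] = 1                      # non-transactional write: version lock untouched
changed_value_detected = not validate(rs, mem, ver, s, v)

s, v = take_snapshot(rs, mem, ver)
mem["z"] = 1                      # the same value written a second time
same_value_unnoticed = validate(rs, mem, ver, s, v)
```

In this simulation the first non-transactional write is detected because the value of z differs from s[z], whereas the second goes unnoticed because it writes the value already recorded. The value check alone therefore cannot exclude repeated non-transactional writes of the same value.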

To understand this, consider the mixed-mode program on the left of Fig. 4 
comprising a transaction in the left-hand thread and a non-transactional pro- 
gram in the right-hand thread writing the same value (1) to z twice. Note that 
the annotated behaviour is disallowed under RPSI: all execution graphs of the 
program with the annotated behaviour yield RPSI-inconsistent execution graphs. 
Intuitively, this is because the values read by the transaction (x: 0, y: 0, z: 1)
do not constitute a valid snapshot: at no point during the execution of this
program are the values of x, y and z as annotated.

Nevertheless, it is possible to find an RA-consistent execution of the RPSI 
implementation in Fig. 2 that reads the annotated values as its snapshot. Con- 
sider the execution graph on the right-hand side of Fig. 4, depicting a particular 
execution of the RPSI implementation (Fig. 2) of the program on the left. The 
rx, ry and rz denote the events reading the initial snapshot of x, y and z and 
recording them in s (line 5), respectively. Similarly, the rx’, ry’ and rz’ denote
the events validating the snapshots recorded in s (line 7). As T is the only trans- 
action in the program, the version numbers vx, vy and vz remain unchanged 
throughout the execution and we have thus omitted the events reading (line 2) 
and validating (line 7) their values from the execution graph. Note that this 
execution graph is RA-consistent even though we cannot find a corresponding 
RPSI-consistent execution with the same outcome. To ensure the soundness of
our implementation, we must thus rule out such scenarios. 

To do this, we assume that if multiple non-transactional writes write the 
same value to the same location, they cannot race with the same transaction. 
More concretely, we assume that every RPSI-consistent execution graph of a 
given program satisfies the following condition: 


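$$
\begin{array}{l}
\forall x.\ \forall r \in T \cap R_x.\ \forall w, w' \in NT \cap W_x. \\
\quad w \neq w' \land val_w(w) = val_w(w') \land (r, w) \notin \textsf{rpsi-hb} \land (r, w') \notin \textsf{rpsi-hb} \\
\quad \Rightarrow (w, r) \in \textsf{rpsi-hb} \land (w', r) \in \textsf{rpsi-hb}
\end{array}
\qquad (*)
$$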


That is, given a transactional read r from location x, and any two distinct 
non-transactional writes w, w’ of the same value to x, either (i) at least one of 
the writes RPSI-happens-after r; or (ii) they both RPSI-happen-before r.
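
As a hedged illustration, the condition can be checked for a single location once rpsi-hb is available as a set of pairs. The function name and the input encoding below are assumptions made for this sketch.

```python
def star_holds(hb, reads, nt_writes):
    """Check the condition for one location x.

    hb        -- rpsi-hb as a set of (before, after) pairs
    reads     -- transactional reads of x
    nt_writes -- dict mapping each non-transactional write of x to its value
    """
    for r in reads:
        for w in nt_writes:
            for w2 in nt_writes:
                if w == w2 or nt_writes[w] != nt_writes[w2]:
                    continue  # only distinct writes of the same value matter
                neither_after = (r, w) not in hb and (r, w2) not in hb
                both_before = (w, r) in hb and (w2, r) in hb
                if neither_after and not both_before:
                    return False
    return True
```

For instance, with two writes of the value 1 that are unordered with respect to the read, `star_holds(set(), ["r"], {"w": 1, "w2": 1})` returns `False`; ordering both writes before the read, or at least one of them after it, makes the check succeed.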

Observe that this does not hold of the program in Fig. 4. Note that this
stipulation does not prevent two transactions from writing the same value to a location
x. As such, in the absence of non-transactional writes, our RPSI implementation
is equivalent to that of PSI in Sect. 4.2.


5.3 Implementation Soundness 


The RPSI implementation in Fig. 2 is sound: for each consistent implementation 
graph G, a corresponding specification graph $\Gamma$ can be constructed such that
rpsi-consistent($\Gamma$) holds. In what follows we state our soundness theorem and
briefly describe our construction of consistent specification graphs. We refer the 
reader to the technical appendix [4] for the full soundness proof. 


Theorem 3 (Soundness). Let P be a program that possibly mixes transactional
and non-transactional code. If every RPSI-consistent execution graph of P
satisfies the condition in (*), then for all RA-consistent implementation graphs
G of the implementation in Fig. 2, there exists an RPSI-consistent specification
graph $\Gamma$ of the corresponding transactional program with the same program
outcome.

Constructing Consistent Specification Graphs. Constructing an RPSI-consistent
specification graph from the implementation graph is similar to the
corresponding PSI construction described in Sect. 4.3. More concretely, the
events associated with non-transactional events remain unchanged and are simply
added to the specification graph. On the other hand, the events associated
with transactional events are adapted in a similar way to those of PSI in Sect. 4.3.
In particular, observe that given an execution of the RPSI implementation with
t transactions, as with the PSI implementation, the trace of each transaction
$i \in \{1 \cdots t\}$ is of the form $\theta_i = Ls_i \xrightarrow{po} FS_i \xrightarrow{po} S_i \xrightarrow{po} Ts_i \xrightarrow{po} Us_i$, with $Ls_i$, $FS_i$,
$S_i$, $Ts_i$ and $Us_i$ denoting analogous sequences of events to those of PSI. The
difference between an RPSI trace $\theta_i$ and a PSI one is in the $FS_i$ and $S_i$ sequences,
obtaining the snapshot. In particular, the validation phases of $FS_i$ and $S_i$ in
RPSI include an additional read for each location to rule out intermediate
non-transactional writes. As in the PSI construction, for each transactional trace $\theta_i$
of our implementation, we construct a corresponding trace of the specification
as $\theta'_i = B_i \xrightarrow{po} Ts'_i \xrightarrow{po} E_i$, with $B_i$, $E_i$ and $Ts'_i$ as defined in Sect. 4.3.

Given a consistent RPSI implementation graph $G = (E, po, rf, mo)$, let
$G.NT \triangleq G.E \setminus \bigcup_{i \in \{1 \cdots t\}} \theta_i.E$ denote the non-transactional events of G. We
construct a consistent RPSI specification graph $\Gamma = (E, po, rf, mo, T)$ such that:

• $\Gamma.E \triangleq G.NT \cup \bigcup_{i \in \{1 \cdots t\}} \theta'_i.E$: the $\Gamma.E$ events comprise the
non-transactional events in G and the events in each transactional trace $\theta'_i$
of the specification;

• $\Gamma.po \triangleq G.po|_{\Gamma.E}$: the $\Gamma.po$ is that of $G.po$ restricted to the events in $\Gamma.E$;

• $\Gamma.rf \triangleq \bigcup_{i \in \{1 \cdots t\}} RF_i \cup G.rf; [G.NT]$: the $\Gamma.rf$ is the union of $RF_i$
relations for transactional reads as defined in Sect. 4.3, together with the $G.rf$
relation for non-transactional reads;

• $\Gamma.mo \triangleq G.mo|_{\Gamma.E}$: the $\Gamma.mo$ is that of $G.mo$ restricted to the events in
$\Gamma.E$;

• $\Gamma.T \triangleq \bigcup_{i \in \{1 \cdots t\}} \theta'_i.E$, where for each $e \in \theta'_i.E$, we define $tx(e) = i$.


We refer the reader to the technical appendix [4] for the full proof demonstrating 
that the above construction of $\Gamma$ yields a consistent specification graph.


5.4 Implementation Completeness 


The RPSI implementation in Fig. 2 is complete: for each consistent specification 
graph $\Gamma$ a corresponding implementation graph G can be constructed such that
RA-consistent(G) holds. We next state our completeness theorem and describe 
our construction of consistent implementation graphs. We refer the reader to the 
technical appendix [4] for the full completeness proof. 


Theorem 4 (Completeness). For all RPSI-consistent specification graphs $\Gamma$
of a program, there exists an RA-consistent execution graph G of the implemen- 
tation in Fig. 2 that has the same program outcome. 

Constructing Consistent Implementation Graphs. In order to construct
an execution graph of the implementation G from the specification $\Gamma$, we follow
similar steps as those in the corresponding PSI construction in Sect. 4.4. More
concretely, the events associated with non-transactional events are unchanged
and simply added to the implementation graph. For transactional events, given
each trace $\theta'_i$ of a transaction in the specification, as before we construct an
analogous trace of the implementation by inserting the appropriate events for
acquiring and inspecting the version locks, as well as obtaining a snapshot. For each
transaction class $T_i \in \Gamma.T/st$, we first determine its read and write sets as before
and subsequently decide the order in which the version locks are acquired and
inspected. This then enables us to construct the ‘reads-from’ and ‘modification-order’
relations for the events associated with version locks.

Given a consistent execution graph of the specification $\Gamma = (E, po, rf, mo, T)$,
and a transaction class $T_i \in \Gamma.T/st$, we define $WS_{T_i}$ and $RS_{T_i}$ as described in
Sect. 4.4. Determining the ordering of lock events hinges on a similar observation
as that in the PSI construction. Given a consistent execution graph of the
specification $\Gamma = (E, po, rf, mo, T)$, let for each location $x$ the total order $mo_x$
be given as: $w_1 \xrightarrow{mo_x|_{imm}} \cdots \xrightarrow{mo_x|_{imm}} w_n$. This order can be broken into adjacent
segments where the events of each segment are either non-transactional writes
or belong to the same transaction. That is, given the transaction classes $\Gamma.T/st$,
the order above is of the following form where $T_1, \cdots, T_m \in \Gamma.T/st$ and for each
such $T_i$ we have $x \in WS_{T_i}$ and $w_{(i,1)}, \cdots, w_{(i,n_i)} \in \Gamma.NT \cup T_i$:

$$
\underbrace{w_{(1,1)} \xrightarrow{mo_x|_{imm}} \cdots \xrightarrow{mo_x|_{imm}} w_{(1,n_1)}}_{\Gamma.NT \,\cup\, T_1}
\xrightarrow{mo_x|_{imm}} \cdots \xrightarrow{mo_x|_{imm}}
\underbrace{w_{(m,1)} \xrightarrow{mo_x|_{imm}} \cdots \xrightarrow{mo_x|_{imm}} w_{(m,n_m)}}_{\Gamma.NT \,\cup\, T_m}
$$

Were this not the case and we had $w_1 \xrightarrow{mo} w \xrightarrow{mo} w_2$ such that $w_1, w_2 \in T_i$ and
$w \in T_j \neq T_i$, we would consequently have $w_1 \xrightarrow{mo_T} w \xrightarrow{mo_T} w_1$, contradicting the
assumption that $\Gamma$ is consistent. We thus define $\Gamma.MO_x = [T_1 \cdots T_m]$.
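
The derivation of the transaction order for a location from its mo chain can be sketched as below. This is an illustrative reading under assumptions: the chain is given as a list of write identifiers in mo order, `tx_of` maps transactional writes to a transaction identifier, and the function name is hypothetical.

```python
def transaction_order(mo_chain, tx_of):
    """Collapse the mo chain of writes to x into the list of writing transactions.

    mo_chain -- writes to x, listed in mo order
    tx_of    -- dict mapping a transactional write to its transaction id;
                non-transactional writes (including the initial write) are absent
    """
    order = []
    for w in mo_chain:
        t = tx_of.get(w)
        if t is None:
            continue                  # non-transactional writes do not start a segment
        if order and order[-1] == t:
            continue                  # still inside the segment of transaction t
        if t in order:
            # Writes of t separated by another transaction's write: the text
            # argues that this cannot happen in a consistent graph.
            raise ValueError(f"writes of {t} are split by another transaction")
        order.append(t)
    return order
```

For example, with the chain `["init", "a1", "n1", "a2", "b1"]` and `tx_of = {"a1": "T1", "a2": "T1", "b1": "T2"}`, the sketch yields `["T1", "T2"]`, whereas the chain `["a1", "b1", "a2"]` is rejected.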

Note that each transactional execution trace of the specification is of the
form $\theta'_i = B_i \xrightarrow{po} Ts'_i \xrightarrow{po} E_i$, with $B_i$, $E_i$ and $Ts'_i$ as described in Sect. 4.4.
For each such $\theta'_i$, we construct a corresponding trace of our implementation
as $\theta_i = Ls_i \xrightarrow{po} S_i \xrightarrow{po} Ts_i \xrightarrow{po} Us_i$, where $Ls_i$, $Ts_i$ and $Us_i$ are as defined in
Sect. 4.4, and $S_i = tr_i^{x_1} \xrightarrow{po} \cdots \xrightarrow{po} tr_i^{x_p} \xrightarrow{po} vr_i^{x_1} \xrightarrow{po} \cdots \xrightarrow{po} vr_i^{x_p}$ denotes the sequence
of events obtaining a tentative snapshot ($tr_i^{x_j}$) and subsequently validating it
($vr_i^{x_j}$). Each $tr_i^{x_j}$ sequence is of the form $ivr_i^{x_j} \xrightarrow{po} ir_i^{x_j} \xrightarrow{po} s_i^{x_j}$, with $ivr_i^{x_j}$, $ir_i^{x_j}$ and
$s_i^{x_j}$ defined below (with fresh identifiers). Similarly, each $vr_i^{x_j}$ sequence is of the
form $fr_i^{x_j} \xrightarrow{po} fvr_i^{x_j}$, with $fr_i^{x_j}$ and $fvr_i^{x_j}$ defined as follows (with fresh identifiers).
We then define the rf relation for each of these read events in $S_i$ in a similar way.

For each $(x, r) \in RS_{T_i}$, when $r$ (the event in the specification class $T_i$ that
reads the value of $x$) reads from $w$ in the specification graph ($(w, r) \in \Gamma.rf$), we
add $(w, ir_i^x)$ and $(w, fr_i^x)$ to the rf of $G$ (the first line of $\mathit{IRF}_i^2$ below). For version
locks, as before, if transaction $T_i$ also writes to $x_j$, then the $ivr_i^{x_j}$ and $fvr_i^{x_j}$ events
(reading and validating $vx_j$) read from the lock event in $T_i$ that acquired $vx_j$,
namely $L_i^{x_j}$. Similarly, if $T_i$ does not write to $x_j$ and it reads the value of $x_j$
written by the initial write, $\mathit{init}_x$, then $ivr_i^{x_j}$ and $fvr_i^{x_j}$ read the value written to
$vx_j$ by the initial write to $vx$, $\mathit{init}_{vx}$. Lastly, if transaction $T_i$ does not write to
$x_j$ and it reads $x_j$ from a write other than $\mathit{init}_x$, then $ivr_i^{x_j}$ and $fvr_i^{x_j}$ read from
the unlock event of a transaction $T_j$ (i.e. $U_j^x$), which has $x$ in its write set and
whose write to $x$, $w_x$, maximally ‘RPSI-happens-before’ $r$. That is, $w_x$
‘RPSI-happens-after’ all other such writes that ‘RPSI-happen-before’ $r$.


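$$
\mathit{IRF}_i^2 \triangleq \bigcup_{(x,r) \in RS_{T_i}}
\left\{
\begin{array}{l}
(w, ir_i^x), \\ (w, fr_i^x), \\ (w', ivr_i^x), \\ (w', fvr_i^x)
\end{array}
\;\middle|\;
\begin{array}{l}
(w, r) \in \Gamma.rf \land (x \in WS_{T_i} \Rightarrow w' = L_i^x) \\
\land\ (x \notin WS_{T_i} \land w = \mathit{init}_x \Rightarrow w' = \mathit{init}_{vx}) \\
\land\ (x \notin WS_{T_i} \land w \neq \mathit{init}_x \Rightarrow \\
\qquad \exists w_x, T_j.\ w_x \in T_j \cap W_x \land w_x \xrightarrow{\textsf{rpsi-hb}} r \land w' = U_j^x \\
\qquad \land\ (\forall w'_x, T_k.\ w'_x \in T_k \cap W_x \land w'_x \xrightarrow{\textsf{rpsi-hb}} r \Rightarrow w'_x \xrightarrow{\textsf{rpsi-hb}} w_x))
\end{array}
\right\}
$$

$$
ir_i^{x_j} = fr_i^{x_j} = R(x_j, v) \qquad s_i^{x_j} = W(s[x_j], v) \qquad \text{s.t. } \exists w.\ (w, ir_i^{x_j}) \in \mathit{IRF}_i^2 \land val_w(w) = v
$$

$$
ivr_i^{x_j} = fvr_i^{x_j} = R(vx_j, v) \qquad \text{s.t. } \exists w.\ (w, ivr_i^{x_j}) \in \mathit{IRF}_i^2 \land val_w(w) = v
$$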


We are now in a position to construct our implementation graph. Given a
consistent execution graph $\Gamma$ of the specification, we construct an execution
graph of the implementation, $G = (E, po, rf, mo)$, such that:

• $G.E = \bigcup_{T_i \in \Gamma.T/st} \theta_i.E \cup \Gamma.NT$;

• $G.po$ is defined as $\Gamma.po$ extended by the po for the additional events of $G$,
given by the $\theta_i$ traces defined above;

• $G.rf = \bigcup_{T_i \in \Gamma.T/st} (\mathit{IRF}_i^1 \cup \mathit{IRF}_i^2)$, with $\mathit{IRF}_i^1$ as in Sect. 4.4 and $\mathit{IRF}_i^2$ defined
above;

• $G.mo = \Gamma.mo \cup \left( \bigcup_{T_i \in \Gamma.T/st} \mathit{IMO}_i \right)^+$, with $\mathit{IMO}_i$ as defined in Sect. 4.4.


6 Conclusions and Future Work 


We studied PSI, for the first time to our knowledge, as a consistency model 
for STMs as it has several advantages over other consistency models, thanks to 
its performance and monotonic behaviour. We addressed two significant draw- 
backs of PSI which prevent its widespread adoption. First, the absence of a 
simple lock-based reference implementation to allow the programmers to readily 
understand and reason about PSI programs. To address this, we developed a 
lock-based reference implementation of PSI in the RA fragment of C11 (using 
sequence locks), that is both sound and complete with respect to its declara- 
tive specification. Second, the absence of a formal PSI model in the presence 
of mixed-mode accesses. To this end, we formulated a declarative specification 
of RPSI (robust PSI) accounting for both transactional and non-transactional 
accesses. Our RPSI specification is an extension of PSI in that in the absence 
of non-transactional accesses it coincides with PSI. To provide a more intuitive 
account of RPSI, we developed a simple lock-based RPSI reference implemen- 
tation by adjusting our PSI implementation. We established the soundness and 
completeness of our RPSI implementation against its declarative specification. 

As directions of future work, we plan to build on top of the work presented
here in three ways. First, we plan to explore possible lock-based reference imple- 
mentations for PSI and RPSI in the context of other weak memory models, such 
as the full C11 memory models [9]. Second, we plan to study other weak trans- 
actional consistency models, such as SI [10], ALA (asymmetric lock atomicity), 
ELA (encounter-time lock atomicity) [28], and those of ANSI SQL, including 
RU (read-uncommitted), RC (read-committed) and RR (repeatable reads), in 
the STM context. We aim to investigate possible lock-based reference implemen- 
tations for these models that would allow the programmers to understand and 
reason about STM programs with such weak guarantees. Third, taking advan- 
tage of the operational models provided by our simple lock-based reference imple- 
mentations (those presented in this article as well as those in future work), we 
plan to develop reasoning techniques that would allow us to verify properties 
of STM programs. This can be achieved by either extending existing program 
logics for weak memory, or developing new program logics for currently unsup- 
ported models. In particular, we can reason about the PSI models presented 
here by developing custom proof rules in the existing program logics for RA 
such as [22,39]. 

Abstract. We address the problem of validity in eventually consistent
(EC) systems: In what sense does an EC data structure satisfy the 
sequential specification of that data structure? Because EC is a very 
weak criterion, our definition does not describe every EC system; how- 
ever it is expressive enough to describe any Convergent or Commutative 
Replicated Data Type (CRDT). 


1 Introduction 


In a replicated implementation of a data structure, there are two impediments 
to requiring that all replicas achieve consensus on a global total order of the 
operations performed on the data structure (Lamport 1978): (a) the associated 
serialization bottleneck negatively affects performance and scalability (e.g. see 
(Ellis and Gibbs 1989)), and (b) the CAP theorem imposes a tradeoff between 
consistency and partition-tolerance (Gilbert and Lynch 2002). 

In systems based on optimistic replication (Vogels 2009; Saito and Shapiro 
2005), a replica may execute an operation without synchronizing with other 
replicas. If the operation is a mutator, the other replicas are updated asyn- 
chronously. Due to the vagaries of the network, the replicas could receive and 
apply the updates in possibly different orders. 

For sequential systems, the correctness problem is typically divided into 
two tasks: proving termination and proving partial correctness. Termination 
requires that the program eventually halt on all inputs, whereas partial cor- 
rectness requires that the program only returns results that are allowed by the 
specification. 

For replicated systems, the analogous goals are convergence and validity. 
Convergence requires that all replicas eventually agree. Validity requires that 
they agree on something sensible. In a replicated list, for example, if the only 
value put into the list is 1, then convergence ensures that all replicas eventually 
see the same value for the head of the list; validity requires that the value be 1. 

Convergence has been well-understood since the earliest work on replicated 
systems. Convergence is typically defined as eventual consistency, which requires 
that once all messages are delivered, all replicas have the same state. Strong 
eventual consistency (SEC) additionally requires convergence for all subsets of 
messages: replicas that have seen the same messages must have the same state. 
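
To make the SEC requirement concrete, the following minimal sketch delivers the same messages to a replica in every possible order and collects the resulting states. The grow-only set used here is a hypothetical example chosen for illustration and does not appear in the text above; it is a standard structure for which delivery order is irrelevant.

```python
from itertools import permutations

class GrowOnlySet:
    """A replica whose only mutator adds an element (hypothetical example)."""

    def __init__(self):
        self.state = frozenset()

    def deliver(self, element):
        # Set union is commutative and idempotent, so order does not matter.
        self.state = self.state | {element}

def states_after(messages):
    """Final states reached by delivering the same messages in every order."""
    results = set()
    for order in permutations(messages):
        replica = GrowOnlySet()
        for m in order:
            replica.deliver(m)
        results.add(replica.state)
    return results
```

Under these assumptions, `states_after([1, 2, 3])` contains a single state, which is the sense in which replicas that have seen the same messages have the same state. Note that this says nothing about validity, i.e. whether the agreed state is a sensible one.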

Perhaps surprisingly, finding an appropriate definition of validity for replicated
systems remains an open problem. There are solutions which use concurrent
specifications, discussed below. But, as Shavit (2011) noted:


“It is infinitely easier and more intuitive for us humans to specify how 
abstract data structures behave in a sequential setting, where there are no 
interleavings. Thus, the standard approach to arguing the safety proper- 
ties of a concurrent data structure is to specify the structure’s properties 
sequentially, and find a way to map its concurrent executions to these 
‘correct’ sequential ones.” 


In this paper we give the first definition of validity that is both (1) derived from 
standard sequential specifications and (2) validates the examples of interest. 

We take the “examples of interest” to be Convergent/Commutative Replicated 
Data Types (CRDTs). These are replicated structures that obey certain mono- 
tonicity or commutativity properties. As an example of a CRDT, consider the 
add-wins set, also called an “observed remove” set in Shapiro et al. (2011a). The
add-wins set behaves like a sequential set if add and remove operations on the
same element are ordered. The concurrent execution of an add and remove results
in the element being added to the set; thus the remove is ignored and the “add
wins.” This concurrent specification is very simple, but as we will see in the next 
section, it is quite difficult to pin down the relationship between the CRDT and 
the sequential specification used in the CRDT’s definition. This paper is the first 
to successfully capture this relationship. 

Many replicated data types are CRDTs, but not all (Shapiro et al. 2011a). 
Notably, Amazon’s Dynamo (DeCandia et al. 2007) is not a CRDT. Indeed, 
interest in CRDTs is motivated by a desire to avoid the well-known concurrency
anomalies suffered by Dynamo and other ad hoc systems (Bieniusa et al. 2012). 

Shapiro et al. (2011b) introduced the notion of CRDT and proved that every 
CRDT has an SEC implementation. Their definition of SEC includes convergence, 
but not validity. 

The validity requirement can be broken into two components. We describe 
these below using the example of a list data type that supports only two 
operations: the mutator put, which adds an element to the end of the 
list, and the query q, which returns the state of the list. This structure 
can be specified as a set of strings such as “put(1); put(3); q=[1,3]” and
“put(1); put(2); put(3); q=[1,2,3]”.


— Linearization requires that a response be consistent with some specification 
string. A state that received put(1) and put(3) may report q=[1,3] or
q=[3,1], but not q=[2,1,3], since 2 has not been put into the list. 

— Monotonicity requires that states evolve in a sensible way. We might permit 
the state q=[1,3] to evolve into q=[1,2,3], due to the arrival of action 
put(2). But we would not expect that q=[1,3] could evolve into q=[3,1],
since the data type does not support deletion or reordering. 
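
The two components can be illustrated on the list example with a small sketch. This is one simplified reading rather than the formal definition given later in the paper: it assumes that a query reports every put the replica has received exactly once, and that evolution may only insert elements. The function names are hypothetical.

```python
from collections import Counter

def linearizable(received_puts, response):
    """The response lists exactly the received elements, in some order."""
    return Counter(received_puts) == Counter(response)

def monotone(old_state, new_state):
    """The old state survives as a subsequence of the new one
    (no deletion, no reordering)."""
    remaining = iter(new_state)
    # 'x in remaining' advances the iterator, so relative order is respected.
    return all(x in remaining for x in old_state)
```

With these definitions, `linearizable([1, 3], [3, 1])` holds while `linearizable([1, 3], [2, 1, 3])` does not, and `monotone([1, 3], [1, 2, 3])` holds while `monotone([1, 3], [3, 1])` does not, matching the examples in the two items above.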


Burckhardt et al. (2012) provide a formal definition of validity using partial 
orders over events: linearizations respect the partial order on events; monotonicity 
is ensured by requiring that evolution extends the partial order. Similar definitions
can be found in Jagadeesan and Riely (2015) and Perrin et al. (2015). Replicated
data structures that are sound with respect to this definition enjoy many good
properties, which we discuss throughout this paper. However, this notion of correctness
is not general enough to capture common CRDTs, such as the add-wins set.

This lack of expressivity led Burckhardt et al. (2014) to abandon notions of
validity that appeal directly to a sequential specification. Instead they work 
directly with concurrent specifications, formalizing the style of specification 
found informally in Shapiro et al. (2011b). This has been a fruitful line of work, 
leading to proof rules (Gotsman et al. 2016) and extensions (Bouajjani et al. 
2014). See (Burckhardt 2014; Viotti and Vukolic 2016) for a detailed treatment. 

Positively, concurrent specifications can be used to validate any replicated 
structure, including CRDTs as well as anomalous structures such as Dynamo. 
Negatively, concurrent specifications have no clear connection to their
sequential counterparts. In this paper, we restore this connection. We arrive 
at a definition of SEC that admits CRDTs, but rejects Dynamo. 

The following “corner cases” are a useful sanity-check for any proposed notion 
of validity. 


— The principle of single threaded semantics (PSTS) (Haas et al. 2015) states 
that if an execution uses only a single replica, it should behave according to 
the sequential semantics. 

— The principle of single master (PSM) (Budhiraja et al. 1993) states that if all 
mutators in an execution are initiated at a single replica, then the execution 
should be linearizable (Herlihy and Wing 1990). 

— The principle of permutation equivalence (PPE) (Bieniusa et al. 2012) states 
that “if all sequential permutations of updates lead to equivalent states, then 
it should also hold that concurrent executions of the updates lead to equiv- 
alent states.” 


PSTS and PSM say that a replicated structure should behave sequentially when 
replication is not used. PPE says that the order of independent operations 
should not matter. Our definition implies all three conditions. Dynamo fails PPE 
(Bieniusa et al. 2012), and thus fails to pass our definition of SEC. 

In the next section, we describe the validity problem and our solution in 
detail, using the example of a binary set. The formal definitions follow in Sect. 3. 
We state some consequences of the definition and prove that the add-wins set 
satisfies our definition. In Sect.4, we describe a collaborative text editor and 
prove that it is SEC. In Sect. 5 we characterize the programmer’s view of a CRDT 
by defining the most general CRDT that satisfies a given sequential specification. 
We show that any program that is correct using the most general CRDT will be 
correct using a more restricted CRDT. We also show that our validity criterion 
for SEC is local in the sense of Herlihy and Wing (1990): independent structures 
can be verified independently. In Sect. 6, we apply these results to prove the 
correctness of a graph that is implemented using two SEC sets. 

Our work is inspired by the study of relaxed memory, such as (Alglave 2012). 
In particular, we have drawn insight from the RMO model of Higham and Kawash 
(2000). 


2 Understanding Replicated Sets 


In this section, we motivate the definition of SEC using replicated sets as an 
example. The final definition is quite simple, but requires a fresh view of both 
executions and specifications. We develop the definition in stages, each of which 
requires a subtle shift in perspective. Each subsection begins with an example 
and ends with a summary. 


2.1 Mutators and Non-mutators 


An implementation is a set of executions. We model executions abstractly as 
labelled partial orders (LPOs). The ordering of the LPO captures the history that 
precedes an event, which we refer to as visibility. 


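[Figure: execution (1). Top replica: +0^a → ✓0^b → ✗1^c → ✓0^d → ✓1^e. 
Bottom replica: +1^f → ✗0^g → ✓1^h → ✓0^i → ✓1^j. Both adds are visible 
to the last two events at each replica.]                                 (1) 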


Here the events are a through j, with labels +0, +1, etc., and order represented 
by arrows. The LPO describes an execution with two replicas, shown horizontally, 
with time passing from left to right. Initially, the top replica receives a request 
to add 0 to the set (+0^a). Concurrently, the bottom replica receives a request to 
add 1 (+1^f). Then each replica is twice asked to report on the items contained 
in the set. At first, the top replica replies that 0 is present and 1 is absent 
(✓0^b ✗1^c), whereas the bottom replica answers with the reverse (✗0^g ✓1^h). Once 
the add operations are visible at all replicas, however, the replicas give the same 
responses (✓0^d ✓1^e and ✓0^i ✓1^j). 

LPOs with non-interacting replicas can be denoted compactly using sequential 
and parallel composition. For example, the prefix of (1) that only includes the 
first three events at each replica can be written 
(+0^a; ✓0^b; ✗1^c) || (+1^f; ✗0^g; ✓1^h). 

A specification is a set of strings. Let SET be the specification of a sequential 
set with elements 0 and 1. Then we expect that SET includes the string “+0✓0✗1”, 
but not “+0✗0✓1”. Indeed, each specification string can uniquely be extended 
with either ✓0 or ✗0 and either ✓1 or ✗1. 
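
As a minimal sketch of the sequential specification just described, membership in SET can be decided by replaying a string against an ordinary set. The function name `is_valid` and the encoding of labels as two-character strings (`"+0"`, `"-0"`, `"✓0"`, `"✗0"`) are assumptions made for illustration only.

```python
def is_valid(labels):
    """Return True if `labels` is a valid string of the sequential SET."""
    present = set()                      # elements currently in the set
    for label in labels:
        op, elem = label[0], label[1:]
        if op == "+":
            present.add(elem)
        elif op == "-":
            present.discard(elem)
        elif op == "✓":
            if elem not in present:      # claims membership of an absent element
                return False
        elif op == "✗":
            if elem in present:          # denies membership of a present element
                return False
        else:
            raise ValueError(f"unknown label: {label!r}")
    return True


print(is_valid(["+0", "✓0", "✗1"]))   # the string "+0✓0✗1"
print(is_valid(["+0", "✗0", "✓1"]))   # the string "+0✗0✓1"
```

Under this encoding the first call accepts and the second rejects, matching the two strings discussed above.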

There is an isomorphism between strings and labelled total orders. Thus, 
specification strings correspond to the restricted class of LPOs where the visibility 
relation provides a total order. 

Linearizability (Herlihy and Wing 1990) is the gold standard for concurrent 
correctness in tightly coupled systems. Under linearizability, an execution is valid 
if there exists a linearization τ of the events in the execution such that for every 
event e, the prefix of e in τ is a valid specification string. 

Execution (1) is not linearizable. The failure can already be seen in the 
sub-LPO (+0^a; ✗1^c) || (+1^f; ✗0^g). Any linearization must have either +1^f before ✗1^c 
or +0^a before ✗0^g. In either case, the linearization is invalid for SET. 

Although it is not linearizable, execution (1) is admitted by every CRDT SET 
in Shapiro et al. (2011a). To validate such examples, Burckhardt et al. (2012) 
develop a weaker notion of validity by dividing labels into mutators and accessors 
(also known as non-mutators). Similar definitions appear in Jagadeesan and 
Riely (2015) and Perrin et al. (2015). Mutators change the state of a replica, 
and accessors report on the state without changing it. For SET, the mutators M 
and non-mutators M̄ are as follows. 


M = {+0, -0, +1, -1}, representing addition and removal of bits 0 and 1. 
M̄ = {✗0, ✓0, ✗1, ✓1}, representing membership tests returning false or true. 


Define the mutator prefix of an event e to include e and the mutators visible to 
e. An execution is valid if there exists a linearization of the execution, τ, such 
that for every event e, the mutator prefix of e in τ is a valid specification string. 

It is straightforward to see that execution (1) satisfies this weaker criterion. 
For both ✓0^b and ✗1^c, the only mutator in the mutator prefix is +0^a. The prefix 
thus includes +0^a but not +1^f, and their answers are validated. Symmetrically, 
the mutator prefixes of ✗0^g and ✓1^h only include +1^f. The mutator prefixes for 
the final four events include both +0^a and +1^f, but none of the prior accessors. 
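
The weaker criterion can be checked mechanically for a given candidate linearization. The sketch below is illustrative only: the names `mutator_prefix` and `validated_by` are hypothetical, and the visibility sets for execution (1) are an assumption reconstructed from the prose (each add is visible to the last two events of the other replica).

```python
MUTATORS = {"+0", "-0", "+1", "-1"}

def is_valid(labels):
    """Replay a label string against a sequential set with elements 0 and 1."""
    present = set()
    for label in labels:
        op, elem = label[0], label[1:]
        if op == "+":
            present.add(elem)
        elif op == "-":
            present.discard(elem)
        elif (op == "✓") != (elem in present):   # accessor disagrees with the state
            return False
    return True

def mutator_prefix(e, order, label, visible):
    """Mutators visible to e, in the order fixed by `order`, followed by e itself."""
    seen = [label[d] for d in order
            if d != e and d in visible[e] and label[d] in MUTATORS]
    return seen + [label[e]]

def validated_by(order, label, visible):
    """True if every event's mutator prefix in `order` is a valid SET string."""
    return all(is_valid(mutator_prefix(e, order, label, visible)) for e in order)

# Execution (1); visible[e] lists the events that precede e (assumed shape).
label = {"a": "+0", "b": "✓0", "c": "✗1", "d": "✓0", "e": "✓1",
         "f": "+1", "g": "✗0", "h": "✓1", "i": "✓0", "j": "✓1"}
visible = {"a": set(), "b": {"a"}, "c": {"a", "b"},
           "d": {"a", "b", "c", "f"}, "e": {"a", "b", "c", "d", "f"},
           "f": set(), "g": {"f"}, "h": {"f", "g"},
           "i": {"a", "f", "g", "h"}, "j": {"a", "f", "g", "h", "i"}}
order = list("abcfghdeij")             # one linearization of the events

print(validated_by(order, label, visible))
print(mutator_prefix("g", order, label, visible))
```

The full prefix of g in this order, “+0✓0✗1+1✗0”, is not a valid string, which is why plain linearizability rejects the order while the mutator-prefix criterion accepts it.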


Summary: Convergent states must agree on the final order of mutators, but inter- 
mediate states may see incompatible subsequences of this order. By restricting 
attention to mutator prefixes, the later states need not linearize these incompat- 
ible views of the partial past. 

This relaxation is analogous to the treatment of non-mutators in update 
serializability (Hansdah and Patnaik 1986; Garcia-Molina and Wiederhold 1982), 
which requires a global serialization order for mutators, ignoring non-mutators. 


2.2 Dependency 


The following LPO is admitted by the add-wins SET discussed in the introduction. 


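[Figure: execution (2). Top replica: +0^a → +1^b → -1^c. Bottom replica: 
+1^d → +0^e → -0^f. The accessors ✓0^g and ✓1^h see all six mutators.]     (2) 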


In any CRDT implementation, the effect of +1^b is negated by the subsequent -1^c. 
The same reasoning holds for +0^e and -0^f. In an add-wins set, however, the 
concurrent adds, +0^a and +1^d, win over the deletions. Thus, in the final state 
both 0 and 1 are present. 

This LPO is not valid under the definition of the previous subsection: Since 
✓0^g and ✓1^h see the same mutators, they must agree on a linearization of (+0^a; 
+1^b; -1^c) || (+1^d; +0^e; -0^f). Any linearization must end in either -1^c or -0^f; thus 
it is not possible for both ✓0^g and ✓1^h to be valid. 

Similar issues arise in relaxed memory models, where program order is often 
relaxed between uses of independent variables (Alglave et al. 2014). Generalizing, 
we write m # n to indicate that labels m and n are dependent. Dependency is 
a property of a specification, not an implementation. Our results only apply to 
specifications that support a suitable notion of dependency, as detailed in Sect. 3. 
For SET, # is an equivalence relation with two equivalence classes, corresponding 
to actions on the independent values 0 and 1. 


# = {+0, -0, ✗0, ✓0}² ∪ {+1, -1, ✗1, ✓1}², where D² = D × D. 


While the dependency relation for SET is an equivalence, this is not required: In 
Sect. 4 we establish the correctness of a collaborative text editing protocol with an 
intransitive dependency relation. 

The dependent restriction of (2) is as follows. 


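[Figure: the dependent restriction of (2). Events on 0: +0^a and +0^e → -0^f, 
all visible to ✓0^g. Events on 1: +1^b → -1^c and +1^d, all visible to ✓1^h. 
No order remains between events on 0 and events on 1.]                   (3) 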


In the previous subsection, we defined validity using the mutator prefix of an 
event. We arrive at a weaker definition by restricting attention to the mutator 
prefix of the dependent restriction. 

Under this definition, execution (2) is validated: Any interleaving of the 
strings +0^e -0^f +0^a ✓0^g and +1^b -1^c +1^d ✓1^h linearizes the dependent 
restriction of (2) given in (3). 

Summary: CRDTs allow independent mutators to commute. We formalize this 
intuition by restricting attention to mutator prefixes of the dependent restriction. 
The CRDT must respect program order between dependent operations, but is free 
to reorder independent operations. 

This relaxation is analogous to the distinction between program order and 
preserved program order (PPO) in relaxed memory models (Higham and Kawash 
2000; Alglave 2012). Informally, PPO is the suborder of program order that 
removes order between independent memory actions, such as successive reads 
on different locations without an intervening memory barrier. 


2.3 Puns 


The following LPO is admitted by the add-wins SET. 
[Figure: execution (4). Top replica: +0^a → -0^b → ✓0^c → ✗0^d. Bottom replica: 
+0^e → -0^f → ✓0^g → ✗0^h. The add +0^e is visible to ✓0^c and the add +0^a is 
visible to ✓0^g; all four mutators are visible to ✗0^d and ✗0^h.]          (4) 

As in execution (2), the add +0^a is undone by the following remove -0^b, but the 
concurrent add +0^e wins over -0^b, allowing ✓0^c. In effect, ✓0^c sees the order of 
the mutators as +0^a -0^b +0^e. Symmetrically, ✓0^g sees the order as +0^e -0^f +0^a. 
While this is very natural from the viewpoint of a CRDT, there is no linearization 
of the events that includes both +0^a -0^b +0^e and +0^e -0^f +0^a, since +0^a and +0^e 
must appear in different orders. 

Indeed, this LPO is not valid under the definition of the previous subsection. 
First note that all events are mutually dependent. To prove validity we must 
find a linearization that satisfies the given requirements. Any linearization of 
the mutators must end in either -0^b or -0^f. Suppose we choose +0^a -0^b +0^e -0^f 
and look for a mutator prefix to satisfy ✓0^g. (All other choices lead to similar 
problems.) Since -0^f precedes ✓0^g and is the last mutator in our chosen 
linearization, every possible witness for ✓0^g must end with mutator -0^f. Indeed 
the only possible witness is +0^a +0^e -0^f ✓0^g. However, this is not a valid 
specification string. 

The problem is that we are linearizing events, rather than labels. If we shift 
to linearizing labels, then execution (4) is allowed. Fix the final order for the 
mutators to be +0 -0 +0 -0. The execution is allowed if we can find a subsequence 
that linearizes the labels visible at each event. It suffices to choose the witnesses 
as follows. In the table, we group events with a common linearization together. 


+0^a, +0^e: +0            ✓0^c, ✓0^g: +0-0+0✓0 
-0^b, -0^f: +0-0          ✗0^d, ✗0^h: +0-0+0-0✗0 


Each of these is a valid specification string. In addition, looking only at mutators, 
each is a subsequence of +0 -0 +0 -0. 

In execution (4), each of the witnesses is actually a prefix of the final mutator 
order, but, in general, it is necessary to allow subsequences. 


[Figure: execution (5).]                                                 (5) 


Execution (5) is admitted by the add-wins SET. It is validated by the final 
mutator sequence -0+0. The mutator prefix +0 of b is a subsequence of -0 
+0, but not a prefix. 


Summary: While dependent events at a single replica must be linearized in order, 
concurrent events may slip anywhere into the linearization. A CRDT may pun on 
concurrent events with the same label, using them in different positions at different 
replicas. Thus a CRDT may establish a final total order over the labels of an 
execution even when there is no linearization of the events. 


2.4 Frontiers 


In the introduction, we mentioned that the validity problem can be decomposed 
into the separate concerns of linearizability and monotonicity. The discussion 
thus far has centered on the appropriate meaning of linearizability for CRDTs. In 
this subsection and the next, we look at the constraints imposed by monotonicity. 

Consider the prefix {+0^a, -0^b, +0^e, ✓0^c, -0^f} of execution (4), extended with 
action ✗0^x, with visibility order as follows. 


[Figure: execution (6). The accessors ✓0^c and ✗0^x each see +0^a, -0^b and +0^e, 
but not -0^f.]                                                           (6) 


This execution is not strong EC, since ✓0^c and ✗0^x see exactly the same mutators, 
yet provide incompatible answers. 

Unfortunately, execution (6) is valid by the definition given in the previous 
section: The witnesses for a through f are as before. In particular, the witness 
for ✓0^c is “+0-0+0✓0”. The witness for ✗0^x is “+0+0-0✗0”. In each case, the 
mutator prefix is a subsequence of the global mutator order “+0-0+0-0”. 

It is well known that punning can lead to bad jokes. In this case, the problem 
is that ✗0^x is punning on a concurrent -0 that cannot be matched by a visible 
-0 in its history: the execution -0 that is visible to ✗0^x must appear between the 
two +0 operations; the specification -0 that is used by ✗0^x must appear after. 
The final states of execution (4) have seen both remove operations, therefore the 
pun is harmless there. But ✓0^c and ✗0^x have seen only one remove. They must 
agree on how it is used. 

Up to now, we have discussed the linearization of each event in isolation. We 
must also consider the relationship between these linearizations. When working 
with linearizations of events, it is sufficient to require that the linearization 
chosen for each event be a supersequence of the linearization chosen for each 
visible predecessor; since events are unique, there can be no confusion in the 
linearization about which event is which. Execution (6) shows that when working 
with linearizations of labels, it is insufficient to consider the relationship 
between individual events. The linearization “+0+0-0✗0” chosen for ✗0^x is a 
supersequence of those chosen for its predecessors: “+0” for +0^e and “+0-0” for 
-0^b. The linearization “+0-0+0✓0” chosen for ✓0^c is also a supersequence for the 
same predecessors. And yet, ✓0^c and ✗0^x are incompatible states. 

Sequential systems have a single state, which evolves over time. In distributed 
systems, each replica has its own state, and it is this set of states that evolves. 
Such a set of states is called a (consistent) cut (Chandy and Lamport 1985). 

A cut of an LPO is a sub-LPO that is down-closed with respect to visibility. The 
frontier of a cut is the set of maximal elements. For example, there are 14 frontiers 
of execution (6): the singletons {+0^a}, {-0^b}, {✓0^c}, {+0^e}, {-0^f}, {✗0^x}, the 
pairs {+0^a, +0^e}, {+0^a, -0^f}, {-0^b, +0^e}, {-0^b, -0^f}, {✓0^c, -0^f}, {✓0^c, ✗0^x}, 
{✗0^x, -0^f}, and the triple {✓0^c, ✗0^x, -0^f}. As we explain below, we consider 
non-mutators in isolation. Thus we do not consider the last four cuts, which 
include a non-mutator with other events. That leaves 10 frontiers. The definition 
of the previous section only considered the 6 singletons. Singleton frontiers are 
generated by pointed cuts, with a single maximal element. 
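
The frontier count can be reproduced by enumerating the antichains of the visibility order, since each frontier is the set of maximal elements of exactly one cut. The sketch below is a hedged illustration: the edge list is an assumption reconstructed from the frontiers listed in the text, and all names are hypothetical.

```python
from itertools import combinations

# Execution (6): event name -> label.
label = {"a": "+0", "b": "-0", "c": "✓0", "e": "+0", "f": "-0", "x": "✗0"}
# Assumed direct visibility edges (d, e), read "d is visible to e".
edges = [("a", "b"), ("b", "c"), ("e", "c"), ("e", "f"), ("b", "x"), ("e", "x")]
MUTATORS = {"+0", "-0"}

def predecessors(event):
    """All events strictly below `event` in the transitive closure of `edges`."""
    seen, stack = set(), [event]
    while stack:
        current = stack.pop()
        for d, target in edges:
            if target == current and d not in seen:
                seen.add(d)
                stack.append(d)
    return seen

below = {e: predecessors(e) for e in label}

def is_antichain(events):
    """No two of the given events are ordered by visibility."""
    return all(d not in below[e] and e not in below[d]
               for d, e in combinations(events, 2))

# Every nonempty antichain is the frontier of exactly one cut.
frontiers = [set(c)
             for n in range(1, len(label) + 1)
             for c in combinations(sorted(label), n)
             if is_antichain(c)]

# Keep singletons and frontiers made of mutators only.
considered = [f for f in frontiers
              if len(f) == 1 or all(label[e] in MUTATORS for e in f)]

print(len(frontiers), len(considered))
```

With these assumed edges the enumeration yields the 14 frontiers listed above, of which 10 remain once frontiers mixing a non-mutator with other events are dropped.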

When applied to frontiers, the monotonicity requirement invalidates 
execution (6). Monotonicity requires that the linearization chosen for a frontier be a 
subsequence of the linearization chosen for any extension of that frontier. If we 
are to satisfy state ✓0^c in execution (6), the frontier {-0^b, +0^e} must linearize to 
“+0-0+0”. If we are to satisfy state ✗0^x, the frontier {-0^b, +0^e} must linearize to 
“+0+0-0”. Since we require a unique linearization for each frontier, the execution 
is disallowed. 

Since CRDTs execute non-mutators locally, it is important that we ignore 
frontiers with multiple non-mutators. Recall execution (4): 


[Figure: execution (4), repeated.] 


There is no specification string that linearizes the cut with frontier {✓0^c, ✓0^g}, 
since we cannot have ✓0 immediately after -0. If we consider only pointed cuts 
for non-mutators, then the execution is SEC, with witnesses as follows. 


{+0^a}, {+0^e} : +0                {✓0^c}, {✓0^g} : +0-0+0✓0 
{+0^a, +0^e} : +0+0                {-0^b, -0^f} : +0-0+0-0 
{-0^b}, {-0^f} : +0-0              {✗0^d}, {✗0^h} : +0-0+0-0✗0 
{-0^b, +0^e}, {+0^a, -0^f} : +0-0+0 


In order to validate non-mutators, we must consider singleton non-mutator 
frontiers. The example shows that we must not consider frontiers with multiple 
non-mutators. There is some freedom in the choices otherwise. For SET, we 
can “saturate” an execution with accessors by augmenting the execution with 
accessors that witness each cut of the mutators. In a saturated execution, it is 
sufficient to consider only the pointed accessor cuts, which end in a maximal 
accessor. For non-saturated executions, we are forced to examine each mutator 
cut: it is possible that a future accessor extension may witness that cut. The 
status of “mixed” frontiers, which include mutators with a single maximal 
non-mutator, is open for debate. We choose to ignore them, but the definition does 
not change if they are included. 

Summary: A CRDT must have a strategy for linearizing all mutator labels, even in 
the face of partitions. In order to ensure strong EC, the definition must consider 
sets of events across multiple replicas. Because non-mutators are resolved locally, 
SEC must ignore frontiers with multiple non-mutators. 

Cuts and frontiers are well-known concepts in the literature of distributed 
systems (Chandy and Lamport 1985). It is natural to consider frontiers when 
discussing the evolving correctness of a CRDT. 


2.5 Stuttering 
Consider the following execution. 

[Figure: execution (7). One partition contains +0^a → -0^b → +0^c, followed by 
the concurrent removes -0^d and -0^e. The other partition contains +0^x, 
followed by the concurrent removes -0^y and -0^z.]                       (7) 

This LPO represents a partitioned system with events a-e in one partition and 
x-z in the other. As the partition heals, we must be able to account for the 
intermediate states. Because of the large number of events in this example, we 
have elided all accessors. We will present the example using the semantics of the 
add-wins set. Recall that the add-wins set validates ✓0 if and only if there is 
a maximal +0 beforehand. Thus, a replica that has seen the cut with frontier 
{+0^a, -0^y, -0^z} must answer ✓0, whereas a replica that has seen {-0^b, -0^y, -0^z} 
must answer ✗0. 

Any linearization of {+0^a, -0^y, -0^z} must end in +0, since the add-wins set 
must reply ✓0: the only possibility is “+0-0-0+0”. The linearization of {-0^b, 
-0^y, -0^z} must end in -0. If it must be a supersequence, the only possibility is 
“+0-0-0+0-0”. Taking one more step on the left, {+0^c, -0^y, -0^z} must linearize to 
“+0-0-0+0-0+0”. Thus the final state {-0^d, -0^e, -0^y, -0^z} must linearize to 
“+0-0-0+0-0+0-0-0”. Reasoning symmetrically, the linearization of {-0^d, -0^e, +0^x} 
must be “+0-0+0-0-0+0”, and thus the final {-0^d, -0^e, -0^y, -0^z} must linearize to 
“+0-0+0-0-0+0-0-0”. The constraints on the final state are incompatible. Each 
of these states can be verified in isolation; it is the relation between them that 
is not satisfiable. 

Recall that monotonicity requires that the linearization chosen for a frontier 
be a subsequence of the linearization chosen for any extension of that frontier. 
The difficulty here is that the subsequence relation ignores the similarity between 
“+0-0-0+0-0+0-0-0” and “+0-0+0-0-0+0-0-0”. Neither of these is a subsequence 
of the other, yet they capture exactly the same sequence of states, each with six 
alternations between ✗0 and ✓0. The canonical state-based representative for 
these sequences is “+0-0+0-0+0-0”. 

CRDTs are defined in terms of states. In order to relate CRDTs to sequential 
specifications, it is necessary to extract information about states from the 
specification itself. Adapting Brookes (1996), we define strings as stuttering 
equivalent (notation σ ∼ τ) if they pass through the same states. So +0+1+0 ∼ +0+1 
but +0-0+0 ≁ +0. If we consider subsequences up to stuttering, then execution (7) 
is SEC, with witnesses as follows: 


{a}, {x}, {a, x} : +0 
{b}, {y}, {y, z}, {z} : +0-0 
{a, y}, {a, y, z}, {a, z}, {b, x} : +0-0+0 
{b, y}, {b, y, z}, {b, z}, {d}, {d, e}, {e} : +0-0+0-0 
{c, y}, {c, y, z}, {c, z}, {d, x}, {d, e, x}, {e, x} : +0-0+0-0+0 
{d, y}, {d, y, z}, {d, z}, {e, y}, {e, y, z}, {e, z}, 
{d, e, y}, {d, e, y, z}, {d, e, z} : +0-0+0-0+0-0 


Recall that without stuttering, we deduced that {+0^c, -0^y, -0^z} must linearize to 
“+0-0-0+0-0+0” and {-0^d, -0^e, +0^x} must linearize to “+0-0+0-0-0+0”. Under 
stuttering equivalence, these are the same, with canonical representative 
“+0-0+0-0+0”. Thus, monotonicity under stuttering allows both linearizations to 
be extended to satisfy the final state {-0^d, -0^e, -0^y, -0^z}, which has canonical 
representative “+0-0+0-0+0-0”. 

Summary: CRDTs are described in terms of convergent states, whereas 
specifications are described as strings of actions. Actions correspond to labels in the 
LPO of an execution. Many strings of actions may lead to equivalent states. For 
example, idempotent actions can be applied repeatedly without modifying the 
state. 

The stuttering equivalence of Brookes (1996) addresses this mismatch. In 
order to capture the validity of CRDTs, the definition of subsequence must change 
from a definition over individual specification strings to a definition over equiv- 
alence classes of strings up to stuttering. 


3 Eventual Consistency for CRDTs 


This section formalizes the intuitions developed in Sect. 2. We define executions, 
specifications and strong eventual consistency (SEC). We discuss properties of 
eventual consistency and prove that the add-wins set is SEC. 


3.1 Executions 


An execution realizes causal delivery if, whenever an event is received at a replica, 
all predecessors of the event are also received. Most of the CRDTs in Shapiro et al. 
(2011a) assume causal delivery, and we assumed it throughout the introductory 
section. There are costs to maintaining causality, however, and not all CRDTs 
assume that executions incur these costs. In the formal development, we allow 
non-causal executions. 

Shapiro et al. (2011a) draw executions as timelines, explicitly showing the 
delivery of remote mutators. Below left, we give an example of such a timeline. 


[Figure: left, a timeline with the top replica performing +0 then +1 and the 
bottom replica answering ✓1, ✗0 and ✓0; right, the corresponding LVO.] 


This is a non-causal execution: at the bottom replica, +1 is received before +0, 
even though +0 precedes +1 at the top replica. 

Causal executions are naturally described as Labelled Partial Orders (LPOs), 
which are transitive and antisymmetric. Section 2 presented several examples 
of LPOs. To capture non-causal systems, we move to Labelled Visibility Orders 
(LVOs), which are merely acyclic. Acyclicity ensures that the transitive closure 
of an LVO is an LPO. The right picture above shows the LVO corresponding to the 
timeline on the left. The zigzag arrow represents an intransitive communication. 
When drawing executions, we use straight lines for “transitive” edges, with the 
intuitive reading that “this and all preceding actions are delivered”. 

LVOs arise directly due to non-causal implementations. As we will see in 
Sect. 4, they also arise via projection from an LPO. 

LVOs are unusual in the literature. To make this paper self-contained, we 
define the obvious generalizations of concepts familiar from LPOs, including 
isomorphism, suborder, restriction, maximality, downclosure and cut. 

Fix a set L of labels. A Labelled Visibility Order (LVO, also known as an 
execution) is a triple u = (E_u, λ_u, ⇝_u) where E_u is a finite set of events, 
λ_u ∈ (E_u → L) and ⇝_u ⊆ (E_u × E_u) is reflexive and acyclic. 

Let u, v range over LVOs. Many concepts extend smoothly from LPOs to LVOs. 


— Isomorphism: Write u =_iso v when u and v differ only in the carrier set. We 
are often interested in the isomorphism class of an LVO. 

— Pomset: We refer to the isomorphism class of an LVO as a pomset. Pomset 
abbreviates Partially Ordered Multiset (Plotkin and Pratt 1997). We stick 
with the name “pomset” here, since “vomset” is not particularly catchy. 

— Suborder: Write u ⊆ v when E_u ⊆ E_v, λ_u ⊆ λ_v, ρ_u ⊆ ρ_v, and (⇝_u) ⊆ (⇝_v). 

— Restriction:¹ When D ⊆ E_v, define v ↾ D = (D, λ_v ↾ D, ⇝_v ↾ D). Restriction 
lifts subsets to suborders: v ↾ D denotes the sub-LVO derived from a subset D 
of events. See Sect. 2.2 for an example of restriction. 

— Maximal elements: max(v) = {d ∈ E_v | ∄e ∈ (E_v \ {d}). d ⇝_v e}. 
We say that d is maximal for v when d ∈ max(v). 

— Non-maximal suborder: m̅a̅x̅(v) = v ↾ (E_v \ max(v)). 
m̅a̅x̅(v) is the suborder with the maximal elements removed. 

— Downclosure: D is downclosed for v if D ⊇ {e ∈ E_v | ∃d ∈ D. e ⇝_v d}. 

— Cut: u is a cut of v if u ⊆ v and E_u is downclosed for v. 
Let cuts(v) be the set of all cuts of v. A cut is the sub-LVO corresponding 
to a downclosed set. Cuts are also known as prefixes. See Sect. 2.4 for an 
example. A cut is determined by its maximal elements: if u ∈ cuts(v) then 
u = v ↾ {d ∈ E_v | ∃e ∈ max(u). d ⇝_v e}. 

— Linearization: For a_i ∈ L, we say that a_1 … a_n is a linearization of E ⊆ E_v 
if there exists a bijection α : E → [1, n] such that ∀e ∈ E. λ_v(e) = a_α(e) and 
∀d, e ∈ E. d ⇝_v e implies α(d) ≤ α(e). 
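
The linearization clause can be read as a brute-force search for the bijection α. The following sketch assumes a hypothetical encoding in which `label` maps events to labels and `vis` is a set of pairs (d, e) with d visible to e; it tries every ordering of the events, so it is only meant for small examples.

```python
from itertools import permutations

def is_linearization(string, events, label, vis):
    """True if `string` (a list of labels) linearizes `events` under `vis`.

    A candidate ordering plays the role of the bijection alpha: it must
    reproduce the labels and place d no later than e whenever (d, e) is in vis.
    """
    events = list(events)
    if len(string) != len(events):
        return False
    for perm in permutations(events):
        if [label[e] for e in perm] != list(string):
            continue
        pos = {e: i for i, e in enumerate(perm)}
        if all(pos[d] <= pos[e] for d, e in vis if d in pos and e in pos):
            return True
    return False

# Two replicas, each adding and then removing 0 (the mutators of execution (4)).
label = {"a": "+0", "b": "-0", "e": "+0", "f": "-0"}
vis = {("a", "b"), ("e", "f")}

print(is_linearization(["+0", "-0", "+0", "-0"], label, label, vis))
print(is_linearization(["-0", "+0", "+0", "-0"], label, label, vis))
```

The second string is rejected because every remove has an add visible to it, so no remove can come first.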


Replica-Specific Properties. In the literature on replicated data types, some 
properties of interest (such as “read your writes” (Tanenbaum and Steen 2007)) 
require the concept of “session” or a distinction between local and remote 
events. These can be accommodated by augmenting LVOs with a replica labelling 
ρ_u ∈ (E_u → R), which maps events to a set R of replica identifiers. 

Executions can be generated operationally as follows: Replicas receive muta- 
tor and accessor events from the local client; they also receive mutator events 
that are forwarded from other replicas. Each replica maintains a set of seen 
events: an event that is received is added to this set. When an event is received 
from the local client, the event is additionally added to the execution, with the 
predecessors in the visibility relation corresponding to the current seen set. If 
we wish to restrict attention to causal executions, then we require that replicas 
forward all the mutators in their seen sets, rather than individual events, and, 
thus, the visibility relation is transitive over mutators. 

All executions that are operationally generated satisfy the additional 
property that ⇝_u is per-replica total: if ρ(d) = ρ(e) then either d ⇝_u e or e ⇝_u d. 
We do not demand per-replica totality because our results do not rely on 
replica-specific information. 


1 We use the standard definitions for restriction on functions and relations. Given a 
function f : E → X, a relation R ⊆ E × E and D ⊆ E, define f ↾ D = {(d, f(d)) | d ∈ D} 
and R ↾ D = {(d1, d2) | d1, d2 ∈ D and d1 R d2}. 


3.2 Specifications and Stuttering Equivalence 


Specifications are sets of strings, equipped with a distinguished set of mutators 
and a dependency relation between labels. Specifications are subject to some 
constraints to ensure that the mutator set and dependency relations are sensible; 
these are inspired by the conditions on Mazurkiewicz executions (Diekert and 
Rozenberg 1995). Every specification set yields a derived notion of stuttering 
equivalence. This leads to the definition of observational subsequence (≤_obs). 

We use standard notation for strings: Let σ and τ range over strings. Then 
στ denotes concatenation, σ* denotes Kleene star, σ ||| τ denotes the set of 
interleavings, ε denotes the empty string and σ^i denotes the i-th element of σ. 
These notations lift to sets of strings via set union. 

A specification is a quadruple (L, M, #, Σ) where 


— L is a set of actions (also known as labels), 

— M ⊆ L is a distinguished set of mutator actions, 

— # ⊆ (L × L) is a symmetric and reflexive dependency relation, and 

— Σ ⊆ L* is a set of valid strings. 


Let M̄ = L \ M be the set of non-mutators. 
A specification must satisfy the following properties: 


(a) prefix closed: στ ∈ Σ implies σ ∈ Σ 

(b) non-mutators are closed under stuttering and commutation: 
    ∀a ∈ M̄. σaτ ∈ Σ implies σa*τ ⊆ Σ 
    ∀a, b ∈ M̄. {σa, σb} ⊆ Σ implies {σab, σba} ⊆ Σ 

(c) independent actions commute: 
    ∀a, b ∈ L. ¬(a # b) implies (σabτ ∈ Σ iff σbaτ ∈ Σ) 

Property (b) ensures that non-mutators do not affect the state of the data struc- 
ture. Property (c) ensures that commuting of independent actions does not affect 
the state of the data structure. 

Recall that the SET specification takes M = {+0, -0, +1, -1}, representing 
addition and removal of bits 0 and 1, and M̄ = {✗0, ✓0, ✗1, ✓1}, representing 
membership tests returning false or true. The dependency relation is # = {+0, 
-0, ✗0, ✓0}² ∪ {+1, -1, ✗1, ✓1}², where D² = D × D. 

The dependency relation for SET is an equivalence, but this need not hold 
generally. We will see an example in Sect. 4. 

The definitions in the rest of the paper assume that we have fixed a 
specification (L, M, #, Σ). In the examples of this section, we use SET. 

State and Stuttering Equivalence. Specification strings σ and τ are state 
equivalent (notation σ ≈ τ) if every valid extension of σ is also a valid extension 
of τ, and vice versa. For example, +0+1+0 ≈ +0+1 and +0-0+0 ≈ +0, but +0-0 ≉ +0. 
In particular, state equivalent strings agree on the valid accessors that can 
immediately follow them: either ✓0 or ✗0 and either ✓1 or ✗1. Formally, we define 
state equivalence, ≈ ⊆ L* × L*, as follows². 


(σ ≈ σ′) ≜ (σ = σ′) or ({σ, σ′} ⊆ Σ and ∀τ ∈ L*. στ ∈ Σ iff σ′τ ∈ Σ). 


From specification property (b), we know that non-mutators do not affect the 
state. Thus we have that σa ≈ σ whenever a ∈ M̄ and σa ∈ Σ. From specification 
property (c), we know that independent actions commute. Thus we have that 
σab ≈ σba whenever ¬(a # b) and {σab, σba} ⊆ Σ. 

Two strings are stuttering equivalent³ if they only differ in operations that 
have no effect on the state of the data structure, as given by Σ. Adapting Brookes 
(1996) to our notion of state equivalence, we define stuttering equivalence, ∼ ⊆ 
L* × L*, to be the least equivalence relation generated by the following rules, 
where a ranges over L. 


(1) ε ∼ ε 
(2) σ ∼ σ′ implies σa ∼ σ′a 
(3) σ ≈ σa implies σ ∼ σa 
(4) σb ≈ σ and ¬(a # b) imply σab ∼ σa 


The first rule above handles the empty string. The second rule allows stuttering 
in any context. The third rule motivates the name stuttering equivalence, for 
example, allowing +0+0 ∼ +0. The last case captures the equivalence generated 
by independent labels, for example, allowing +0+1+0 ∼ +0+1 but not +0-0+0 ∼ 
+0-0. Using the properties of ≈ discussed above, we can conclude, for example, 
that +0✓0✓0+0-0✗0 ∼ +0-0. 

Consider specification strings for a unary SET over value 0. Since stuttering 
equivalence allows us to remove both accessors and adjacent mutators with the 
same label we deduce that the canonical representatives of the equivalence classes 
induced by ∼ are generated by the regular expression (+0)?(-0+0)*(-0)?. 

Observational Subsequence. Recall that ac is a subsequence of abc, although it 
is not a prefix. We write ≤_seq for subsequence and ≤_obs for observational 
subsequence, defined as follows. 


σ_1 ⋯ σ_n ≤_seq τ_0 σ_1 τ_1 ⋯ σ_n τ_n          σ ≤_obs τ if ∃σ′ ∼ σ. ∃τ′ ∼ τ. σ′ ≤_seq τ′ 


Note that observational subsequence includes both subsequence and stuttering 
equivalence: (≤_obs) ⊇ (≤_seq) ∪ (∼). 

≤_seq can be understood in isolation, whereas ≤_obs can only be understood 
with respect to a given specification. In the remainder of the paper, the implied 
specification will be clear from context. ≤_seq is a partial order, whereas ≤_obs is 
only a preorder, since it is not antisymmetric. 

Let σ and τ be strings over the unary SET with canonical representatives aσ′ 
and bτ′. Then we have that σ ≤_obs τ exactly when either a = b and |σ′| ≤ |τ′| 
or a ≠ b and |σ′| < |τ′|. Thus, observational subsequence order is determined 
by the number of alternations between the mutators. 


2 To extend the definition to non-specification strings, we allow σ ≈ σ′ when σ = σ′. 
3 Readers of Brookes (1996) should note that mumbling is not relevant here, since all 
mutators are visible. 

Specification strings for the binary SET, then, are stuttering equivalent 
exactly when they yield the same canonical representatives when restricted to 0 
and to 1. Thus, observational subsequence order is determined by the number 
of alternations between the mutators, when restricted to each dependent sub- 
sequence. (The final rule in the definition of stuttering, which allows stuttering 
across independent labels, is crucial to establishing this canonical form.) 
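
For the unary SET, the characterization above suggests a simple executable check, sketched here under stated assumptions. The helper names (`tokens`, `canonical`, `obs_leq`) are hypothetical. The sketch follows the description in this section: it drops accessors and collapses adjacent equal mutators, then tests plain subsequence on the resulting alternating strings, which for nonempty strings coincides with the length condition stated above.

```python
def tokens(s):
    """Split a string such as "+0-0+0" into two-character labels."""
    return [s[i:i + 2] for i in range(0, len(s), 2)]

def canonical(labels):
    """Drop accessors and collapse adjacent equal mutators (unary SET only)."""
    out = []
    for label in labels:
        if label not in ("+0", "-0"):
            continue                     # accessors do not change the state
        if not out or out[-1] != label:
            out.append(label)            # keep only alternations
    return out

def is_subsequence(s, t):
    """Plain subsequence test: s can be obtained from t by deleting labels."""
    it = iter(t)
    return all(x in it for x in s)

def obs_leq(sigma, tau):
    """Observational subsequence, decided on canonical representatives."""
    return is_subsequence(canonical(sigma), canonical(tau))

left = tokens("+0-0-0+0-0+0-0-0")       # final state of (7), derived from the right partition first
right = tokens("+0-0+0-0-0+0-0-0")      # final state of (7), derived from the left partition first

print(is_subsequence(left, right), is_subsequence(right, left))
print(obs_leq(left, right), obs_leq(right, left))
print("".join(canonical(left)))
```

The two strings from Sect. 2.5 are unrelated by plain subsequence but related in both directions by the observational order, which illustrates why ≤_obs is a preorder rather than a partial order.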


3.3 Eventual Consistency 


Eventual consistency is defined using the cuts of an execution and the observa- 
tional subsequence order of the specification. As noted in Sects. 2.2 and 2.4, it 
is important that we not consider all cuts. Thus, before we define SEC, we must 
define dependent cuts. 

The dependent restriction of an execution is defined: v ↾ # = (E_v, λ_v, ⇝′), 
where d ⇝′ e when λ_v(d) # λ_v(e) and d ⇝_v e. See Sect. 2.2 for an example of 
dependent restriction. 
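
As a small illustration of this definition, the dependent restriction keeps exactly those visibility pairs whose labels are dependent. The sketch below assumes the SET dependency relation (labels are dependent when they act on the same element) and a visibility relation for execution (2) reconstructed from the prose of Sect. 2.2; the function names are hypothetical.

```python
def dependent(m, n):
    """SET dependency: two labels are dependent when they act on the same element."""
    return m[1:] == n[1:]

def dependent_restriction(label, vis):
    """Keep the visibility pairs (d, e) whose labels are dependent."""
    return {(d, e) for d, e in vis if dependent(label[d], label[e])}

# Execution (2): both accessors see all six mutators (assumed shape).
label = {"a": "+0", "b": "+1", "c": "-1", "d": "+1", "e": "+0", "f": "-0",
         "g": "✓0", "h": "✓1"}
vis = {("a", "b"), ("a", "c"), ("b", "c"),
       ("d", "e"), ("d", "f"), ("e", "f")}
vis |= {(m, acc) for m in "abcdef" for acc in "gh"}

for pair in sorted(dependent_restriction(label, vis)):
    print(pair)
```

The surviving pairs order +1^b before -1^c and +0^e before -0^f, and relate each accessor only to the mutators on its own element.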

The dependent cuts of v are cuts of the dependent restriction. As discussed 
in Sect. 2.4, we only consider pointed cuts (with a single maximal element) for 
non-mutators. See Sect. 2.4 for an example. 


cuts_#(v) = {u ∈ cuts(v ↾ #) | ∀e ∈ E_u. if λ_u(e) ∈ M̄ then max(u) = {e}} 


An execution v is Eventually Consistent (SEC) for specification (L, M, #, 
Σ) iff there exists a function τ : cuts_#(v) → Σ that satisfies the following. 


Linearization: ∀p ∈ cuts_#(v). p linearizes to τ(p), and 
Monotonicity: ∀p, q ∈ cuts_#(v). p ⊆ q implies τ(p) ≤_obs τ(q). 


A data structure implementation is SEC if all of its executions are SEC. 

In Sect. 2, we gave several examples that are SEC. See Sects. 2.4 and 2.5 for
examples where $\tau$ is given explicitly. Section 2.4 also includes an example that
is not SEC.

The concerns raised in Sect. 2 are reflected in the definition. 


— Non-mutators are ignored by the dependent restriction of other non-mutators.
As discussed in Sect. 2.1, this relaxation is similar to that of update-serializability
(Hansdah and Patnaik 1986; Garcia-Molina and Wiederhold 1982).

— Independent events are ignored by the dependent restriction of an event. As 
discussed in Sect. 2.2, this relaxation is similar to preserved program order 
in relaxed memory models (Higham and Kawash 2000; Alglave 2012). 

— As discussed in Sect. 2.3, punning is allowed: each cut $p$ is linearized separately
to a specification string $\tau(p)$.

— As discussed in Sect. 2.4, we constrain the power of puns by considering cuts of
the distributed system (Chandy and Lamport 1985).

— Monotonicity ensures that the system evolves in a sensible way: new order
may be introduced, but old order cannot be forgotten. As discussed in 
Sect. 2.5, the preserved order is captured in the observational subsequence 
relation, which allows stuttering (Brookes 1996). 


3.4 Properties of Eventual Consistency 


We discuss some basic properties of SEC. For further analysis, see Sect. 5. 

An important property of CRDTs is prefix closure: If an execution is valid, 
then every prefix of the execution should also be valid. Prefix closure follows 
immediately from the definition, since whenever $u$ is a prefix of $v$ we have that
$\mathrm{cuts}_\#(u) \subseteq \mathrm{cuts}_\#(v)$.

Prefix closure looks back in time. It is also possible to look forward: A system 
satisfies eventual delivery if every valid execution can be extended to a valid 
execution with a maximal element that sees every mutator. If one assumes that 
every specification string can be extended to a longer specification string by 
adding non-mutators, then eventual delivery is immediate. 

The properties PSTS, PSM and PPE are discussed in the introduction. An
SEC implementation must satisfy PPE since every dependent set of mutators is
linearized: SEC enforces the stronger property that there are no new intermediate
states, even when executing all mutators in parallel. For causal systems, where
$\rightsquigarrow_u$ is transitive, PSTS and PSM follow by observing that if there is a total order
on the mutators of $u$ then any linearization of $u$ is a specification string.

Burckhardt (2014, Sect. 5) provides a taxonomy of correctness criteria
for replicated data types. Our definition implies NOCIRCULARCAUSALITY 
and CAUSALARBITRATION, but does not imply either CONSISTENTPREFIX or 
CAUSALVISIBILITY. For LPOs, which model causal systems, our definition implies 
CAUSALVISIBILITY. READMYWRITES and MONOTONICREADS require a dis- 
tinction between local and remote events. If one assumes the replica-specific 
constraints given in Sect. 3.1, then our definition satisfies these properties; with- 
out them, our definition is too abstract. 


3.5 Correctness of the Add-Wins Set 


The add-wins set is defined to answer ✓k for a cut $u$ exactly when


$\exists d \in u.\ \lambda_u(d) = {+k} \wedge (\nexists e \in u.\ \lambda_u(e) = {-k} \wedge d \rightsquigarrow_u e)$.


It answers ✗k otherwise. The add-wins set is also called the “observed-remove” set.
We show that any LPO that meets this specification is SEC with respect to 
SET. We restrict attention to LPOs since causal delivery is assumed for the add- 
wins set in (Shapiro et al. 2011a). 
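
As a rough illustration of the membership condition, the following Python sketch evaluates the add-wins answer for a single cut. The representation is an assumption made for this example only: events are string identifiers, `labels` maps each event to an `(op, key)` pair, and `vis` holds the pairs `(d, e)` for which `d` is visible to `e` in the cut. The function name `add_wins_contains` is hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

Label = Tuple[str, Hashable]  # ("+", k) for an add of k, ("-", k) for a remove


def add_wins_contains(k: Hashable,
                      labels: Dict[str, Label],
                      vis: Set[Tuple[str, str]]) -> bool:
    """True iff some add of k is not followed (in vis) by any remove of k."""
    for d, label in labels.items():
        if label != ("+", k):
            continue
        overridden = any(labels[e] == ("-", k) and (d, e) in vis
                         for e in labels)
        if not overridden:
            return True
    return False
```

With a concurrent add and remove of the same element (no visibility edge between them), the function answers true, which is the behaviour that gives the add-wins set its name.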
For SET, the dependency relation is an equivalence. For an equivalence relation
$R$, let $L/R \subseteq 2^L$ denote the set of (disjoint) equivalence classes for $R$.
For SET, $L/\# = \{\{{+0}, {-0}, \times 0, \checkmark 0\}, \{{+1}, {-1}, \times 1, \checkmark 1\}\}$. When dependency is an
equivalence, then every interleaving of independent actions is valid if any interleaving
is valid. Formally, we have the following, where $|||$ denotes interleaving.


$\forall D \in (L/\#).\ \forall \sigma \in D^*.\ \forall \tau \in (L \setminus D)^*.\ (\sigma \mathbin{|||} \tau) \cap \Sigma \ne \emptyset \text{ implies } (\sigma \mathbin{|||} \tau) \subseteq \Sigma$


Using the forthcoming composition result (Theorem 2), it suffices for us to 
address the case when u only involves operations on a single element, say 0. 
For any such LVO $u$, we choose a linearization $\tau(u) \in ({-0} \mid {+0})^*$ that has a maximum
number of alternations between -0 and +0. If there is a linearization that
begins with -0, then we choose one of these. Below, we summarize some of the 
key properties of such a linearization. 


— $\tau(u)$ ends with +0 iff there is a +0 that is not followed by any -0 in $u$.
— For any LPO $v \subseteq u$, $\tau(v)$ has at most as many alternations as $\tau(u)$.


The first property above ensures that the accessors are validated correctly, i.e.,
0 is deemed to be present iff there is a +0 that is not followed by any -0.

We are left with proving monotonicity, i.e., if $u \subseteq v$, then $\tau(u) \le_{obs} \tau(v)$.
Consider $\tau(u) = a\sigma$ and $\tau(v) = b\rho$.


— If $b = a$, the second property above ensures that $\tau(u) \le_{obs} \tau(v)$.

— In the case that $b \ne a$, we deduce by construction that $b = {-0}$ and $a = {+0}$. In
this case, $\rho$ starts with +0 and has at least as many alternations as $\tau(u)$. So,
we deduce that $\tau(u) \le_{obs} \rho$. The required result follows since $\rho \le_{obs} \tau(v)$.


4 A Collaborative Text Editing Protocol 


In this section we consider a variant of the collaborative text editing protocol 
defined by Attiya et al. (2016). After stating the sequential specification, TEXT, 
we sketch a correctness proof with respect to our definition of eventual consis- 
tency. This example is interesting formally: the dependency relation is not an 
equivalence, and therefore the dependent projection does not preserve transitivity.
The generality of intransitive LVOs is necessary to understand TEXT, even
assuming a causal implementation.


Specification. Let a, b range over nodes, which contain some text, a unique iden- 
tifier, and perhaps other information. Labels have the following forms: 


— Mutator !a initializes the text to node a.

— Mutator +a<b adds node a immediately before node b.

— Mutator +a>b adds node a immediately after node b.

— Mutator -b removes node b.

— Non-mutator query $?b_1 \cdots b_n$ returns the current state of the document.


We demonstrate the correct answers to queries by example. Initially, the document
is empty, whereas after initialization, the document contains a single node;
thus the specification contains strings such as “?ε !c ?c”, where ε represents the
empty document. Nodes can be added either before or after other nodes; thus
“!c +b<c +d>c” results in the document ?bcd. Nodes are always added adjacent
to the target; thus, order matters in “!c +e>c +d>c” which results in ?cde rather
than ?ced. Removal does what one expects; thus “!c +e>c +d>c -c” results in
?de.

Attiya et al. (2016) define the interface for TEXT using integer indices as
targets, rather than nodes. Using the unique correspondence between the nodes
and their indices (since nodes are unique), one can easily adapt an implementation
that satisfies our specification to their interface.

We say that node a is added in the actions !a, +a<b and +a>b. Node b is a
target in +a<b and +a>b. In addition to correctly answering queries, specifications
must satisfy the following constraints:


— Initialization may occur at most once, 

— each node may be added at most once, 

— a node may be removed only after it is added, and 

— a node may be used as a target only if it has been added and not removed. 


These constraints forbid adding to a target that has been removed; thus
“!c +d>c -c” is a valid string, but “!c -c +d>c” is not. It also follows that
initialization must precede any other mutators.
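
The examples and constraints above can be replayed with a small interpreter. The following Python sketch is illustrative only and makes several assumptions: nodes are single characters, labels are separated by spaces, queries are skipped rather than validated, and removing an already removed node is treated as a no-op (the text does not say). The function name `run` is hypothetical.

```python
import re

ADD = re.compile(r"^\+(\w)([<>])(\w)$")  # +a<b or +a>b


def run(spec: str) -> str:
    """Replay the mutators of a TEXT string and return the final document."""
    doc = []          # current document, left to right
    added = set()     # every node ever added
    initialized = False
    for label in spec.split():
        if label[0] == "?":
            continue  # queries do not change the state
        if label[0] == "!":
            if initialized:
                raise ValueError("initialization may occur at most once")
            initialized = True
            added.add(label[1:])
            doc.append(label[1:])
        elif label[0] == "-":
            node = label[1:]
            if node not in added:
                raise ValueError("a node may be removed only after it is added")
            if node in doc:
                doc.remove(node)
        else:
            match = ADD.match(label)
            if match is None:
                raise ValueError(f"unrecognized label: {label}")
            node, side, target = match.groups()
            if node in added:
                raise ValueError("each node may be added at most once")
            if target not in doc:
                raise ValueError("target must be added and not removed")
            index = doc.index(target)
            doc.insert(index if side == "<" else index + 1, node)
            added.add(node)
    return "".join(doc)
```

For instance, `run("!c +e>c +d>c")` yields `"cde"`, while `run("!c -c +d>c")` raises `ValueError` because the target c has already been removed.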

Because add operations use unique identifiers, punning and stuttering play 
little role in this example. In order to show the implementation correct, we need 
only choose an appropriate notion of dependency. As we will see, it is necessary 
that removes be independent of adds with disjoint label sets, but otherwise all 
actions may be dependent. Let $L_{+?}$ be the set of add and query labels, and let
nodes return the set of nodes that appear in a label. Then we define dependency
as follows.

$\ell \mathrel{\#} k$ iff $\{\ell, k\} \subseteq L_{+?}$ or $\mathrm{nodes}(\ell) \cap \mathrm{nodes}(k) \ne \emptyset$
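
A minimal Python sketch of this dependency relation follows. It assumes single lowercase-letter nodes and the label syntax used in the examples, and it assumes that initialization (!a) counts as an add label, since the text says node a is added by !a. The names `nodes`, `is_add_or_query` and `dependent` are hypothetical.

```python
import re


def nodes(label: str) -> set:
    """Nodes mentioned in a label, assuming single lowercase-letter nodes."""
    return set(re.findall(r"[a-z]", label))


def is_add_or_query(label: str) -> bool:
    """Assumption: '!a' is treated as an add, alongside '+a<b', '+a>b' and queries."""
    return label[0] in "!+?"


def dependent(l: str, k: str) -> bool:
    """l # k iff both are add/query labels or they mention a common node."""
    return (is_add_or_query(l) and is_add_or_query(k)) or bool(nodes(l) & nodes(k))
```

Under these assumptions a remove such as -b is independent of an add such as +e>d, while any two adds or queries are dependent.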


Implementation. We consider executions that satisfy the same four conditions 
above imposed on specifications. We refer the reader to the algorithm of Attiya 
et al. (2016) that provides timestamps for insertions that are monotone with 
respect to causality. 

As an example, Attiya et al. (2016) allow the execution given on the left 
below. In this case, the dependent restriction is an intransitive LVO, even though 
the underlying execution is an LPO: in particular, !b does not precede -d in the 
dependent restriction. We give the order considered by dependent cuts on the 
right—this is a restriction of the dependent restriction: since we only consider 
pointed accessor cuts, we can safely ignore order out of non-mutators. 
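
[Figure: the execution (left) and the order considered by dependent cuts (right). After the initial mutators, the top replica performs -d, the query ?bc, and +a<b; the bottom replica performs -b, a query, and +e>d.]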


This execution is not linearizable, but it is SEC, choosing witnesses to be 
subsequences of the mutator string “!b +d>b +c>b +a<b +e>d -b -d”. Here, the
document is initialized to b, then c and d are added after b, resulting in ?bcd. The 
order of c and d is determined by their timestamps. Afterwards, the top replica 
removes d and adds a; the bottom replica removes b and adds e, resulting in 
the final state ?ace. In the right execution, the removal of order out of the non- 
mutators shows the “update serializability” effect; the removal of order between 
-b and +e>d (and between -d and +a<b) shows the “preserved program order” 
effect. 


Correctness. Given an execution, we can find a specification string $s_1 s_2$ that
linearizes the mutators in the dependent restriction of the execution such that
$s_1$ contains only adds and $s_2$ contains only removes. Such a specification string
exists because by the conditions on executions, deletes do not have any outgoing
edges to other mutators in the dependent restriction; so, they can be moved to
the end in the matching specification string. In order to find $s_1$ that linearizes
the add events, any linearization that respects causality and timestamps (yielded
by the algorithm of Attiya et al. (2016)) suffices for our purposes. The conditions
required by SEC follow immediately.


5 Compositional Reasoning 


The aim of this section is to establish compositional methods to reason about 
replicated data structures. We do so using Labelled Transition Systems (LTSs), 
where the transitions are labelled by dependent cuts. We show how to derive
an LTS from an execution, $\mathrm{lts}(u)$. We also define an LTS for the most general
CRDT that validates a specification, $\mathrm{lts}(\Sigma)$. We show that $u$ is SEC for $\Sigma$ exactly
when $\mathrm{lts}(u)$ is a refinement of $\mathrm{lts}(\Sigma)$. We use this alternative characterization to
establish composition and abstraction results.


LTSs. An LTS is a triple consisting of a set of states, an initial state and a labelled
transition function between states. We first define the LTSs for executions and
specifications, then provide examples and discussion.

For both executions and specifications, the labels of the LTS are dependent
cuts: for executions, these are dependent cuts of the execution itself; for specifications,
they are drawn from the set $L_\# = \bigcup_{v} \mathrm{cuts}_\#(v)$ of all possible dependent
cuts. We compare LTS labels up to isomorphism, rather than identity. Thus
it is safe to think of LTS labels as (potentially intransitive) pomsets (Plotkin and
Pratt 1997).

The states of the LTS are different for the execution and specification. For
executions, the states are cuts of the execution $u$ itself, $\mathrm{cuts}(u)$; these are general
cuts, not just dependent cuts. For specifications, the states are the stuttering
equivalence classes of strings allowed by the specification, $\Sigma/{\sim}$.

There is an isomorphism between strings and total orders. We make use of
this in the definition, treating strings as totally-ordered LVOs.

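Define $\mathrm{lts}(u) = (\mathrm{cuts}(u), \emptyset, \to_u)$, where $p \xrightarrow{v}_u q$ if $v \in \mathrm{cuts}_\#(q)$ and

$$\begin{array}{lll}
p \subseteq q & E_{\max(v)} \cup E_p = E_q & (v \setminus \max(v)) \subseteq p \\
v \subseteq q & E_{\max(v)} \cap E_p = \emptyset & E_{\max(v)} \subseteq E_{\max(q)}
\end{array}$$

Define $\mathrm{lts}(\Sigma) = (\Sigma/{\sim}, \varepsilon, \to_\Sigma)$, where $[\sigma] \xrightarrow{v}_\Sigma [\rho]$ if $v \in L_\#$ and

$$\begin{array}{lll}
\sigma \subseteq \rho & E_{\max(v)} \cup E_\sigma = E_\rho & (v \setminus \max(v)) \subseteq \sigma \\
v \subseteq \rho & E_{\max(v)} \cap E_\sigma = \emptyset &
\end{array}$$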


We explain the definitions using examples from SET, first for executions, then 
for specifications. Consider the execution on the left below. The derived LTS is
given on the right.


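[Figure: an example execution over SET (left) and its derived LTS (right), whose states include $(-0 \parallel +0); (\checkmark 0 \parallel +1)$ and $(-0 \parallel +0); (\checkmark 0 \parallel +1); \checkmark 0$.]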


The states of the LTS are cuts of the execution. The labels on transitions are 
dependent cuts. The requirements for execution transitions relate the source 
p, target q and label v. The leftmost requirements state that the target state 
must extend both the source and the label; thus the target state must be a 
combination of events and order from source and label. The middle requirements 
state that the maximal elements of the label must be new in the target; only the 
maximal elements of the label are added when moving from source to target. The 
upper right requirement states that the non-maximal order of the label must be 
respected by the source; thus the causal history reported by the label cannot 
contradict the causal history of the source. The lower right requirement ensures 
that maximal elements of the label are also maximal in the target. The restriction
to dependent cuts explains the labels on the transitions between $(-0 \parallel +0)$ and $(-0 \parallel +0); +1$ and
between $(-0 \parallel +0); (\checkmark 0 \parallel +1); \checkmark 0$ and $(-0 \parallel +0); (\checkmark 0 \parallel +1)$. By definition, there is a
self-transition labelled with the empty LVO at every state; we elide these transitions
in drawings.

The specification LTS for SET is infinite, of course. To illustrate, below we give 
two sub-LTSs with limitations on mutators. On the left, we only allow +0 and
+1. On the right, we only allow +0 and -0 and only consider the case in which
there is at most one alternation between them. The states are shown using their
canonical representatives. Because of the number of transitions, we show all 
dependent accessors as a single transition, with labels separated by commas. 


[Figure: two sub-LTSs of the specification LTS for SET. Left: only the mutators +0 and +1 are allowed. Right: only +0 and -0 are allowed, with at most one alternation between them.]


The requirements for specification transitions are similar to those for implementations,
but the states are equivalence classes over specification strings: with
source $[\sigma]$ and target $[\rho]$. There is a transition between the states if there are
members of the equivalence classes, $\sigma$ and $\rho$, that satisfy the requirements. Since
these are total orders, the leftmost requirements state that there must be linearizations
of the source and label that are subsequences of the target. Similarly,
the upper right requirement states that the non-maximal order of the label must
be respected by the source; thus we have +0 te +0-0 but not +0 ean o, for 
any o. The use of sub-order rather than subsequence allows +0-0 =, +0-0-0 
but prevents nonsense transitions such as +0-0 Lan -0+0-0. Because the states 
are total orders, we drop the implementation LTS requirement that maximal 
events of the label must be maximal in the target. If we were to impose this
restriction, we would disallow -0 E +0-0. 

It is worth noting that the specification of the add-wins set removes exactly 
three edges from the right LTS: € asl +0-0, +0 re +0-0, and -0 i +0-0. 


Refinement. Refinement is a functional form of simulation (Hoare 1972; Lamport
1983; Lynch and Vaandrager 1995). Let $P = (S_P, p_0, \to_P)$ and $Q = (S_Q, q_0, \to_Q)$
be LTSs. A function $f : S_P \to S_Q$ is a (strong) refinement if $p \xrightarrow{v}_P p'$
and $f(p) = q$ imply that there exist $w =_{iso} v$ and $q' \in S_Q$ such that $q \xrightarrow{w}_Q q'$
and $f(p') = q'$. Then $P$ refines $Q$ (notation $P \sqsubseteq Q$) if there exists a refinement
$f : S_P \to S_Q$ such that the initial states are related, i.e., $f(p_0) = q_0$.
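
For finite LTSs, the refinement condition can be checked directly. The Python sketch below is only an illustration: it represents transitions as `(source, label, target)` triples, takes the candidate function `f` as a dictionary, and by default compares labels by equality, which stands in for the isomorphism check on pomset labels (an assumption of this example). The name `is_refinement` is hypothetical.

```python
from typing import Callable, Dict, Hashable, Set, Tuple

Transition = Tuple[Hashable, Hashable, Hashable]  # (source, label, target)


def is_refinement(f: Dict[Hashable, Hashable],
                  p0: Hashable, q0: Hashable,
                  trans_p: Set[Transition],
                  trans_q: Set[Transition],
                  iso: Callable[[Hashable, Hashable], bool] = lambda v, w: v == w
                  ) -> bool:
    """Check that f maps initial states correctly and every P-step to a Q-step."""
    if f[p0] != q0:
        return False
    for (p, v, p_next) in trans_p:
        q, q_next = f[p], f[p_next]
        # Since f is a function, the target q' is forced to be f(p').
        if not any(s == q and t == q_next and iso(v, w)
                   for (s, w, t) in trans_q):
            return False
    return True
```

A caller would pass a structural comparison for `iso` when labels are represented as labelled partial orders rather than as plain values.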

We now prove that SEC can be characterized as a refinement. We write $p_0 \to_P^* p_n$
when $p_n$ is reachable from $p_0$ via a finite sequence of steps $p_i \xrightarrow{v}_P p_{i+1}$.


Theorem 1. $u$ is EC for the specification $\Sigma$ iff $\mathrm{lts}(u) \sqsubseteq \mathrm{lts}(\Sigma)$.


Proof. For the forward direction, assume $u$ is EC and therefore there exists a function
$\tau : \mathrm{cuts}_\#(u) \to \Sigma$ such that $\forall p \in \mathrm{cuts}_\#(u)$. $\tau(p)$ is a linearization of $p$.
For each cut $p \in \mathrm{cuts}(u)$, we start with the dependent restriction, $p \restriction \#$. We further
restrict attention to mutators, $p \restriction \# \restriction M$. The required refinement maps $p$ to the
equivalence class of the linearization of $p \restriction \# \restriction M$ chosen by $\tau$: $f(p) = [\tau(p \restriction \# \restriction M)]$.
We abuse notation below by identifying each equivalence class with a canonical element
of the class.

We show that $p \xrightarrow{v}_u q$ implies $f(p) \le_{obs} f(q)$. Since $p \subseteq q$, we deduce
that $p \restriction \# \restriction M \subseteq q \restriction \# \restriction M$ and by monotonicity,
$f(p) = \tau(p \restriction \# \restriction M) \le_{obs} \tau(q \restriction \# \restriction M) = f(q)$.

We show that $p \xrightarrow{v}_u q$ implies $\tau(v) \le_{obs} f(q)$. Suppose $v$ only contains
mutators. Since $v \subseteq q$, we deduce that $v \subseteq q \restriction \# \restriction M$ and by monotonicity,
$\tau(v) \le_{obs} \tau(q \restriction \# \restriction M) = f(q)$. On the other hand, suppose $v$ contains the
non-mutator $a$. Let $A = M \cup \{a\}$. Since $v \subseteq q$, we deduce that $v \restriction M \subseteq q \restriction \# \restriction A$.
By monotonicity, $\tau(v \restriction M) \le_{obs} \tau(q \restriction A)$. Since $\tau(q \restriction A) = \tau(q \restriction M)$, we have
$\tau(v \restriction M) \le_{obs} \tau(q \restriction M) = f(q)$, as required.

Thus $f(p) \xrightarrow{v}_\Sigma f(q)$, completing this direction of the proof.

For the reverse direction, we are given a refinement $f : \mathrm{cuts}(u) \to \Sigma/{\sim}$. For
any $p \in \mathrm{cuts}_\#(u)$, define $\tau(p)$ to be a string in the equivalence class $f(p)$ that
includes any non-mutator found in $p$.

We first prove that $\tau(p)$ is a linearization of $p$. A simple inductive proof
demonstrates that for any $p \in \mathrm{cuts}_\#(u)$, there is a transition sequence of the
form $\emptyset \to_u^* \xrightarrow{p}_u p$. Thus, we deduce from the label on the final transition into $p$
that the $\tau(p)$ related to $p$ is a linearization of $p$.

We now establish monotonicity. A simple inductive proof shows that for any
$p, q \in \mathrm{cuts}(u)$, $p \subseteq q$ implies $p \to_u^* q$. Thus $\tau(p) \le_{obs} \tau(q)$, by the properties of
$f$ and the definition of $\tau$.


Composition. Given two non-interacting data structures whose replicated imple- 
mentations satisfy their sequential specifications, the implementation that com- 
bines them satisfies the interleaving of their specifications. We formalize this as 
a composition theorem in the style of Herlihy and Wing (1990). 

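Given an execution $u$ and $L' \subseteq L$, write $u \restriction L'$ for the execution that results
by restricting $u$ to events with labels in $L'$: $u \restriction L' = u \restriction \{e \in E_u \mid \lambda_u(e) \in L'\}$. This
notation lifts to sets in the standard way: $U \restriction L' = \bigcup_{u \in U} \{u \restriction L'\}$. Write $u \models_{sec} \Sigma$
to indicate that $u$ is SEC for $\Sigma$.


Theorem 2 (Composition). Let $L_1$ and $L_2$ be mutually independent subsets
of $L$. For $i \in \{1,2\}$, let $\Sigma_i$ be a specification with labels chosen from $L_i$, such
that $\Sigma_1 \mathbin{|||} \Sigma_2$ is also a specification. If $(U \restriction L_1) \models_{sec} \Sigma_1$ and $(U \restriction L_2) \models_{sec} \Sigma_2$ then
$U \models_{sec} (\Sigma_1 \mathbin{|||} \Sigma_2)$ (equivalently $\mathrm{lts}(\Sigma_1 \mathbin{|||} \Sigma_2) = \mathrm{lts}(\Sigma_1) \mathbin{|||} \mathrm{lts}(\Sigma_2)$).


The proof is immediate. Since $L_1$ and $L_2$ are mutually independent, any interleaving
of the labels will satisfy the definition.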


Abstraction. We describe a process algebra with parallel composition and restric- 
tion and establish congruence results. We ignore syntactic details and work 
directly with LTSs. Replica identities do not play a role in the definition; thus, 
we permit implicit mobility of the client amongst replicas with the only con- 
straint being that the replica has at least as much history on the current item 
of interaction as the client. This constraint is enforced by the synchronization of
the labels, defined below. While the definition includes the case where the client 
itself is replicated, it does not provide for out-of-band interaction between the 
clients at different replicas: All interaction is assumed to happen through the 
data structure. 

The relation $\parallel$ is defined between LTSs so that $P \parallel Q$ describes the system
that results when client $P$ interacts with data structure $Q$. For LTSs $P$ and $Q$,
define $\to_\times$ inductively, as follows, where $\emptyset$ represents the empty LVO.


v $ bi 1 w 1 
qo 4 ppp qed 
U 
(p, a) x (p, d) (p, a) >x (P, 7) 


Jv’ =iso v. v’ C w and max(v’) = max(w) 


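Let $S_\times = \{(p, q) \mid \exists (p', q').\ (p, q) \to_\times^* (p', q') \text{ and } \nexists v, p''.\ p' \xrightarrow{v}_P p''\}$.


$P \parallel Q = \{(S_\times, (p_0, q_0), \to_\times)\}$ if $S_\times$ is non-empty, and $P \parallel Q = \emptyset$ otherwise.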


The $\parallel$ operator is asymmetric between the client and data structure in two ways.
First, note that every action of the client must be matched by the data structure.
The condition of client quiescence in the definition of $S_\times$ requires that all of the actions
of the client $P$ be matched by $Q$; otherwise $P \parallel Q = \emptyset$. However, the
first rule for $\to_\times$ explicitly permits actions of the data structure that may not
be matched by the client. This asymmetry permits the composition of the data
structure with multiple clients to be described incrementally, one client at a
time. Thus, we expect that $(P_1 \parallel P_2) \parallel Q = P_1 \parallel (P_2 \parallel Q)$.

Second, note that the right rule for $\to_\times$ permits the data structure $Q$
to introduce order not found in the clients. This is clearly necessary to ensure
that the composition of client $\checkmark 0 \parallel +0$ with the SET data structure is nonempty. In
this case, the client has no order between +0 and ✓0 whereas the data structure
orders ✓0 after +0. In this paper, we do not permit the client to introduce
order that is not seen in the data structure. For a discussion of this issue, see
(Jagadeesan and Riely 2015).

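We can also define restriction for some set $A \subseteq L$ of labels, à la CCS. $P \setminus A =
(S_P, p_0, \{(p, v, q) \mid (p, v, q) \in (\to_P) \text{ and } \mathrm{labels}(v) \cap A = \emptyset\})$. The definitions
lift to sets: $\mathcal{P} \parallel \mathcal{Q} = \bigcup_{P \in \mathcal{P},\, Q \in \mathcal{Q}} P \parallel Q$ and $\mathcal{P} \setminus A = \{(P \setminus A) \mid P \in \mathcal{P}\}$.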


Lemma 3. If $P \sqsubseteq P'$ and $Q \sqsubseteq Q'$ then $P \parallel Q \sqsubseteq P' \parallel Q'$ and $P \setminus A \sqsubseteq P' \setminus A$.


It suffices to show that: $P \sqsubseteq \mathrm{lts}(u)$ implies $P \parallel \mathrm{lts}(u) \sqsubseteq P \parallel \mathrm{lts}(\Sigma)$. The proof
proceeds in the traditional style of such proofs in process algebra. We illustrate
by sketching the case for client parallel composition. Let $f$ be the witness for
$P \sqsubseteq \mathrm{lts}(u)$. The proof proceeds by constructing a “product” refinement relation $S$
of the identity on the states of $P$ with $f$, i.e.: $f(q) = q'$ implies $(p, q) \mathrel{S} (p, q')$.
Thus, an SEC implementation can be replaced by the specification.


Theorem 4 (Abstraction). If $u$ is SEC for $\Sigma$, then $P \parallel \mathrm{lts}(u) \sqsubseteq P \parallel \mathrm{lts}(\Sigma)$.


6 A Replicated Graph Algorithm


We describe a graph implemented with sets for vertices and edges, as specified
by Shapiro et al. (2011a). The graph maintains the invariant that the vertices
of an edge are also part of the graph. Thus, an edge may be added only if 
the corresponding vertices exist; conversely, a vertex may be removed only if 
it supports no edge. In the case of a concurrent addition of an edge with the 
deletion of either of its vertices, the deletion takes precedence. 

The vertices v,w,... are drawn from some universe U. An edge e,e’,... is 
a pair of vertices. Let vert(e) = {v, w} be the vertices of edge e = (v, w). The 
vocabulary of the set specification includes mutators for the addition and removal 
of vertices and edges and non-mutators for membership tests. 


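$M = \{+v, -v, +(v,w), -(v,w) \mid v, w \in U\}$
$\overline{M} = \{\checkmark v, \times v, \checkmark(v,w), \times(v,w) \mid v, w \in U\}$
$\# = \{(e, v), (v, e) \mid v \in \mathrm{vert}(e)\} \cup \{(e, e') \mid \mathrm{vert}(e) \cap \mathrm{vert}(e') \ne \emptyset\}$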


Valid graph specification strings answer queries like sets. In addition, we require 
the following. 


— Vertices and edges added at most once: Each add label is unique.
— Removal of a vertex or edge is preceded by a corresponding add.
— Vertices are added before they are mentioned in any edge: If $\sigma^j = +(v,w)$
or $\sigma^j = -(v,w)$, there exist $i, i' < j$ such that $\sigma^i = +v$, $\sigma^{i'} = +w$.
— Vertices are removed only after they are mentioned in edges: If $\sigma^j = +(v,w)$
or $\sigma^j = -(v,w)$, then for all $i < j$: $\sigma^i \ne -v$ and $\sigma^i \ne -w$.


Graph Implementation. We rewrite the graph program of Shapiro et al. (2011a)
in a more abstract form. Our distributed graph implementation is written as a
client of two replicated sets: one for vertices (V) and one for edges (E). The implementation
uses USETs, which require that an element be added at most once and that
each remove causally follow the corresponding add. Here we show the graph
implementation for various methods as client code that runs at each replica. At
each replica, the code accesses its local copy of the USETs. All the message passing
needed to propagate the updates is handled by the USET implementations of the
sets V, E. For several methods, we list preconditions, which prescribe the natural
assumptions that need to be satisfied when these client methods are invoked. For
example, an edge operation requires the presence of the vertices at the current
replica.


addVertex(v)
  Pre: fresh(v)
  V.add(v)

removeVertex(v)
  Pre: V.?(v)
  V.remove(v)

bool ?(v)
  return V.?(v)

addEdge(v,w)
  Pre: V.?(v), V.?(w)
  Pre: fresh((v,w))
  E.add((v,w))

removeEdge(v,w)
  Pre: V.?(v), V.?(w)
  Pre: E.?((v,w))
  E.remove((v,w))

bool ?(v,w)
  if V.?(v)
  then return E.?((v,w))
  else return false

We assume a causal transition system (as needed in Shapiro et al. (2011a)).
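
To make the client code concrete, here is a single-replica Python sketch. It is not the replicated implementation: the class `USet` is a hypothetical local stand-in for a USET that performs no message passing, preconditions are enforced by raising `ValueError`, and the edge query follows the listing as extracted, testing only the first vertex.

```python
class USet:
    """Local stand-in for a USET: each element may be added at most once."""

    def __init__(self):
        self._present = set()
        self._ever_added = set()

    def fresh(self, x) -> bool:
        return x not in self._ever_added

    def contains(self, x) -> bool:
        return x in self._present

    def add(self, x) -> None:
        self._ever_added.add(x)
        self._present.add(x)

    def remove(self, x) -> None:
        self._present.discard(x)


def require(condition: bool, message: str) -> None:
    """Enforce a precondition from the listing."""
    if not condition:
        raise ValueError(message)


class Graph:
    def __init__(self):
        self.V, self.E = USet(), USet()

    def add_vertex(self, v):
        require(self.V.fresh(v), "vertex must be fresh")
        self.V.add(v)

    def remove_vertex(self, v):
        require(self.V.contains(v), "vertex must be present")
        self.V.remove(v)

    def has_vertex(self, v) -> bool:
        return self.V.contains(v)

    def add_edge(self, v, w):
        require(self.V.contains(v) and self.V.contains(w), "vertices must be present")
        require(self.E.fresh((v, w)), "edge must be fresh")
        self.E.add((v, w))

    def remove_edge(self, v, w):
        require(self.V.contains(v) and self.V.contains(w), "vertices must be present")
        require(self.E.contains((v, w)), "edge must be present")
        self.E.remove((v, w))

    def has_edge(self, v, w) -> bool:
        # As in the listing: the edge is reported only if vertex v is present.
        return self.E.contains((v, w)) if self.V.contains(v) else False
```

In this sketch, removing a vertex hides its edges from `has_edge` without touching the edge set, which mirrors the rule that vertex deletion takes precedence.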


Correctness Using the Set Specification. We first show the correctness of the 
graph algorithm, using the SET specification for the vertex and edge sets. We 
then apply the abstraction and composition theorems to show the correctness of 
the algorithm using a set implementation. 

Let $u$ be an LVO generated in an execution of the graph implementation. The
preconditions ensure that u has the following properties: 


(a) For any $v$, $+v$ is never ordered after $-v$, and likewise for $e$.
(b) $-(v,w)$ or $+(v,w)$ is never ordered after $-v$ or $-w$.
(c) $-(v,w)$ or $+(v,w)$ is always ordered after some $+v$ and $+w$.


Define $\sigma_1$, $\sigma_2$ and $\sigma_3$ as follows.


— All elements of $\sigma_1$ are of the form $+v$. $\sigma_1$ exists by (c) above.

— All elements of $\sigma_3$ are of the form $-v$. $\sigma_3$ exists by (b) above.

— For each edge $(v, w)$ that is accessed in $u$, let $\sigma_{(v,w)}$ be any interleaving of the
events involving $(v, w)$ in $u$ such that no $+(v,w)$ occurs after any $-(v,w)$ in
$\sigma_{(v,w)}$. $\sigma_{(v,w)}$ exists by (a) above. $\sigma_2$ is any interleaving of all the $\sigma_{(v,w)}$.


Then $u$ is SEC with witness $\sigma_u = \sigma_1 \sigma_2 \sigma_3$.


Full Correctness of the Implementation. We now turn to proving the correctness 
of the algorithm when the two sets are replaced by their implementations. 
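Consider two (distributed implementations of) separate and independent sets
for vertices and edges, i.e. $L_{\Sigma_1} \cap L_{\Sigma_2} = \emptyset$. Suppose we have two implementations,
each of which is correct individually: $\mathrm{lts}(U_i) \sqsubseteq \mathrm{lts}(\Sigma_i)$. By composition, we have
that they are correct when composed together: $U_1 \mathbin{|||} U_2 \sqsubseteq \Sigma_1 \mathbin{|||} \Sigma_2$. Let $P$ be
the graph implementation, which is a client of the two sets. By abstraction, we
know that $P \parallel (\Sigma_1 \mathbin{|||} \Sigma_2) \sqsubseteq T$ implies $P \parallel (U_1 \mathbin{|||} U_2) \sqsubseteq T$. By congruence, we
deduce:


$(P \parallel (\Sigma_1 \mathbin{|||} \Sigma_2)) \setminus (L_{\Sigma_1} \cup L_{\Sigma_2}) \sqsubseteq T$ implies $(P \parallel (U_1 \mathbin{|||} U_2)) \setminus (L_{\Sigma_1} \cup L_{\Sigma_2}) \sqsubseteq T$.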


Thus, in order to validate the full graph implementation, it is sufficient to estab- 
lish the correctness of the graph client when interacting with the specification of 
the two independent SETs for edges and vertices, which we have already done in 
the previous treatment of abstract correctness. 


7 Conclusions 


We have provided a definition of strong eventual consistency that captures valid- 
ity with respect to a sequential specification. Our definition reflects an attempt 
to resolve the tension between expressivity (cover the extant examples in the 
literature) and facilitating reasoning (by retaining a direct relationship with the 
sequential specification). The notion of concurrent specification developed by 
Burckhardt et al. (2014) has been used to prove the validity of several replicated
data structure implementations. In future work, we would like to discover sufficient
conditions relating concurrent and sequential specifications such that any
implementation that is correct under the concurrent specification (as defined by 
Burckhardt et al. (2014)) will also be correct under the sequential counterpart 
(as defined here). 


Abstract. Many theorem provers can generate functional programs
from definitions or proofs. However, this code generation needs to be
trusted. The exception is the HOL4 system, which has a proof-producing
code generator for a subset of ML. We go one step further and provide
a verified compiler from Isabelle/HOL to CakeML. More precisely, we
combine a simple proof-producing translation of recursion equations in
Isabelle/HOL into a deeply embedded term language with a fully verified
compilation chain to the target language CakeML.

1 Introduction


Many theorem provers have the ability to generate executable code in some (typ- 
ically functional) programming language from definitions, lemmas and proofs 
(e.g. [6,8,9,12,16,27,37]). This makes code generation part of the trusted kernel
of the system. Myreen and Owens [30] closed this gap for the HOL4 system: they 
have implemented a tool that translates from HOL4 into CakeML, a subset of 
SML, and proves a theorem stating that a result produced by the CakeML code 
is correct w.r.t. the HOL functions. They also have a verified implementation of 
CakeML [24,40]. We go one step further and provide a once-and-for-all verified 
compiler from (deeply embedded) function definitions in Isabelle/HOL [32,33] 
into CakeML proving partial correctness of the generated CakeML code w.r.t. 
the original functions. This is like the step from dynamic to static type checking. 
It also means that preconditions on the input to the compiler are explicitly given 
in the correctness theorem rather than implicitly by a failing translation. To the 
best of our knowledge this is the first verified (as opposed to certifying) compiler 
from function definitions in a logic into a programming language. 

Our compiler is composed of multiple phases and is in principle applicable to
languages other than Isabelle/HOL or even HOL:


— We erase types right away. Hence the type system of the source language is
irrelevant.

— We merely assume that the source language has a semantics based on equational
logic.


© The Author(s) 2018
L. Hupel and T. Nipkow
A. Ahmed (Ed.): ESOP 2018, LNCS 10801, pp. 999-1026, 2018.
https://doi.org/10.1007/978-3-319-89884-1_35


The compiler operates in three stages: 


1. The preprocessing phase eliminates features that are not supported by our 
compiler. Most importantly, dictionary construction eliminates occurrences 
of type classes in HOL terms. It introduces dictionary datatypes and new 
constants and proves the equivalence of old and new constants (Sect. 7). 

2. The deep embedding lifts HOL terms into terms of type term, a HOL model 
of HOL terms. For each constant c (of arbitrary type) it defines a constant c’ 
of type term and proves a theorem that expresses equivalence (Sect. 3). 

3. There are multiple compiler phases that eliminate certain constructs from 
the term type, until we arrive at the CakeML expression type. Most phases 
target a different intermediate term type (Sect. 5). 


The first two stages are preprocessing, are implemented in ML and produce 
certificate theorems. Only these stages are specific to Isabelle. The third (and 
main) stage is implemented completely in the logic HOL, without recourse to 
ML. Its correctness is verified once and for all. 


2 Related Work 


There is existing work in the Coq [2,15] and HOL [30] communities for proof 
producing or verified extraction of functions defined in the logic. Anand et al. [2] 
present work in progress on a verified compiler from Gallina (Coq’s specification 
language) via untyped intermediate languages to CompCert C light. They plan 
to connect their extraction routine to the CompCert compiler [26]. 

Translation of type classes into dictionaries is an important feature of Haskell 
compilers. In the setting of Isabelle/HOL, this has been described by Wenzel 
[44] and Krauss et al. [23]. Haftmann and Nipkow [17] use this construction to 
compile HOL definitions into target languages that do not support type classes, 
e.g. Standard ML and OCaml. In this work, we provide a certifying translation 
that eliminates type classes inside the logic. 

Compilation of pattern matching is well understood in the literature [3,36,38].
In this work, we contribute a transformation of sets of equations with pattern 
matching on the left-hand side into a single equation with nested pattern match- 
ing on the right-hand side. This is implemented and verified inside Isabelle. 

Besides CakeML, there are many projects for verified compilers for functional
programming languages of various degrees of sophistication and realism (e.g.
[4,11,14]). Particularly modular is the work by Neis et al. [31] on a verified
compiler for an ML-like imperative source language. The main distinguishing
feature of our work is that we start from a set of higher-order recursion equations
with pattern matching on the left-hand side rather than a lambda calculus with
pattern matching on the right-hand side. On the other hand we stand on the
shoulders of CakeML which allows us to bypass all complications of machine
code generation. Note that much of our compiler is not specific to CakeML and
that it would be possible to retarget it to, for example, Pilsner abstract syntax
with moderate effort.

Finally, Fallenstein and Kumar [13] have presented a model of HOL inside
HOL using large cardinals, including a reflection proof principle.

All Isabelle definitions and proofs can be found on the paper website:
https://lars.hupel.info/research/codegen/, or archived as
https://doi.org/10.5281/zenodo.1167616.


3 Deep Embedding 


Starting with a HOL definition, we derive a new, reified definition in a deeply 
embedded term language depicted in Fig. 1a. This term language corresponds 
closely to the term datatype of Isabelle’s implementation (using de Bruijn indices 
[10]), but without types and schematic variables. 

To establish a formal connection between the original and the reified defini- 
tions, we use a logical relation, a concept that is well-understood in literature 
[20] and can be nicely implemented in Isabelle using type classes. Note that the 
use of type classes here is restricted to correctness proofs; it is not required for 
the execution of the compiler itself. That way, there is no contradiction to the 
elimination of type classes occurring in a previous stage. 


Notation. We abbreviate App t u to t $ u and Abs t to Λ t. Other term types
introduced later in this paper use the same conventions. We reserve λ for
abstractions in HOL itself. Typing judgments are written with a double colon: t :: τ.


Embedding Operation. Embedding is implemented in ML. We denote this
operation using angle brackets: ⟨t⟩, where t is an arbitrary HOL expression and the
result ⟨t⟩ is a HOL value of type term. It is a purely syntactic transformation,
without preliminary evaluation or reduction, and it discards type information.
The following examples illustrate this operation and typographical conventions
concerning variables and constants:


⟨x⟩ = Free "x"    ⟨f⟩ = Const "f"    ⟨λx. f x⟩ = Λ (⟨f⟩ $ Bound 0)
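
As an illustration only (the actual embedding is implemented in ML and is not shown in this paper), the operation can be sketched in Python. The tagged-tuple input syntax and the function name embed are assumptions made for this sketch; the output constructors mirror those of the term type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Free:
    name: str


@dataclass(frozen=True)
class Bound:
    index: int


@dataclass(frozen=True)
class Abs:
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


# Hypothetical stand-in for HOL syntax (an assumption of this sketch):
#   ("var", x) | ("const", c) | ("lam", x, body) | ("app", t, u)
def embed(expr, binders=()):
    """Purely syntactic embedding; `binders` lists enclosing lambda names, innermost first."""
    tag = expr[0]
    if tag == "const":
        return Const(expr[1])
    if tag == "var":
        name = expr[1]
        if name in binders:
            # de Bruijn index: number of binders between occurrence and binding site
            return Bound(binders.index(name))
        return Free(name)
    if tag == "lam":
        return Abs(embed(expr[2], (expr[1],) + binders))
    if tag == "app":
        return App(embed(expr[1], binders), embed(expr[2], binders))
    raise ValueError(f"unknown expression tag: {tag!r}")


# The third example from the text: the embedding of λx. f x
example = embed(("lam", "x", ("app", ("const", "f"), ("var", "x"))))
```

No evaluation or reduction takes place: the function only traverses the syntax tree, and type information never enters the picture.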


Small-Step Semantics. Figure 1b specifies the small-step semantics for term. It is
reminiscent of higher-order term rewriting, and modelled closely after equality in
HOL. The basic idea is that if the proposition t = u can be proved equationally
in HOL (without symmetry), then R ⊢ ⟨t⟩ ⟶* ⟨u⟩ holds (where R :: (term ×
term) set). We call R the rule set. It is the result of translating a set of defining
equations lhs = rhs into pairs (⟨lhs⟩, ⟨rhs⟩) ∈ R.
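datatype term =
  Const string | Free string | Abs term | Bound nat | App term term

(a) Abstract syntax

\[
\frac{(\mathit{lhs}, \mathit{rhs}) \in R \qquad \mathrm{match}\ \mathit{lhs}\ t = \mathrm{Some}\ \sigma}
     {R \vdash t \longrightarrow \mathrm{subst}\ \sigma\ \mathit{rhs}}\ \text{STEP}
\qquad
\frac{\mathrm{closed}\ t'}
     {R \vdash (\Lambda\, t) \mathbin{\$} t' \longrightarrow t[t']}\ \text{BETA}
\]

\[
\frac{R \vdash t \longrightarrow t'}
     {R \vdash t \mathbin{\$} u \longrightarrow t' \mathbin{\$} u}\ \text{FUN}
\qquad
\frac{R \vdash u \longrightarrow u'}
     {R \vdash t \mathbin{\$} u \longrightarrow t \mathbin{\$} u'}\ \text{ARG}
\]

(b) Small-step semantics


Fig. 1. Basic syntax and semantics of the term type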


Rule STEP performs a rewrite step by picking a rewrite rule from R and 
rewriting the term at the root. For that purpose, match and subst are (mostly) 
standard first-order matching and substitution (see Sect. 4 for details). 
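
To make rule STEP concrete, the following Python sketch implements first-order matching, substitution and a single rewrite step at the root. It is a simplified illustration and not the formalized definition from Sect. 4: patterns are assumed to be linear, mappings are modelled as Python dictionaries, the rule set is a list, and the short constant names "map", "Cons" and "Nil" are chosen for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Free:
    name: str


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def match(pattern, term):
    """Match a linear pattern against a term; return a mapping or None."""
    if isinstance(pattern, Free):
        return {pattern.name: term}
    if isinstance(pattern, Const):
        return {} if pattern == term else None
    if isinstance(pattern, App) and isinstance(term, App):
        left = match(pattern.fun, term.fun)
        right = match(pattern.arg, term.arg)
        if left is None or right is None:
            return None
        return {**left, **right}  # domains are disjoint because patterns are linear
    return None


def subst(env, term):
    """Replace free variables according to the mapping env."""
    if isinstance(term, Free):
        return env.get(term.name, term)
    if isinstance(term, App):
        return App(subst(env, term.fun), subst(env, term.arg))
    return term


def step_at_root(rules, term):
    """Rule STEP: pick some rule whose left-hand side matches the whole term."""
    for lhs, rhs in rules:
        env = match(lhs, term)
        if env is not None:
            return subst(env, rhs)
    return None


def app(*terms):
    """Left-nested application: app(f, a, b) is (f $ a) $ b."""
    result = terms[0]
    for t in terms[1:]:
        result = App(result, t)
    return result


MAP, CONS, NIL = Const("map"), Const("Cons"), Const("Nil")
f, x, xs = Free("f"), Free("x"), Free("xs")
rules = [
    (app(MAP, f, app(CONS, x, xs)), app(CONS, app(f, x), app(MAP, f, xs))),
    (app(MAP, f, NIL), NIL),
]
g, a = Const("g"), Const("a")
reduct = step_at_root(rules, app(MAP, g, app(CONS, a, NIL)))
```

The sketch returns the first applicable rule, whereas the semantics is nondeterministic and permits any matching rule; rewriting below the root is the job of rules FUN and ARG and is left out here.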

Rule BETA performs β-reduction. Type term represents bound variables by
de Bruijn indices. The notation t[t'] represents the substitution of the outermost
bound variable in t with t'.

Our semantics does not constitute a fully-general higher-order term rewrit- 
ing system, because we do not allow substitution under binders. For de Bruijn 
terms, this would pose no problem, but as soon as we introduce named bound 
variables, substitution under binders requires dealing with capture. To avoid this 
altogether, all our semantics expect terms that are substituted into abstractions 
to be closed. However, this does not mean that we restrict ourselves to any par- 
ticular evaluation order. Both call-by-value and call-by-name can be used in the 
small-step semantics. But later on, the target semantics will only use call-by- 
value. 
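
The substitution t[t'] used by rule BETA can be sketched in Python as follows. This is an illustrative reconstruction under the assumption that the argument t' is closed and contains no dangling de Bruijn indices, so it never needs to be shifted when it is moved under binders; the function names are chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Bound:
    index: int


@dataclass(frozen=True)
class Abs:
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def subst_bound(body, arg, depth=0):
    """Replace the variable bound by the removed abstraction (index == depth) with arg."""
    if isinstance(body, Bound):
        if body.index == depth:
            return arg  # arg is closed, hence no index shifting is required
        if body.index > depth:
            return Bound(body.index - 1)  # one enclosing binder has disappeared
        return body
    if isinstance(body, Abs):
        return Abs(subst_bound(body.body, arg, depth + 1))
    if isinstance(body, App):
        return App(subst_bound(body.fun, arg, depth), subst_bound(body.arg, arg, depth))
    return body


def beta(term):
    """Rule BETA at the root: (Λ t) $ t' rewrites to t[t']; None if not a redex."""
    if isinstance(term, App) and isinstance(term.fun, Abs):
        return subst_bound(term.fun.body, term.arg)
    return None


# (λx. λy. x) (λz. z) in de Bruijn notation
redex = App(Abs(Abs(Bound(1))), Abs(Bound(0)))
reduct = beta(redex)
```

Since the argument is inserted unchanged, the sketch stays close to the restriction described above: no substitution of open terms under binders ever happens.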


Embedding Relation. We denote the concept that an embedded term t
corresponds to a HOL term a of type τ w.r.t. rule set R with the syntax R ⊢ t ≈ a.
If we want to be explicit about the type, we index the relation: ≈_τ.

For ground types, this can be defined easily. For example, the following two
rules define ≈_nat:


\[
\frac{}{R \vdash \langle 0 \rangle \approx_{\mathrm{nat}} 0}
\qquad
\frac{R \vdash \langle t \rangle \approx_{\mathrm{nat}} n}
     {R \vdash \langle \mathrm{Suc}\ t \rangle \approx_{\mathrm{nat}} \mathrm{Suc}\ n}
\]


Definitions of ≈ for arbitrary datatypes without nested recursion can be derived
mechanically in the same fashion as for nat, where they constitute one-to-one
relations. Note that for ground types, ≈ ignores R. The reason why ≈ is
parametrized on R will become clear in a moment.

For function types, we follow Myreen and Owens's approach [30]. The
statement R ⊢ t ≈ f can be interpreted as "t $ ⟨a⟩ can be rewritten to ⟨f a⟩ for
all a". Because this might involve applying a function definition from R, the ≈
relation must be indexed by the rule set. As a notational convenience, we define
another relation R ⊢ t ↓ x to mean that there is a t' such that R ⊢ t ⟶* t' and
R ⊢ t' ≈ x. Using this notation, we formally define ≈ for functions as follows:


\[
R \vdash t \approx f \longleftrightarrow
(\forall u\ x.\ R \vdash u \downarrow x \Longrightarrow R \vdash t \mathbin{\$} u \downarrow f\ x)
\]


Example. As a running example, we will use the map function on lists: 


map f [] = []
map f (x # xs) = f x # map f xs


The result of embedding this function is a set of rules map':


map' =
  {(Const "List.list.map" $ Free "f" $ (Const "List.list.Cons" $ Free "x21" $ Free "x22"),
    Const "List.list.Cons" $ (Free "f" $ Free "x21") $ ...),
   (Const "List.list.map" $ Free "f" $ Const "List.list.Nil",
    Const "List.list.Nil")}


together with the theorem map' ⊢ Const "List.list.map" ↓ map, which is
proven by simple induction over map. Constant names like "List.list.map"
come from the fully-qualified internal names in HOL.

The induction principle for the proof arises from the use of the fun command 
that is used to define recursive functions in HOL [22]. But the user is also allowed 
to specify custom equations for functions, in which case we will use heuristics 
to generate and prove the appropriate induction theorem. For simplicity, we 
will use the term (defining) equation uniformly to refer to any set of equations, 
either default ones or ones specified by the user. Embedding partially-specified 
functions — in particular, proving the certificate theorem about them — is cur- 
rently not supported. In the future, we plan to leverage the domain predicate as 
produced by fun to generate conditional theorems. 


4 Terms, Matching and Substitution 


The compiler transforms the initial term type (Fig. 1a) through various
intermediate stages. This section gives an overview and introduces necessary
terminology.


Preliminaries. The function arrow in HOL is ⇒. The cons operator on lists is
the infix #.

Throughout the paper, the concept of mappings is pervasive: We use the
type notation α ⇀ β to denote a function α ⇒ β option. In certain contexts,
a mapping may also be called an environment. We write mapping literals using
brackets: [a ↦ x, b ↦ y, ...]. If it is clear from the context that σ is defined on
a, we often treat the lookup σ a as returning an x :: β.

The functions dom :: (α ⇀ β) ⇒ α set and range :: (α ⇀ β) ⇒ β set return
the domain and range of a mapping, respectively.
Dropping entries from a mapping is denoted by σ − k, where σ is a mapping
and k is either a single key or a set of keys. We use σ' ⊆ σ to denote that σ' is
a sub-mapping of σ, that is, dom σ' ⊆ dom σ and ∀a ∈ dom σ'. σ' a = σ a.

Merging two mappings σ and ρ is denoted with σ ++ ρ. It constructs a new
mapping with the union domain of σ and ρ. Entries from ρ override entries from
σ. That is, ρ ⊆ σ ++ ρ holds, but not necessarily σ ⊆ σ ++ ρ.
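
These operations have direct counterparts on finite Python dictionaries. The following sketch is only an analogy to build intuition; the function names drop, is_submapping and merge are chosen for this example and do not appear in the formalization.

```python
def drop(sigma, keys):
    """sigma − k, where k is a single key or a set of keys."""
    to_remove = keys if isinstance(keys, (set, frozenset)) else {keys}
    return {k: v for k, v in sigma.items() if k not in to_remove}


def is_submapping(small, big):
    """small ⊆ big: every entry of small occurs unchanged in big."""
    return all(k in big and big[k] == v for k, v in small.items())


def merge(sigma, rho):
    """sigma ++ rho: union of the domains, entries of rho take precedence."""
    return {**sigma, **rho}


sigma = {"a": 1, "b": 2}
rho = {"b": 3, "c": 4}
merged = merge(sigma, rho)
```

In this example the key "b" occurs in both mappings, so the merged mapping contains rho as a sub-mapping but not sigma.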

All mappings and sets are assumed to be finite. In the formalization, this is 
enforced by using subtypes of ⇀ and set. Note that one cannot define datatypes
by recursion through sets for cardinality reasons. However, for finite sets, it 
is possible. This is required to construct the various term types. We leverage 
facilities of Blanchette et al.’s datatype command to define these subtypes [7]. 


Standard Functions. All type constructors that we use (⇀, set, list, option, ...)
support the standard operations map and rel. For lists, map is the regular covariant
map. For mappings, the function has the type (β ⇒ γ) ⇒ (α ⇀ β) ⇒ (α ⇀ γ).
It leaves the domain unchanged, but applies a function to the range of the
mapping.

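Function rel_τ lifts a binary predicate P :: α ⇒ α ⇒ bool to the type
constructor τ. We call this lifted relation the relator for a particular type.

For datatypes, its definition is structural, for example:


\[
\frac{}{\mathrm{rel\_list}\ P\ [\,]\ [\,]}
\qquad
\frac{\mathrm{rel\_list}\ P\ \mathit{xs}\ \mathit{ys} \qquad P\ x\ y}
     {\mathrm{rel\_list}\ P\ (x \mathbin{\#} \mathit{xs})\ (y \mathbin{\#} \mathit{ys})}
\]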


For sets and mappings, the definition is a little bit more subtle. 


Definition 1 (Set relator). For each element a ∈ A, there must be a
corresponding element b ∈ B such that P a b, and vice versa. Formally:


\[
\mathrm{rel\_set}\ P\ A\ B \longleftrightarrow
(\forall x \in A.\ \exists y \in B.\ P\ x\ y) \land (\forall y \in B.\ \exists x \in A.\ P\ x\ y)
\]


Definition 2 (Mapping relator). For each a, m a and n a must be related
according to rel_option P. Formally:


\[
\mathrm{rel\_mapping}\ P\ m\ n \longleftrightarrow
(\forall a.\ \mathrm{rel\_option}\ P\ (m\ a)\ (n\ a))
\]
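
For finite collections, the relators can be mirrored executably. The Python sketch below is an analogy to Definitions 1 and 2 and not part of the formalization; it models the option type by using None for a missing entry, which assumes that None is never stored as a value.

```python
def rel_list(p, xs, ys):
    """Structural lifting: equal length and pointwise related elements."""
    return len(xs) == len(ys) and all(p(x, y) for x, y in zip(xs, ys))


def rel_set(p, a, b):
    """Every element of a has a partner in b, and vice versa."""
    return (all(any(p(x, y) for y in b) for x in a)
            and all(any(p(x, y) for x in a) for y in b))


def rel_option(p, x, y):
    """None is related only to None; present values are related by p."""
    if x is None or y is None:
        return x is None and y is None
    return p(x, y)


def rel_mapping(p, m, n):
    """Lookups must be related for every key.

    Keys outside both domains yield None on both sides, so checking the
    union of the domains is sufficient.
    """
    return all(rel_option(p, m.get(k), n.get(k)) for k in set(m) | set(n))


def same_parity(x, y):
    return x % 2 == y % 2
```

For instance, rel_set same_parity relates {1, 2} and {3, 4, 6} although the sets differ in size, which shows why the set relator is more subtle than the structural one for lists.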


Term Types. There are four distinct term types: term, nterm, pterm, and sterm. 
All of them support the notions of free variables, matching and substitution. Free 
variables are always a finite set of strings. Matching a term against a pattern 
yields an optional mapping of type string ⇀ α from free variable names to terms.
Note that the type of patterns is itself term instead of a dedicated pattern 
type. The reason is that we have to subject patterns to a linearity constraint 
anyway and may use this constraint to carve out the relevant subset of terms: 


Definition 3. A term is linear if there is at most one occurrence of any variable, 
it contains no abstractions, and in an application f $x, f must not be a free 
variable. The HOL predicate is called linear :: term ⇒ bool.
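
Definition 3 translates almost literally into an executable check. The following Python sketch is a paraphrase for illustration; the helper names free_occurrences and well_shaped are assumptions of this example and not taken from the formalization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Free:
    name: str


@dataclass(frozen=True)
class Bound:
    index: int


@dataclass(frozen=True)
class Abs:
    body: object


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def free_occurrences(t):
    """All occurrences of free variables, with repetitions."""
    if isinstance(t, Free):
        return [t.name]
    if isinstance(t, App):
        return free_occurrences(t.fun) + free_occurrences(t.arg)
    if isinstance(t, Abs):
        return free_occurrences(t.body)
    return []


def well_shaped(t):
    """No abstractions, and no free variable in function position."""
    if isinstance(t, Abs):
        return False
    if isinstance(t, App):
        return (not isinstance(t.fun, Free)
                and well_shaped(t.fun) and well_shaped(t.arg))
    return True


def linear(t):
    names = free_occurrences(t)
    return well_shaped(t) and len(names) == len(set(names))


# Left-hand side of the second map equation, with shortened constant names
map_pattern = App(App(Const("map"), Free("f")),
                  App(App(Const("Cons"), Free("x")), Free("xs")))
```

The left-hand side of the map equation is linear, whereas a pattern that repeats a variable, applies a variable, or contains an abstraction is rejected.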
Because of the similarity of operations across the term types, they are all
instances of the term type class. Note that in Isabelle, classes and types live 
in different namespaces. The term type and the term type class are separate 
entities. 


Definition 4. A term type τ supports the operations match :: term ⇒ τ ⇒
(string ⇀ τ), subst :: (string ⇀ τ) ⇒ τ ⇒ τ and frees :: τ ⇒ string set. We
also define the following derived functions:


— matchs matches a list of patterns and terms sequentially, producing a single 
mapping 

— closed t is an abbreviation for frees t = ∅

— closed σ is an overloading of closed, denoting that all values in a mapping are
closed 


Additionally, some (obvious) axioms have to be satisfied. We do not strive to 
fully specify an abstract term algebra. Instead, the axioms are chosen according 
to the needs of this formalization. 

A notable deviation from matching as discussed in term rewriting literature 
is that the result of matching is only well-defined if the pattern is linear. 


Definition 5. An equation is a pair of a pattern (left-hand side) and a term
(right-hand side). The pattern is of the form f $ p₁ $ ... $ pₙ, where f is a constant
(i.e. of the form Const name). We refer to either f or name interchangeably as
the function symbol of the equation.


Following term rewriting terminology, we sometimes refer to an equation as a rule.


4.1 De Bruijn terms (term) 


The definition of term is almost an exact copy of Isabelle’s internal term type, 
with the notable omissions of type information and schematic variables (Fig. 1a). 
The implementation of β-reduction is straightforward via index shifting of bound
variables. 


4.2 Named Bound Variables (nterm) 


datatype nterm = Nconst string | Nvar string | Nabs string nterm | Napp nterm nterm


The nterm type is similar to term, but removes the distinction between bound 
and free variables. Instead, there are only named variables. As mentioned in the 
previous section, we forbid substitution of terms that are not closed in order 
to avoid capture. This is also reflected in the syntactic side conditions of the 
correctness proofs (Sect. 5.1). 
4.3 Explicit Pattern Matching (pterm)


datatype pterm = 
  Pconst string | Pvar string | Pabs ((term × pterm) set) | Papp pterm pterm


Functions in HOL are usually defined using implicit pattern matching, that is,
the terms pᵢ occurring on the left-hand side ⟨f p₁ ... pₙ⟩ of an equation must
be constructor patterns. This is also common among functional programming
languages like Haskell or OCaml. CakeML only supports explicit pattern
matching using case expressions. A function definition consisting of multiple defining
equations must hence be translated to the form f = λx. case x of .... The
elimination proceeds by iteratively removing the last parameter in the block of
equations until none are left.

In our formalization, we opted to combine the notion of abstraction and case 
expression, yielding case abstractions, represented as the Pabs constructor. This 
is similar to the fn construct in Standard ML, which denotes an anonymous 
function that immediately matches on its argument [28]. The same construct 
also exists in Haskell with the LambdaCase language extension. We chose this 
representation mainly for two reasons: First, it allows for a simpler language 
grammar because there is only one (shared) constructor for abstraction and case 
expression. Second, the elimination procedure outlined above does not have to 
introduce fresh names in the process. Later, when translating to CakeML syntax, 
fresh names are introduced and proved correct in a separate step. 

The set of pairs of pattern and right-hand side inside a case abstraction is 
referred to as clauses. As a short-hand notation, we use Λ{p₁ ⇒ t₁, p₂ ⇒ t₂, ...}.


4.4 Sequential Clauses (sterm) 


datatype sterm = 
  Sconst string | Svar string | Sabs ((term × sterm) list) | Sapp sterm sterm


In the term rewriting fragment of HOL, the order of rules is not significant. If a 
rule matches, it can be applied, regardless when it was defined or proven. This 
is reflected by the use of sets in the rule and term types. For CakeML, the rules 
need to be applied in a deterministic order, i.e. sequentially. The sterm type only 
differs from pterm by using list instead of set. Hence, case abstractions use list
brackets: Λ[p₁ ⇒ t₁, p₂ ⇒ t₂, ...].


4.5 Irreducible Terms (value) 


CakeML distinguishes between expressions and values. Whereas expressions may 
contain free variables or β-redexes, values are closed and fully evaluated. Both
have a notion of abstraction, but values differ from expressions in that they 
contain an environment binding free variables. 

Consider the expression (λx. λy. x) (λz. z), which is rewritten (by β-reduction)
to λy. λz. z. Note how the bound variable x disappears, since it is replaced. This
is contrary to how programming languages are usually implemented: evaluation
does not happen by substituting the argument term t for the bound variable
x, but by recording the binding x ↦ t in an environment [24]. A pair of an
abstraction and an environment is usually called a closure [25,41].
In CakeML, this means that evaluation of the above expression results in the
closure

(λy. x, ["x" ↦ (λz. z, [])])


Note the nested structure of the closure, whose environment itself contains a 
closure. 
To reflect this in our formalization, we introduce a type value of values
(explanation inline):

datatype value =
  (* constructor value: a data constructor applied to multiple values *)
  Vconstr string (value list) |
  (* closure: clauses combined with an environment mapping variables to values *)
  Vabs ((term × sterm) list) (string ⇀ value) |
  (* recursive closures: a group of mutually recursive function bodies with an environment *)
  Vrecabs (string ⇀ ((term × sterm) list)) string (string ⇀ value)


The above example evaluates to the closure:

Vabs [⟨y⟩ ⇒ ⟨x⟩] ["x" ↦ Vabs [⟨z⟩ ⇒ ⟨z⟩] []]
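
The difference between substitution and environment capture can be reproduced with a few lines of Python. The sketch below is a deliberate simplification and not the value type of the formalization: abstractions bind a single named variable instead of carrying a list of clauses, and the class names Var, Lam, Ap and Closure are assumptions of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class Ap:
    fun: object
    arg: object


@dataclass
class Closure:
    lam: Lam
    env: dict  # variable name -> value, captured at creation time


def evaluate(term, env):
    """Call-by-value evaluation that records bindings instead of substituting."""
    if isinstance(term, Var):
        return env[term.name]
    if isinstance(term, Lam):
        return Closure(term, dict(env))  # capture the current environment
    if isinstance(term, Ap):
        fun = evaluate(term.fun, env)
        arg = evaluate(term.arg, env)
        # extend the captured environment with the new binding
        return evaluate(fun.lam.body, {**fun.env, fun.lam.param: arg})
    raise TypeError(f"unexpected term: {term!r}")


# (λx. λy. x) (λz. z)
expression = Ap(Lam("x", Lam("y", Var("x"))), Lam("z", Var("z")))
result = evaluate(expression, {})
```

The result is a closure for λy. x whose environment maps "x" to the closure for λz. z, matching the nested structure described above; the variable x remains in the body and is resolved through the environment.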


The third case for recursive closures only becomes relevant when we conflate 
variables and constants. As long as the rule set rs is kept separate, recursive calls 
are straightforward: the appropriate definition for the constant can be looked up 
there. CakeML knows no such distinction between constants and variables, hence 
everything has to reside in a single environment σ.

Consider this example of odd and even: 


odd 0 = False                even 0 = True
odd (Suc n) = even n         even (Suc n) = odd n


When evaluating the term odd k, the definitions of even and odd themselves 
must be available in the environment captured in the definition of odd. However, 
it would be cumbersome in HOL to construct such a Vabs that refers to itself. 
Instead, we capture the expressions used to define odd and even in a recursive 
closure. Other encodings might be possible, but since we are targeting CakeML, 
we are opting to model it in a similar way as its authors do. 

For the above example, this would result in the following global environment:


["odd" ↦ Vrecabs css "odd" [], "even" ↦ Vrecabs css "even" []]


where css = ["odd" ↦ [⟨0⟩ ⇒ ⟨False⟩, ⟨Suc n⟩ ⇒ ⟨even n⟩],
             "even" ↦ [⟨0⟩ ⇒ ⟨True⟩, ⟨Suc n⟩ ⇒ ⟨odd n⟩]]

Note that in the first line, the right-hand sides are values, but in css, they
are expressions. The additional string argument of Vrecabs denotes the selected
function. When evaluating an application of a recursive closure to an argument
(β-reduction), the semantics adds all constituent functions of the closure to the
environment used for recursive evaluation.
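
The mechanism can be illustrated with a small Python model of the odd/even example. This is a sketch under strong simplifying assumptions and not the formal semantics: natural numbers are Python integers, patterns are restricted to zero and successor, and the expression syntax consists of hypothetical tagged tuples introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Vrecabs:
    css: dict   # function name -> list of (pattern, right-hand side) clauses
    name: str   # the selected function of the group
    env: dict   # captured environment


# Patterns (assumption): ("zero",) or ("suc", variable_name)
def match_nat(pattern, n):
    if pattern[0] == "zero":
        return {} if n == 0 else None
    if pattern[0] == "suc":
        return {pattern[1]: n - 1} if n > 0 else None
    raise ValueError(f"unknown pattern: {pattern!r}")


# Expressions (assumption): ("lit", value) | ("var", name) | ("app", fun, arg)
def evaluate(expr, env):
    tag = expr[0]
    if tag == "lit":
        return expr[1]
    if tag == "var":
        return env[expr[1]]
    if tag == "app":
        fun = evaluate(expr[1], env)
        arg = evaluate(expr[2], env)
        # Make every function of the recursive group visible to the body.
        group = {name: Vrecabs(fun.css, name, fun.env) for name in fun.css}
        for pattern, rhs in fun.css[fun.name]:
            binding = match_nat(pattern, arg)
            if binding is not None:
                return evaluate(rhs, {**fun.env, **group, **binding})
        raise ValueError("no clause matches")
    raise ValueError(f"unknown expression: {expr!r}")


css = {
    "odd": [(("zero",), ("lit", False)),
            (("suc", "n"), ("app", ("var", "even"), ("var", "n")))],
    "even": [(("zero",), ("lit", True)),
             (("suc", "n"), ("app", ("var", "odd"), ("var", "n")))],
}
global_env = {name: Vrecabs(css, name, {}) for name in css}
odd_3 = evaluate(("app", ("var", "odd"), ("lit", 3)), global_env)
```

Note that the closures stored in global_env have an empty captured environment; the recursive calls succeed only because the application case re-creates the whole group before evaluating the selected clause.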


5 Intermediate Semantics and Compiler Phases 


In this section, we discuss the progression from the de Bruijn-based term
language with its small-step semantics given in Fig. 1 to the final CakeML
semantics. The compiler starts out with terms of type term and applies multiple
phases to eliminate features that are not present in the CakeML source language.


Phase / refinement          Types and semantics

                            constructors :: string set (shared by all phases)

De Bruijn terms             R :: (term × term) set; t, t' :: term
                            R ⊢ t ⟶ t' (Fig. 1b)
  | §5.2, Theorem 1
Named bound variables       R :: (term × nterm) set; t, t' :: nterm
                            R ⊢ t ⟶ t' (Fig. 3)
  | §5.3, see §5.3
Explicit pattern matching   R :: (string × pterm) set; t, t' :: pterm
                            R ⊢ t ⟶ t' (Fig. 4)
  | §5.4, see §5.4
Sequential clauses          rs :: (string × sterm) list; t, t' :: sterm
                            rs ⊢ t ⟶ t' (Fig. 5)
  : §5.5, Theorem 2
                            rs :: (string × sterm) list; σ :: string ⇀ sterm
                            t, u :: sterm
                            rs, σ ⊢ t ↓ u (Fig. 6)
  : §5.6, Theorem 3
Evaluation semantics        rs :: (string × value) list; σ :: string ⇀ value
                            t :: sterm, u :: value
                            rs, σ ⊢ t ↓ u (Fig. 7)
  : §5.7, Theorem 4
                            σ :: string ⇀ value
                            t :: sterm, u :: value
                            σ ⊢ t ↓ u (Fig. 8)

Legend: "|" marks a compiler phase; ":" marks a semantics refinement. The
right-hand entries give the semantics belonging to each phase.


Fig. 2. Intermediate semantics and compiler phases
Types term, nterm and pterm each have a small-step semantics only. Type sterm
has a small-step and several intermediate big-step semantics that bridge the gap 
to CakeML. An overview of the intermediate semantics and compiler phases is 
depicted in Fig. 2. The left-hand column gives an overview of the different phases. 
The right-hand column gives the types of the rule set and the semantics for each 
phase; you may want to skip it upon first reading. 


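\[
\frac{(\mathit{lhs}, \mathit{rhs}) \in R \qquad \mathrm{match}\ \mathit{lhs}\ t = \mathrm{Some}\ \sigma}
     {R \vdash t \longrightarrow \mathrm{subst}\ \sigma\ \mathit{rhs}}\ \text{STEP}
\qquad
\frac{\mathrm{closed}\ t'}
     {R \vdash (\Lambda x.\ t) \mathbin{\$} t' \longrightarrow \mathrm{subst}\ [x \mapsto t']\ t}\ \text{BETA}
\]


Fig. 3. Small-step semantics for nterm with named bound variables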


5.1 Side Conditions 


All of the following semantics require some side conditions on the rule set. These 
conditions are purely syntactic. As an example we list the conditions for the 
correctness of the first compiler phase: 


— Patterns must be linear, and constructors in patterns must be fully applied. 

— Definitions must have at least one parameter on the left-hand side (Sect. 5.6). 

— The right-hand side of an equation refers only to free variables occurring in
patterns on the left-hand side and contains no dangling de Bruijn indices.

— There are no two defining equations lhs = rhs₁ and lhs = rhs₂ such that
rhs₁ ≠ rhs₂.

— For each pair of equations that define the same constant, their arity must be 
equal and their patterns must be compatible (Sect. 5.3). 

— There is at least one equation. 

— Variable names occurring in patterns must not overlap with constant names 
(Sect. 5.7). 

— Any occurring constants must either be defined by an equation or be a con- 
structor. 


The conditions for the subsequent phases are sufficiently similar that we do not 
list them again. 

In the formalization, we use named contexts to fix the rules and assump- 
tions on them (locales in Isabelle terminology). Each phase has its own locale, 
together with a proof that after compilation, the preconditions of the next phase 
are satisfied. Correctness proofs assume the above conditions on R and similar 
conditions on the term that is reduced. For brevity, this is usually omitted in 
our presentation. 
5.2 Naming Bound Variables: From term to nterm


Isabelle uses de Bruijn indices in the term language for the following two rea- 
sons: For substitution, there is no need to rename bound variables. Additionally, 
α-equivalent terms are equal. In implementations of programming languages,
these advantages are not required: Typically, substitutions do not happen inside 
abstractions, and there is no notion of equality of functions. Therefore CakeML 
uses named variables and in this compilation step, we get rid of de Bruijn indices. 

The “named” semantics is based on the nterm type. The rules that are 
changed from the original semantics (Fig. 1b) are given in Fig. 3 (FUN and ARG 
remain unchanged). Notably, β-reduction reuses the substitution function.

For the correctness proof, we need to establish a correspondence between
terms and nterms. Translation from nterm to term is trivial: replace each bound
variable by the number of abstractions between its occurrence and its binder,
and keep free variables as they are. This function is called
nterm_to_term.

The other direction is not unique and requires introduction of fresh names
for bound variables. In our formalization, we have chosen to use a monad to
produce these names. This function is called term_to_nterm. We can also prove
the obvious property nterm_to_term (term_to_nterm t) = t, where t is a term
without dangling de Bruijn indices.
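
Both translations and the round-trip property can be sketched in Python. The code is an illustration and not the verified definition: fresh names are drawn from a mutable set of used names rather than from a monad, and the naming scheme v0, v1, ... is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class Free:
    name: str

@dataclass(frozen=True)
class Bound:
    index: int

@dataclass(frozen=True)
class Abs:
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Nconst:
    name: str

@dataclass(frozen=True)
class Nvar:
    name: str

@dataclass(frozen=True)
class Nabs:
    var: str
    body: object

@dataclass(frozen=True)
class Napp:
    fun: object
    arg: object


def frees(t):
    """Names of the free variables of a de Bruijn term."""
    if isinstance(t, Free):
        return {t.name}
    if isinstance(t, Abs):
        return frees(t.body)
    if isinstance(t, App):
        return frees(t.fun) | frees(t.arg)
    return set()


def fresh_name(used):
    """Pick a name outside `used` and record it (hypothetical naming scheme)."""
    i = 0
    while f"v{i}" in used:
        i += 1
    used.add(f"v{i}")
    return f"v{i}"


def term_to_nterm(t, used, binders=()):
    if isinstance(t, Const):
        return Nconst(t.name)
    if isinstance(t, Free):
        return Nvar(t.name)
    if isinstance(t, Bound):
        return Nvar(binders[t.index])  # fails on dangling indices
    if isinstance(t, Abs):
        name = fresh_name(used)
        return Nabs(name, term_to_nterm(t.body, used, (name,) + binders))
    return Napp(term_to_nterm(t.fun, used, binders),
                term_to_nterm(t.arg, used, binders))


def nterm_to_term(t, binders=()):
    if isinstance(t, Nconst):
        return Const(t.name)
    if isinstance(t, Nvar):
        if t.name in binders:
            return Bound(binders.index(t.name))  # distance to the binder
        return Free(t.name)
    if isinstance(t, Nabs):
        return Abs(nterm_to_term(t.body, (t.var,) + binders))
    return App(nterm_to_term(t.fun, binders), nterm_to_term(t.arg, binders))


original = Abs(Abs(App(App(Free("f"), Bound(1)), Bound(0))))
named = term_to_nterm(original, frees(original))
```

Seeding the set of used names with the free variables of the term prevents a freshly chosen binder name from capturing a free variable.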

Generation of fresh names in general can be thought of as picking a string 
that is not an element of a (finite) set of already existing names. For Isabelle, 
the Nominal framework [42,43] provides support for reasoning over fresh names, 
but unfortunately, its definitions are not executable. 

Instead, we chose to model generation of fresh names as a monad α fresh
with the following primitive operations in addition to the monad operations:


run :: α fresh ⇒ string set ⇒ α

fresh_name :: string fresh


In our implementation, we have chosen to represent α fresh as roughly isomorphic
to the state monad.
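
One possible reading of this interface, sketched in Python, represents a computation of type α fresh as a function from the set of used names to a pair of result and updated set. This representation, the operation names unit and bind, and the naming scheme x0, x1, ... are assumptions of the sketch; the paper only states that the representation is roughly isomorphic to the state monad.

```python
def fresh_name(used):
    """Primitive operation: return a name outside `used` together with the new state."""
    i = 0
    while f"x{i}" in used:
        i += 1
    name = f"x{i}"
    return name, used | {name}


def unit(value):
    """Monadic return: produce a value without touching the state."""
    return lambda used: (value, used)


def bind(action, continuation):
    """Monadic bind: run `action`, then feed its result to `continuation`."""
    def combined(used):
        value, used_after = action(used)
        return continuation(value)(used_after)
    return combined


def run(action, used):
    """Execute a computation from an initial set of used names, dropping the final state."""
    value, _ = action(frozenset(used))
    return value


# Draw two fresh names in sequence
two_names = bind(fresh_name, lambda a: bind(fresh_name, lambda b: unit((a, b))))
names = run(two_names, {"x0"})
```

Threading the set of used names through bind guarantees that successive calls to fresh_name never return the same name.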
Compilation of a rule set proceeds by translation of the right-hand side of all 
rules: 
compile R = {(p, term_to_nterm t) | (p, t) ∈ R}


The left-hand side is left unchanged for two reasons: function match expects an 
argument of type term (see Sect. 4), and patterns do not contain abstractions or 
bound variables. 


Theorem 1 (Correctness of compilation). Assuming a step can be taken 
with the compiled rule set, it can be reproduced with the original rule set. 


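\[
\frac{\mathrm{compile}\ R \vdash t \longrightarrow u \qquad \mathrm{closed}\ t}
     {R \vdash \mathrm{nterm\_to\_term}\ t \longrightarrow \mathrm{nterm\_to\_term}\ u}
\]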


We prove this by induction over the semantics (Fig. 3). 
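\[
\frac{(\mathit{name}, \mathit{rhs}) \in R}
     {R \vdash \mathrm{Pconst}\ \mathit{name} \longrightarrow \mathit{rhs}}\ \text{STEP'}
\qquad
\frac{(\mathit{pat}, \mathit{rhs}) \in C \qquad \mathrm{match}\ \mathit{pat}\ t = \mathrm{Some}\ \sigma \qquad \mathrm{closed}\ t}
     {R \vdash (\Lambda\, C) \mathbin{\$} t \longrightarrow \mathrm{subst}\ \sigma\ \mathit{rhs}}\ \text{BETA'}
\]


Fig. 4. Small-step semantics for pterm with pattern matching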


5.3 Explicit Pattern Matching: From nterm to pterm 


Usually, functions in HOL are defined using implicit pattern matching, that is,
the left-hand side of an equation is of the form ⟨f p₁ ... pₙ⟩, where the pᵢ are
patterns over datatype constructors. For any given function f, there may be
multiple such equations. In this compilation step, we transform sets of equations
for f defined using implicit pattern matching into a single equation for f of the
form ⟨f⟩ = Λ C, where C is a set of clauses.

The strategy we employ currently requires successive elimination of a single 
parameter from right to left, in a similar fashion as Slind’s pattern matching 
compiler [38, Sect. 3.3.1]. Recall our running example (map). It has arity 2. We
omit the brackets ⟨⟩ for brevity. First, the list parameter gets eliminated:


map f = Λ [] ⇒ []
          | x # xs ⇒ f x # map f xs

Finally, the function parameter gets eliminated:

map = Λ f ⇒ (Λ [] ⇒ []
               | x # xs ⇒ f x # map f xs)


This has now arity 0 and is defined by a twice-nested abstraction. 
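
The right-to-left elimination can be mimicked on a toy representation. In the Python sketch below, patterns and right-hand sides are plain strings, a case abstraction is a pair ("Λ", clauses), and equations are grouped when their remaining parameters are syntactically equal. All of these choices are simplifications made for illustration; the formalized algorithm operates on terms and sets of clauses.

```python
def eliminate_last(equations):
    """Remove the last parameter from every equation.

    Equations that agree on the remaining parameters are merged into one
    equation whose right-hand side is a case abstraction over the removed
    parameter.
    """
    groups = {}
    for params, rhs in equations:
        prefix, last = tuple(params[:-1]), params[-1]
        groups.setdefault(prefix, []).append((last, rhs))
    return [(list(prefix), ("Λ", clauses)) for prefix, clauses in groups.items()]


def eliminate_all(equations):
    """Repeat the elimination until every equation has arity 0."""
    while any(params for params, _ in equations):
        equations = eliminate_last(equations)
    return equations


map_equations = [
    (["f", "[]"], "[]"),
    (["f", "x # xs"], "f x # map f xs"),
]
compiled = eliminate_all(map_equations)

# Using different names for the first parameter prevents the grouping.
incompatible = eliminate_all([
    (["f", "[]"], "[]"),
    (["g", "x # xs"], "g x # map g xs"),
])
```

For the running example the result is a single equation of arity 0 with a twice-nested abstraction. The second input yields an outer abstraction with two clauses that both match any argument.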


Semantics. The target semantics is given in Fig. 4 (the FUN and ARG rules 
from previous semantics remain unchanged). We start out with a rule set R that 
allows only implicit pattern matching. After elimination, only explicit pattern 
matching remains. The modified STEP rule merely replaces a constant by its 
definition, without taking arguments into account. 


Restrictions. For the transformation to work, we need a strong assumption 
about the structure of the patterns p; to avoid the following situation: 


map f [] = []
map g (x # xs) = g x # map g xs


Through elimination, this would turn into:

map = Λ f ⇒ (Λ [] ⇒ [])
        | g ⇒ (Λ x # xs ⇒ g x # map g xs)
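\[
\frac{(\mathit{name}, \mathit{rhs}) \in R}
     {R \vdash \mathrm{Sconst}\ \mathit{name} \longrightarrow \mathit{rhs}}\ \text{STEP}
\qquad
\frac{\mathrm{first\_match}\ \mathit{cs}\ t = \mathrm{Some}\ (\sigma, \mathit{rhs}) \qquad \mathrm{closed}\ t}
     {R \vdash (\Lambda\, \mathit{cs}) \mathbin{\$} t \longrightarrow \mathrm{subst}\ \sigma\ \mathit{rhs}}\ \text{BETA}
\]


Fig. 5. Small-step semantics for sterm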


Even though the original equations were non-overlapping, we suddenly 
obtained an abstraction with two overlapping patterns. Slind observed a similar 
problem [38, Sect. 3.3.2] in his algorithm. Therefore, he only permits uniform 
equations, as defined by Wadler [36, Sect. 5.5]. Here, we can give a formal char- 
acterization of our requirements as a computable function on pairs of patterns: 


fun pat_compat :: term ⇒ term ⇒ bool where
  pat_compat (t₁ $ t₂) (u₁ $ u₂) ⟷ pat_compat t₁ u₁ ∧ (t₁ = u₁ ⟶ pat_compat t₂ u₂)
  pat_compat t u ⟷ (overlapping t u ⟶ t = u)


This compatibility constraint ensures that any two overlapping patterns (of the 
same column) pᵢ and pⱼ are equal and are thus appropriately grouped together
in the elimination procedure. We require all defining equations of a constant to be 
mutually compatible. Equations violating this constraint will be flagged during 
embedding (Sect. 3), whereas the pattern elimination algorithm always succeeds. 
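
The compatibility check can be transcribed to Python as follows. The definition of overlapping is not given in this section; the sketch assumes that two linear patterns overlap if some term is an instance of both, which for linear patterns can be decided structurally. This reading is an assumption and may differ from the formalization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Free:
    name: str


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def overlapping(t, u):
    """Assumed definition: some term matches both linear patterns."""
    if isinstance(t, Free) or isinstance(u, Free):
        return True  # a variable matches everything
    if isinstance(t, Const) and isinstance(u, Const):
        return t.name == u.name
    if isinstance(t, App) and isinstance(u, App):
        return overlapping(t.fun, u.fun) and overlapping(t.arg, u.arg)
    return False


def pat_compat(t, u):
    if isinstance(t, App) and isinstance(u, App):
        return pat_compat(t.fun, u.fun) and (t.fun != u.fun or pat_compat(t.arg, u.arg))
    return (not overlapping(t, u)) or t == u


def app(*terms):
    """Left-nested application: app(f, a, b) is (f $ a) $ b."""
    result = terms[0]
    for t in terms[1:]:
        result = App(result, t)
    return result


MAP, CONS, NIL = Const("map"), Const("Cons"), Const("Nil")
nil_case = app(MAP, Free("f"), NIL)
cons_case = app(MAP, Free("f"), app(CONS, Free("x"), Free("xs")))
renamed_cons_case = app(MAP, Free("g"), app(CONS, Free("x"), Free("xs")))
```

The two original equations of map are compatible. After renaming f to g in the second equation the check fails, because the patterns f and g overlap without being equal.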

While this rules out some theoretically possible pattern combinations (e.g. 
the diagonal function [36, Sect. 5.5]), in practice, we have not found this to be a 
problem: All of the function definitions we have tried (Sect. 8) satisfied pattern 
compatibility (after automatic renaming of pattern variables). As a last resort, 
the user can manually instantiate function equations. Although this will always 
lead to a pattern compatible definition, it is not done automatically, due to the 
potential blow-up. 


Discussion. Because this compilation phase is both non-trivial and has some 
minor restrictions on the set of function definitions that can be processed, we 
may provide an alternative implementation in the future. Instead of eliminat- 
ing patterns from right to left, patterns may be grouped in tuples. The above 
example would be translated into: 


map = Λ (f, []) ⇒ []
        | (f, x # xs) ⇒ f x # map f xs


We would then leave the compilation of patterns for the CakeML compiler, which 
has no pattern compatibility restriction. 

The obvious disadvantage however is that this would require the knowledge 
of a tuple type in the term language which is otherwise unaware of concrete 
datatypes. 


5.4 Sequentialization: From pterm to sterm 


The semantics of pterm and sterm differ only in rules STEP and BETA. Figure 5
shows the modified rules. Instead of any matching clause, the first matching 
clause in a case abstraction is picked. 
For the correctness proof, the order of clauses does not matter: we only need
to prove that a step taken in the sequential semantics can be reproduced in the
unordered semantics. As long as no rules are dropped, this is trivially true. For
that reason, the compiler orders the clauses lexicographically. At the same time
the rules are also converted from type (string × pterm) set to (string × sterm) list.
Below, rs will always denote a list of the latter type.
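
The behaviour of first_match can be illustrated with a short Python sketch. The function below is a reconstruction from the way first_match is used in Fig. 5 and not the definition from the formalization; match is the simplified first-order matching for linear patterns, and right-hand sides are represented by constants for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Free:
    name: str


@dataclass(frozen=True)
class App:
    fun: object
    arg: object


def match(pattern, term):
    """Match a linear pattern against a term; return a mapping or None."""
    if isinstance(pattern, Free):
        return {pattern.name: term}
    if isinstance(pattern, Const):
        return {} if pattern == term else None
    if isinstance(pattern, App) and isinstance(term, App):
        left = match(pattern.fun, term.fun)
        right = match(pattern.arg, term.arg)
        if left is None or right is None:
            return None
        return {**left, **right}
    return None


def first_match(clauses, term):
    """Return (mapping, right-hand side) of the first clause that matches, else None."""
    for pattern, rhs in clauses:
        env = match(pattern, term)
        if env is not None:
            return env, rhs
    return None


NIL = Const("Nil")
clauses = [(NIL, Const("A")), (Free("x"), Const("B"))]
```

With overlapping clauses the order becomes observable: matching Nil against the list above selects the first clause, whereas the reversed list selects the variable clause.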


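\[
\frac{(\mathit{name}, \mathit{rhs}) \in \mathit{rs}}
     {\mathit{rs}, \sigma \vdash \mathrm{Sconst}\ \mathit{name} \downarrow \mathit{rhs}}\ \text{CONST}
\qquad
\frac{\sigma\ \mathit{name} = \mathrm{Some}\ v}
     {\mathit{rs}, \sigma \vdash \mathrm{Svar}\ \mathit{name} \downarrow v}\ \text{VAR}
\]

\[
\frac{}
     {\mathit{rs}, \sigma \vdash \Lambda\, \mathit{cs} \downarrow
      \Lambda\, [(\mathit{pat}, \mathrm{subst}\ (\sigma - \mathrm{frees}\ \mathit{pat})\ t) \mid (\mathit{pat}, t) \leftarrow \mathit{cs}]}\ \text{ABS}
\]

\[
\frac{\mathit{rs}, \sigma \vdash t \downarrow \Lambda\, \mathit{cs} \qquad
      \mathit{rs}, \sigma \vdash u \downarrow u' \qquad
      \mathrm{first\_match}\ \mathit{cs}\ u' = \mathrm{Some}\ (\sigma', \mathit{rhs}) \qquad
      \mathit{rs}, \sigma \mathbin{+\!+} \sigma' \vdash \mathit{rhs} \downarrow v}
     {\mathit{rs}, \sigma \vdash t \mathbin{\$} u \downarrow v}\ \text{COMB}
\]

\[
\frac{\mathit{name} \in \mathrm{constructors} \qquad
      \mathit{rs}, \sigma \vdash t_1 \downarrow u_1 \quad \cdots \quad \mathit{rs}, \sigma \vdash t_n \downarrow u_n}
     {\mathit{rs}, \sigma \vdash \mathrm{Sconst}\ \mathit{name} \mathbin{\$} t_1 \mathbin{\$} \ldots \mathbin{\$} t_n \downarrow
      \mathrm{Sconst}\ \mathit{name} \mathbin{\$} u_1 \mathbin{\$} \ldots \mathbin{\$} u_n}\ \text{CONSTR}
\]


Fig. 6. Big-step semantics for sterm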


5.5 Big-Step Semantics for sterm 


This big-step semantics for sterm is not a compiler phase but moves towards 
the desired evaluation semantics. In this first step, we reuse the sterm type for 
evaluation results, instead of evaluating to the separate type value. This allows 
us to ignore environment capture in closures for now. 

All previous ⟶ relations were parametrized by a rule set. Now the big-step
predicate is of the form rs, σ ⊢ t ↓ t′ where σ :: string ⇀ sterm is a variable
environment.

This semantics also introduces the distinction between constructors and
defined constants. If C is a constructor, the term ⟨C t₁ ... tₙ⟩ is evaluated to
⟨C t₁′ ... tₙ′⟩ where the tᵢ′ are the results of evaluating the tᵢ.

The full set of rules is shown in Fig. 6. They deserve a short explanation:


CONST. Constants are retrieved from the rule set rs. 

VAR. Variables are retrieved from the environment σ.

ABS. In order to achieve the intended invariant, abstractions are evaluated to
their fully substituted form.

COMB. Function application t $ u first requires evaluation of t into an abstraction
Λ cs and evaluation of u into an arbitrary term u′. Afterwards, we look
for a clause matching u′ in cs, which produces a local variable environment
σ′, possibly overwriting existing variables in σ. Finally, we evaluate the right-hand
side of the clause with the combined global and local variable environment.

CONSTR. For a constructor application ⟨C t₁ ...⟩, evaluate all tᵢ. The set constructors
is an implicit parameter of the semantics.
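The five rules can be read as a recursive interpreter. The following Python sketch is one such reading under simplifying assumptions: terms and patterns are encoded as nested tuples (a hypothetical encoding, not the Isabelle datatypes), patterns are n-ary and linear, and the environment is assumed to contain only closed terms so that substitution cannot capture variables.

```python
# Hypothetical toy encoding of sterm:
#   ("const", name) | ("var", name) | ("app", t, u) | ("abs", clauses)
# clauses is a tuple of (pattern, rhs); a pattern is
#   ("pvar", x) | ("pcon", name, (sub_patterns...)).

def strip_comb(t):
    """Split nested binary applications into (head, [arguments])."""
    args = []
    while t[0] == "app":
        args.append(t[2])
        t = t[1]
    return t, args[::-1]


def pat_vars(p):
    if p[0] == "pvar":
        return {p[1]}
    return set().union(*[pat_vars(q) for q in p[2]])


def subst(env, t):
    """Substitute free variables; pattern variables shadow env entries."""
    if t[0] == "var":
        return env.get(t[1], t)
    if t[0] == "app":
        return ("app", subst(env, t[1]), subst(env, t[2]))
    if t[0] == "abs":
        return ("abs", tuple(
            (p, subst({x: s for x, s in env.items() if x not in pat_vars(p)}, rhs))
            for p, rhs in t[1]))
    return t  # constants are left untouched


def match(p, t):
    if p[0] == "pvar":
        return {p[1]: t}
    head, args = strip_comb(t)
    if head != ("const", p[1]) or len(args) != len(p[2]):
        return None
    env = {}
    for q, a in zip(p[2], args):
        sub = match(q, a)
        if sub is None:
            return None
        env.update(sub)
    return env


def evaluate(rs, constructors, env, t):
    head, args = strip_comb(t)
    if head[0] == "const" and head[1] in constructors:   # CONSTR
        result = head
        for a in args:
            result = ("app", result, evaluate(rs, constructors, env, a))
        return result
    if t[0] == "const":                                   # CONST
        return rs[t[1]]
    if t[0] == "var":                                     # VAR
        return env[t[1]]
    if t[0] == "abs":                                     # ABS
        return subst(env, t)
    fun = evaluate(rs, constructors, env, t[1])           # COMB
    arg = evaluate(rs, constructors, env, t[2])
    if fun[0] != "abs":
        raise ValueError("function position is not an abstraction")
    for p, rhs in fun[1]:  # the first matching clause wins
        local = match(p, arg)
        if local is not None:
            return evaluate(rs, constructors, {**env, **local}, rhs)
    raise ValueError("no clause matches")


def num(n):
    t = ("const", "Zero")
    for _ in range(n):
        t = ("app", ("const", "Suc"), t)
    return t


ZERO_P = ("pcon", "Zero", ())
SUC_P = ("pcon", "Suc", (("pvar", "n"),))
rs = {
    "odd": ("abs", ((ZERO_P, ("const", "False")),
                    (SUC_P, ("app", ("const", "even"), ("var", "n"))))),
    "even": ("abs", ((ZERO_P, ("const", "True")),
                     (SUC_P, ("app", ("const", "odd"), ("var", "n"))))),
}
constructors = {"Zero", "Suc", "True", "False"}
result = evaluate(rs, constructors, {}, ("app", ("const", "odd"), num(3)))
```

Note how the ABS case returns a term again (with the environment substituted into the clause bodies), so results stay within the same term type, as described above.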
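\[
\textsc{Const}\ \frac{(\text{name}, \text{rhs}) \in rs}{rs, \sigma \vdash \text{Sconst}\ \text{name} \downarrow \text{rhs}}
\qquad
\textsc{Var}\ \frac{\sigma\ \text{name} = \text{Some}\ v}{rs, \sigma \vdash \text{Svar}\ \text{name} \downarrow v}
\qquad
\textsc{Abs}\ \frac{}{rs, \sigma \vdash \Lambda\ cs \downarrow \text{Vabs}\ cs\ \sigma}
\]

\[
\textsc{Comb}\ \frac{rs, \sigma \vdash t \downarrow \text{Vabs}\ cs\ \sigma' \quad rs, \sigma \vdash u \downarrow v \quad \text{first\_match}\ cs\ v = \text{Some}\ (\sigma'', \text{rhs}) \quad rs, \sigma' \mathbin{+\!\!+} \sigma'' \vdash \text{rhs} \downarrow v'}{rs, \sigma \vdash t \mathbin{\$} u \downarrow v'}
\]

\[
\textsc{RecComb}\ \frac{rs, \sigma \vdash t \downarrow \text{Vrecabs}\ css\ \text{name}\ \sigma' \quad css\ \text{name} = \text{Some}\ cs \quad rs, \sigma \vdash u \downarrow v \quad \text{first\_match}\ cs\ v = \text{Some}\ (\sigma'', \text{rhs}) \quad rs, \sigma' \mathbin{+\!\!+} \sigma'' \vdash \text{rhs} \downarrow v'}{rs, \sigma \vdash t \mathbin{\$} u \downarrow v'}
\]

\[
\textsc{Constr}\ \frac{\text{name} \in \text{constructors} \quad rs, \sigma \vdash t_1 \downarrow v_1 \quad \cdots \quad rs, \sigma \vdash t_n \downarrow v_n}{rs, \sigma \vdash \text{Sconst}\ \text{name} \mathbin{\$} t_1 \mathbin{\$} \cdots \mathbin{\$} t_n \downarrow \text{Vconstr}\ \text{name}\ [v_1, \ldots, v_n]}
\]


Fig. 7. Evaluation semantics from sterm to value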


Lemma 1 (Closedness invariant). If σ contains only closed terms, frees t ⊆
dom σ and rs, σ ⊢ t ↓ t′, then t′ is closed.


Correctness of the big-step w.r.t. the small-step semantics is proved easily by
induction on the former:


Lemma 2. For any closed environment σ satisfying frees t ⊆ dom σ,
rs, σ ⊢ t ↓ u → rs ⊢ subst σ t ⟶* u

By setting σ = [], we obtain:


Theorem 2 (Correctness). rs, [] ⊢ t ↓ u ∧ closed t → rs ⊢ t ⟶* u


5.6 Evaluation Semantics: Refining sterm to value 


At this point, we introduce the concept of values into the semantics, while still 
keeping the rule set (for constants) and the environment (for variables) separate. 
The evaluation rules are specified in Fig. 7 and represent a departure from the
original rewriting semantics: a term does not evaluate to another term but to an
object of a different type, a value. We still use ↓ as notation, because big-step
and evaluation semantics can be disambiguated by their types.

The evaluation model itself is fairly straightforward. As explained in Sect. 4.5,
abstraction terms are evaluated to closures capturing the current variable environment.
Note that at this point, recursive closures are not treated differently
from non-recursive closures. In a later stage, when rs and σ are merged, this
distinction becomes relevant.
We will now explain each rule that has changed from the previous semantics:


ABS. Abstraction terms are evaluated to a closure capturing the current
environment.

COMB. As before, in an application t $ u, t must evaluate to a closure Vabs cs σ′.
The evaluation result of u is then matched against the clauses cs, producing
an environment σ″. The right-hand side of the clause is then evaluated using
σ′ ++ σ″; the original environment σ is effectively discarded.

RECCOMB. Similar to the above. Finding the matching clause is a two-step process:
first, the appropriate clause list is selected by the name of the currently active
function. Then, matching is performed.

CONSTR. As before, for an n-ary application ⟨C t₁ ...⟩, where C is a data constructor,
we evaluate all tᵢ. The result is a Vconstr value.


Conversion Between sterm and value. To establish a correspondence between 
evaluating a term to an sterm and to a value, we apply the same trick as in 
Sect. 5.2. Instead of specifying a complicated relation, we translate value back 
to sterm: simply apply the substitutions in the captured environments to the 
clauses. 

The translation rules for Vabs and Vrecabs are kept similar to the ABS rule
from the big-step semantics (Fig. 6). Roughly speaking, the big-step semantics
always keeps terms fully substituted, whereas the evaluation semantics defers
substitution.

Similarly to Sect. 5.2, we can also define a function sterm_to_value :: sterm ⇒
value and prove that one function is the inverse of the other.
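A minimal Python sketch of the value-to-term direction may help to see what "applying the substitutions in the captured environments" means. The tuple encoding and the treatment of Vrecabs (select the clauses of the named function, then substitute) are assumptions of this illustration; the formalization's actual definition may differ in detail.

```python
# Hypothetical toy encoding (not the Isabelle datatypes):
#   sterm: ("const", n) | ("var", n) | ("app", t, u) | ("abs", clauses)
#   value: ("Vconstr", n, (values...)) | ("Vabs", clauses, env)
#          | ("Vrecabs", css, n, env)
# Patterns: ("pvar", x) | ("pcon", n, (sub_patterns...)); env: dict name -> value.

def pat_vars(p):
    if p[0] == "pvar":
        return {p[1]}
    return set().union(*[pat_vars(q) for q in p[2]])


def subst(env, t):
    """Substitute free variables; pattern variables shadow env entries."""
    if t[0] == "var":
        return env.get(t[1], t)
    if t[0] == "app":
        return ("app", subst(env, t[1]), subst(env, t[2]))
    if t[0] == "abs":
        return ("abs", tuple(
            (p, subst({x: s for x, s in env.items() if x not in pat_vars(p)}, rhs))
            for p, rhs in t[1]))
    return t


def value_to_sterm(v):
    if v[0] == "Vconstr":
        # n-ary constructor value becomes nested binary application
        t = ("const", v[1])
        for arg in v[2]:
            t = ("app", t, value_to_sterm(arg))
        return t
    if v[0] == "Vabs":
        clauses, env = v[1], v[2]
    else:  # Vrecabs: select the clauses of the named function
        clauses, env = v[1][v[2]], v[3]
    term_env = {x: value_to_sterm(w) for x, w in env.items()}
    return subst(term_env, ("abs", clauses))


closure = ("Vabs",
           ((("pvar", "x"), ("app", ("var", "f"), ("var", "x"))),),
           {"f": ("Vconstr", "Suc", ()), "x": ("Vconstr", "Zero", ())})
as_term = value_to_sterm(closure)
```

In the example, the captured binding for `f` is substituted into the clause body, while the captured `x` is ignored because the pattern variable `x` shadows it.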


Matching. The value type, instead of using binary function application as all 
other term types, uses n-ary constructor application. This introduces a concep- 
tual mismatch between (binary) patterns and values. To make the proofs easier, 
we introduce an intermediate type of n-ary patterns. This intermediate type can 
be optimized away by fusion. 


Correctness. The correctness proof requires a number of interesting lemmas. 


Lemma 3 (Substitution before evaluation). Assuming that a term t can
be evaluated to a value u given a closed environment σ, it can be evaluated to
the same value after substitution with a sub-environment σ′. Formally:
rs, σ ⊢ t ↓ u ∧ σ′ ⊆ σ → rs, σ ⊢ subst σ′ t ↓ u


This justifies the “pre-substitution” exhibited by the ABS rule in the big-step
semantics in contrast to the environment-capturing ABS rule in the evaluation
semantics.


Theorem 3 (Correctness). Let σ be a closed environment and t a term which
only contains free variables in dom σ. Then, an evaluation to a value rs, σ ⊢ t ↓ v
can be reproduced in the big-step semantics as rs′, map value_to_sterm σ ⊢ t ↓
value_to_sterm v, where rs′ = [(name, value_to_sterm rhs) | (name, rhs) ← rs].

Instantiating the Correctness Theorem. The correctness theorem states
that, for any given evaluation of a term t with a given environment rs, σ containing
values, we can reproduce that evaluation in the big-step semantics using
a derived list of rules rs′ and an environment σ′ containing sterms that are generated
by the value_to_sterm function. But recall the diagram in Fig. 2. In our
scenario, we start with a given rule set of sterms (that has been compiled from a
rule set of terms). Hence, the correctness theorem only deals with the opposite
direction.

It remains to construct a suitable rs such that applying value_to_sterm to it 
yields the given sterm rule set. We can exploit the side condition (Sect. 5.1) that 
all bindings define functions, not constants: 


Definition 6 (Global clause set). The mapping global_css :: string ⇀ ((term ×
sterm) list) is obtained by stripping the Sabs constructors from all definitions and
converting the resulting list to a mapping.


For each definition with name f we define a corresponding term v_f = Vrecabs
global_css f []. In other words, each function is now represented by a recursive
closure bundling all functions. Applying value_to_sterm to v_f returns the original
definition of f. Let rs denote the original sterm rule set and rs_v the environment
mapping all f's to the v_f's.

The variable environments σ and σ′ can safely be set to the empty mapping,
because top-level terms are evaluated without any free variable bindings.

Corollary 1 (Correctness). rs_v, [] ⊢ t ↓ v → rs, [] ⊢ t ↓ value_to_sterm v


Note that this step was not part of the compiler (although rs_v is computable)
but it is a refinement of the semantics to support a more modular correctness
proof.


Example. Recall the odd and even example from Sect. 4.5. After compilation to 
sterm, the rule set looks like this: 


rs = {("odd",  Sabs [⟨0⟩ ⇒ ⟨False⟩, ⟨Suc n⟩ ⇒ ⟨even n⟩]),
      ("even", Sabs [⟨0⟩ ⇒ ⟨True⟩,  ⟨Suc n⟩ ⇒ ⟨odd n⟩])}

This can be easily transformed into the following global clause set:


global_css = ["odd"  ↦ [⟨0⟩ ⇒ ⟨False⟩, ⟨Suc n⟩ ⇒ ⟨even n⟩],
              "even" ↦ [⟨0⟩ ⇒ ⟨True⟩,  ⟨Suc n⟩ ⇒ ⟨odd n⟩]]


Finally, rs_v is computed by creating a recursive closure for each function:


rs_v = ["odd"  ↦ Vrecabs global_css "odd" [],
        "even" ↦ Vrecabs global_css "even" []]
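\[
\textsc{Const}\ \frac{\text{name} \notin \text{constructors} \quad \sigma\ \text{name} = \text{Some}\ v}{\sigma \vdash \text{Sconst}\ \text{name} \downarrow v}
\qquad
\textsc{Var}\ \frac{\sigma\ \text{name} = \text{Some}\ v}{\sigma \vdash \text{Svar}\ \text{name} \downarrow v}
\qquad
\textsc{Abs}\ \frac{}{\sigma \vdash \Lambda\ cs \downarrow \text{Vabs}\ cs\ \sigma}
\]

\[
\textsc{Comb}\ \frac{\sigma \vdash t \downarrow \text{Vabs}\ cs\ \sigma' \quad \sigma \vdash u \downarrow v \quad \text{first\_match}\ cs\ v = \text{Some}\ (\sigma'', \text{rhs}) \quad \sigma' \mathbin{+\!\!+} \sigma'' \vdash \text{rhs} \downarrow v'}{\sigma \vdash t \mathbin{\$} u \downarrow v'}
\]

\[
\textsc{RecComb}\ \frac{\sigma \vdash t \downarrow \text{Vrecabs}\ css\ \text{name}\ \sigma' \quad css\ \text{name} = \text{Some}\ cs \quad \sigma \vdash u \downarrow v \quad \text{first\_match}\ cs\ v = \text{Some}\ (\sigma'', \text{rhs}) \quad \sigma' \mathbin{+\!\!+} \text{mk\_rec\_env}\ css\ \sigma' \mathbin{+\!\!+} \sigma'' \vdash \text{rhs} \downarrow v'}{\sigma \vdash t \mathbin{\$} u \downarrow v'}
\]

\[
\textsc{Constr}\ \frac{\text{name} \in \text{constructors} \quad \sigma \vdash t_1 \downarrow v_1 \quad \cdots \quad \sigma \vdash t_n \downarrow v_n}{\sigma \vdash \text{Sconst}\ \text{name} \mathbin{\$} t_1 \mathbin{\$} \cdots \mathbin{\$} t_n \downarrow \text{Vconstr}\ \text{name}\ [v_1, \ldots, v_n]}
\]


Fig. 8. ML-style evaluation semantics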


5.7 Evaluation with Recursive Closures 


CakeML distinguishes between non-recursive and recursive closures [30]. This
distinction is also present in the value type. In this step, we will conflate variables
with constants, which necessitates a special treatment of recursive closures.
Therefore we introduce a new predicate σ ⊢ t ↓ v in Fig. 8 (in contrast to the
previous rs, σ ⊢ t ↓ v). We examine the rules one by one:


CONST/VAR. Constant definitions and variable values are both retrieved from
the same environment σ. We have opted to keep the distinction between
constants and variables in the sterm type to avoid the introduction of another
term type.

ABS. Identical to the previous evaluation semantics. Note that evaluation never
creates recursive closures at run-time (only at compile-time, see Sect. 5.6).
Anonymous functions, e.g. in the term ⟨map (λx. x)⟩, are evaluated to non-recursive
closures.

COMB. Identical to the previous evaluation semantics.

RECCOMB. Almost identical to the evaluation semantics. Additionally, for each
function (name, cs) ∈ css, a new recursive closure Vrecabs css name σ′ is
created and inserted into the environment. This ensures that after the first
call to a recursive function, the function itself is present in the environment to
be called recursively, without having to introduce coinductive environments.

CONSTR. Identical to the evaluation semantics.
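The rules of the ML-style semantics, and in particular the role of mk_rec_env in RECCOMB, can be sketched as a small Python interpreter. The tuple encoding is hypothetical and the code is only an illustration of the rules as described here, not the formalized semantics.

```python
# Hypothetical toy encoding:
#   sterm: ("const", n) | ("var", n) | ("app", t, u) | ("abs", clauses)
#   value: ("Vconstr", n, (values...)) | ("Vabs", clauses, env)
#          | ("Vrecabs", css, n, env)
#   pattern: ("pvar", x) | ("pcon", n, (sub_patterns...))

def strip_comb(t):
    args = []
    while t[0] == "app":
        args.append(t[2])
        t = t[1]
    return t, args[::-1]


def match(p, v):
    """Match an n-ary pattern against a value."""
    if p[0] == "pvar":
        return {p[1]: v}
    if v[0] != "Vconstr" or v[1] != p[1] or len(v[2]) != len(p[2]):
        return None
    env = {}
    for q, w in zip(p[2], v[2]):
        sub = match(q, w)
        if sub is None:
            return None
        env.update(sub)
    return env


def mk_rec_env(css, env):
    """One recursive closure per function of the bundle css."""
    return {name: ("Vrecabs", css, name, env) for name in css}


def evaluate(constructors, env, t):
    head, args = strip_comb(t)
    if head[0] == "const" and head[1] in constructors:     # CONSTR
        return ("Vconstr", head[1],
                tuple(evaluate(constructors, env, a) for a in args))
    if t[0] in ("const", "var"):                            # CONST / VAR
        return env[t[1]]
    if t[0] == "abs":                                       # ABS
        return ("Vabs", t[1], env)
    fun = evaluate(constructors, env, t[1])
    arg = evaluate(constructors, env, t[2])
    if fun[0] == "Vabs":                                    # COMB
        clauses, base = fun[1], fun[2]
    elif fun[0] == "Vrecabs":                               # RECCOMB
        _, css, name, captured = fun
        clauses = css[name]
        base = {**captured, **mk_rec_env(css, captured)}
    else:
        raise ValueError("cannot apply a constructor value")
    for p, rhs in clauses:
        local = match(p, arg)
        if local is not None:
            return evaluate(constructors, {**base, **local}, rhs)
    raise ValueError("no clause matches")


def num(n):
    t = ("const", "Zero")
    for _ in range(n):
        t = ("app", ("const", "Suc"), t)
    return t


ZERO_P = ("pcon", "Zero", ())
SUC_P = ("pcon", "Suc", (("pvar", "n"),))
css = {
    "odd": ((ZERO_P, ("const", "False")),
            (SUC_P, ("app", ("const", "even"), ("var", "n")))),
    "even": ((ZERO_P, ("const", "True")),
             (SUC_P, ("app", ("const", "odd"), ("var", "n")))),
}
constructors = {"Zero", "Suc", "True", "False"}
top_env = mk_rec_env(css, {})  # plays the role of the initial environment
result = evaluate(constructors, top_env, ("app", ("const", "odd"), num(3)))
```

The body of `odd` refers to `even` as a constant; it is found only because RECCOMB re-inserts the whole bundle of recursive closures into the environment before evaluating the right-hand side.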


Conflating Constants and Variables. By merging the rule set rs with the
variable environment σ, it becomes necessary to discuss possible clashes. Previously,
the syntactic distinction between Svar and Sconst meant that the variable ⟨x⟩
and the constant ⟨x⟩ are not ambiguous: all semantics up to the evaluation semantics
clearly specify where to look for the substitute. This is not the case in functional
languages where functions and variables are not distinguished syntactically.

Instead, we rely on the fact that the initial rule set only defines constants. All
variables are introduced by matching before β-reduction (that is, in the COMB
and RECCOMB rules). The ABS rule does not change the environment. Hence
it suffices to assume that variables in patterns must not overlap with constant
names (see Sect. 5.1).


Correspondence Relation. Both constant definitions and values of variables
are recorded in a single environment σ. This also applies to the environment
contained in a closure. The correspondence relation thus needs to take different
sets of bindings in closures into account.

Hence, we define a relation ≈ᵥ that is implicitly parametrized on the rule
set rs and compares environments. We call it right-conflating, because in a correspondence
v ≈ᵥ u, any bound environment in u is thought to contain both
variables and constants, whereas in v, any bound environment contains only
variables.


Definition 7 (Right-conflating correspondence). We define ≈ᵥ coinductively
as follows:


\[
\frac{v_1 \approx_v u_1 \quad \cdots \quad v_n \approx_v u_n}{\text{Vconstr}\ \text{name}\ [v_1, \ldots, v_n] \approx_v \text{Vconstr}\ \text{name}\ [u_1, \ldots, u_n]}
\]

\[
\frac{\forall x \in \text{frees}\ cs.\ \sigma_1\ x \approx_v \sigma_2\ x \qquad \forall x \in \text{consts}\ cs.\ rs\ x \approx_v \sigma_2\ x}{\text{Vabs}\ cs\ \sigma_1 \approx_v \text{Vabs}\ cs\ \sigma_2}
\]

\[
\frac{\begin{array}{l}
\forall cs \in \text{range}\ css.\ \forall x \in \text{frees}\ cs.\ \sigma_1\ x \approx_v \sigma_2\ x \\
\forall cs \in \text{range}\ css.\ \forall x \in \text{consts}\ cs.\ \sigma_1\ x \approx_v (\sigma_2 \mathbin{+\!\!+} \text{mk\_rec\_env}\ css\ \sigma_2)\ x
\end{array}}{\text{Vrecabs}\ css\ \text{name}\ \sigma_1 \approx_v \text{Vrecabs}\ css\ \text{name}\ \sigma_2}
\]


Consequently, ≈ᵥ is not reflexive.


Correctness. The correctness lemma is straightforward to state: 


Theorem 4 (Correctness). Let σ be an environment, t be a closed term and
v a value such that σ ⊢ t ↓ v. If for all constants x occurring in t, rs x ≈ᵥ σ x
holds, then there is a u such that rs, [] ⊢ t ↓ u and u ≈ᵥ v.


As usual, the rather technical proof proceeds via induction over the semantics
(Fig. 8). It is important to note that the global clause set construction (Sect. 5.6)
satisfies the preconditions of this theorem:


Lemma 4. If name is the name of a constant in rs, then
Vrecabs global_css name [] ≈ᵥ Vrecabs global_css name []


Because ≈ᵥ is defined coinductively, the proof of this precondition proceeds by
coinduction.


5.8 CakeML


CakeML is a verified implementation of a subset of Standard ML [24,40]. It 
comprises a parser, type checker, formal semantics and backend for machine 
code. The semantics has been formalized in Lem [29], which allows export to 
Isabelle theories. 

Our compiler targets CakeML’s abstract syntax tree. However, we do not 
make use of certain CakeML features; notably mutable cells, modules, and lit- 
erals. We have derived a smaller, executable version of the original CakeML 
semantics, called CupCakeML, together with an equivalence proof. The correct- 
ness proof of the last compiler phase establishes a correspondence between Cup- 
CakeML and the final semantics of our compiler pipeline. 

For the correctness proof of the CakeML compiler, its authors have extracted 
the Lem specification into HOL4 theories [1]. In our work, we directly target 
CakeML abstract syntax trees (thereby bypassing the parser) and use its big-step
semantics, which we have extracted into Isabelle.


Conversion from sterm to exp. After the series of translations described in the 
earlier sections, our terms are syntactically close to CakeML’s terms (Cake.exp). 
The only remaining differences are outlined below: 


— CakeML does not combine abstraction and pattern matching. For that reason, 
we have to translate Λ [p₁ ⇒ t₁, ...] into λx. case x of p₁ ⇒ t₁ | ..., where x
is a fresh variable name. We reuse the fresh monad to obtain a bound variable 
name. Note that it is not necessary to thread through already created variable 
names, only existing names. The reason is simple: a generated variable is 
bound and then immediately used in the body. Shadowing it somewhere in 
the body is not problematic. 

— CakeML has two distinct syntactic categories for identifiers (that can repre- 
sent variables or functions) and data constructors. Our term types however 
have two distinct syntactic categories for constants (that can represent func- 
tions or data constructors) and variables. The necessary prerequisites to deal 
with this are already present in the ML-style evaluation semantics (Sect. 5.7) 
which conflates constants and variables, but has a dedicated CONSTR rule for 
data constructors. 
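The two differences listed above can be made concrete with a small translation function. The sketch below is written in Python over a hypothetical tuple encoding; the target tuples ("Var", "Con", "App", "Fun", "Mat", "Pvar", "Pcon") merely mimic the shape of CakeML expressions and are not CakeML's actual abstract syntax, and the `x0, x1, ...` naming scheme stands in for the fresh monad.

```python
# Hypothetical toy encoding of source terms:
#   ("const", n) | ("var", n) | ("app", t, u) | ("abs", clauses)
#   pattern: ("pvar", x) | ("pcon", n, (sub_patterns...))

def strip_comb(t):
    args = []
    while t[0] == "app":
        args.append(t[2])
        t = t[1]
    return t, args[::-1]


def pat_ids(p):
    if p[0] == "pvar":
        return {p[1]}
    out = {p[1]}
    for q in p[2]:
        out |= pat_ids(q)
    return out


def ids(t):
    """All names (variables and constants) occurring in a term."""
    if t[0] in ("var", "const"):
        return {t[1]}
    if t[0] == "app":
        return ids(t[1]) | ids(t[2])
    out = set()
    for p, rhs in t[1]:
        out |= pat_ids(p) | ids(rhs)
    return out


def fresh(avoid):
    """Pick a name that does not occur in avoid."""
    i = 0
    while f"x{i}" in avoid:
        i += 1
    return f"x{i}"


def mk_ml_pat(p):
    if p[0] == "pvar":
        return ("Pvar", p[1])
    return ("Pcon", p[1], [mk_ml_pat(q) for q in p[2]])


def sterm_to_cake(constructors, t):
    head, args = strip_comb(t)
    if head[0] == "const" and head[1] in constructors:
        # data constructors become a dedicated node with all arguments
        return ("Con", head[1], [sterm_to_cake(constructors, a) for a in args])
    if t[0] in ("var", "const"):
        # variables and non-constructor constants are both identifiers
        return ("Var", t[1])
    if t[0] == "app":
        return ("App", "Opapp", [sterm_to_cake(constructors, t[1]),
                                 sterm_to_cake(constructors, t[2])])
    # case abstraction: function over a fresh name, followed by a match
    n = fresh(ids(t) | constructors)
    cases = [(mk_ml_pat(p), sterm_to_cake(constructors, rhs)) for p, rhs in t[1]]
    return ("Fun", n, ("Mat", ("Var", n), cases))


constructors = {"Zero", "Suc", "True", "False"}
even_def = ("abs", ((("pcon", "Zero", ()), ("const", "True")),
                    (("pcon", "Suc", (("pvar", "n"),)),
                     ("app", ("const", "odd"), ("var", "n")))))
cake_even = sterm_to_cake(constructors, even_def)
```

Only names already present in the translated term (and the constructor names) need to be avoided when choosing the bound variable, since the generated variable is used immediately in the match scrutinee.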


Types. During embedding (Sect. 3), all type information is erased. Yet, CakeML 
performs some limited form of type checking at run-time: constructing and 
matching data must always be fully applied. That is, data constructors must 
always occur with all arguments supplied on right-hand and left-hand sides. 

Fully applied constructors in terms can be easily guaranteed by simple pre- 
processing. For patterns however, this must be ensured throughout the com- 
pilation pipeline; it is (like other syntactic constraints) another side condition 
imposed on the rule set (Sect. 5.1). 


Footnote (on the extracted CakeML semantics): Based on a repository snapshot from March 27, 2017 (0c48672).

The shape of datatypes and constructors is managed in CakeML’s environment.
This particular piece of information is allowed to vary in closures, since
ML supports local type definitions. Tracking this would greatly complicate our
proofs. Hence, we fix a global set of constructors and enforce that all values use
exactly that one.


Correspondence Relation. We define two different correspondence relations: 
One for values and one for expressions. 


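Definition 8 (Expression correspondence)


\[
\textsc{Var}\ \frac{}{\text{rel\_e}\ (\text{Svar}\ n)\ (\text{Cake.Var}\ n)}
\qquad
\textsc{Const}\ \frac{n \notin \text{constructors}}{\text{rel\_e}\ (\text{Sconst}\ n)\ (\text{Cake.Var}\ n)}
\]

\[
\textsc{Constr}\ \frac{\text{name} \in \text{constructors} \quad \text{rel\_e}\ t_1\ u_1 \quad \cdots}{\text{rel\_e}\ (\text{Sconst}\ \text{name} \mathbin{\$} t_1 \mathbin{\$} \cdots \mathbin{\$} t_n)\ (\text{Cake.Con}\ (\text{Some}\ (\text{Cake.Short}\ \text{name}))\ [u_1, \ldots, u_n])}
\]

\[
\textsc{App}\ \frac{\text{rel\_e}\ t_1\ u_1 \quad \text{rel\_e}\ t_2\ u_2}{\text{rel\_e}\ (t_1 \mathbin{\$} t_2)\ (\text{Cake.App}\ \text{Cake.Opapp}\ [u_1, u_2])}
\]

\[
\textsc{Fun}\ \frac{n \notin \text{ids}\ (\Lambda\ [p_1 \Rightarrow t_1, \ldots]) \cup \text{constructors} \quad q_1 = \text{mk\_ml\_pat}\ p_1 \quad \text{rel\_e}\ t_1\ u_1 \quad \cdots}{\text{rel\_e}\ (\Lambda\ [p_1 \Rightarrow t_1, \ldots])\ (\text{Cake.Fun}\ n\ (\text{Cake.Mat}\ (\text{Cake.Var}\ n)\ [q_1 \Rightarrow u_1, \ldots]))}
\]

\[
\textsc{Mat}\ \frac{\text{rel\_e}\ t\ u \quad q_1 = \text{mk\_ml\_pat}\ p_1 \quad \text{rel\_e}\ t_1\ u_1 \quad \cdots}{\text{rel\_e}\ (\Lambda\ [p_1 \Rightarrow t_1, \ldots] \mathbin{\$} t)\ (\text{Cake.Mat}\ u\ [q_1 \Rightarrow u_1, \ldots])}
\]

We will explain each of the rules briefly here.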


VAR. Variables are directly related by identical name. 

CONST. As described earlier, constructors are treated specially in CakeML. In
order to not confuse functions or variables with data constructors themselves, 
we require that the constant name is not a constructor. 

CONSTR. Constructors are directly related by identical name, and recursively 
related arguments. 

APP. CakeML does not just support general function application but also unary 
and binary operators. In fact, function application is the binary operator 
Opapp. We never generate other operators. Hence the correspondence is 
restricted to Opapp. 

FUN/MAT. Observe the symmetry between these two cases: In our term language,
matching and abstraction are combined, which is not the case in
CakeML. This means we relate a case abstraction to a CakeML function con- 
taining a match, and a case abstraction applied to a value to just a CakeML 
match. 


There is no separate relation for patterns, because their translation is simple. 

The value correspondence (rel_v) is structurally simpler. In the case of con- 
structor values (Vconstr and Cake.Conv), arguments are compared recursively. 
Closures and recursive closures are compared extensionally, i.e. only bindings 
that occur in the body are checked recursively for correspondence. 
Correctness. We use the same trick as in Sect. 5.6 to obtain a suitable environment
for CakeML evaluation based on the rule set rs.


Theorem 5 (Correctness). If the compiled expression sterm_to_cake t terminates
with a value u in the CakeML semantics, there is a value v such that
rel_v v u and rs ⊢ t ↓ v.


6 Composition 


The complete compiler pipeline consists of multiple phases. Correctness is justi- 
fied for each phase between intermediate semantics and correspondence relations, 
most of which are rather technical. Whereas the compiler may be complex and 
impenetrable, the trustworthiness of the constructions hinges on the obviousness 
of those correspondence relations. 

Fortunately, under the assumption that terms to be evaluated and the result- 
ing values do not contain abstractions — or closures, respectively — all of the 
correspondence relations collapse to simple structural equality: two terms are 
related if and only if one can be converted to the other by consistent renaming 
of term constructors. 

The actual compiler can be characterized with two functions. Firstly, the 
translation of term to Cake.exp is a simple composition of each term translation 
function: 


definition term_to_cake :: term ⇒ Cake.exp where
  term_to_cake = sterm_to_cake ∘ pterm_to_sterm ∘ nterm_to_pterm ∘ term_to_nterm


Secondly, the function compile translates function definitions by composing the
phases as outlined in Fig. 2, including iterated application of pattern elimination:


definition compile :: (term × term) fset ⇒ Cake.dec where
  compile = Cake.Dletrec ∘ compile_srules_to_cake ∘ compile_prules_to_srules ∘
    compile_irules_to_srules ∘ compile_irules_iter ∘ compile_crules_to_irules ∘
    consts_of ∘ compile_rules_to_nrules


Each function compile_* corresponds to one compiler phase; the remaining func- 
tions are trivial. This produces a CakeML top-level declaration. We prove that 
evaluating this declaration in the top-level semantics (evaluate_prog) results in an 
environment cake_sem_env. But cake_sem_env can also be computed via another 
instance of the global clause set trick (Sect. 5.6). 

Equipped with these functions, we can state the final correctness theorem: 


theorem compiled_correct:
  (* If CakeML evaluation of a term succeeds ... *)
  assumes evaluate False cake_sem_env s (term_to_cake t) (s', Rval ml_v)
  (* ... producing a constructor term without closures ... *)
  assumes cake_abstraction_free ml_v
  (* ... and some syntactic properties of the involved terms hold ... *)
  assumes closed t and ¬ shadows_consts (heads rs ∪ constructors) t and
    welldefined (heads rs ∪ constructors) t and wellformed t
  (* ... then this evaluation can be reproduced in the term-rewriting semantics *)
  shows rs ⊢ t ⟶* cake_to_term ml_v
(a) Source program

class add =
  fixes plus :: 'a ⇒ 'a ⇒ 'a

definition f :: ('a::add) ⇒ 'a where
  f x = plus x x


(b) Result of translation

datatype 'a dict_add = Dict_add ('a ⇒ 'a ⇒ 'a)

fun cert_add :: ('a::add) dict_add ⇒ bool where
  cert_add (Dict_add pls) = (pls = plus)

fun f' :: 'a dict_add ⇒ 'a ⇒ 'a where
  f' (Dict_add pls) x = pls x x

lemma f'_eq: cert_add dict ⟹ f' dict = f
  <proof>


Fig. 9. Dictionary construction in Isabelle


This theorem directly relates the evaluation of a term t in the full CakeML 
(including mutability and exceptions) to the evaluation in the initial higher-order 
term rewriting semantics. The evaluation of t happens using the environment 
produced from the initial rule set. Hence, the theorem can be interpreted as the 
correctness of the pseudo-ML expression let rec rs in t. 

Observe that in the assumption, the conversion goes from our terms to 
CakeML expressions, whereas in the conclusion, the conversion goes the opposite 
direction. 


7 Dictionary Construction 


Isabelle’s type system supports type classes (or simply classes) [18,44] whereas 
CakeML does not. In order to not complicate the correctness proofs, type classes 
are not supported by our embedded term language either. Instead, we eliminate 
classes and instances by a dictionary construction [19] before embedding into the 
term language. Haftmann and Nipkow give a pen-and-paper correctness proof 
of this construction [17, Sect. 4.1]. We augmented the dictionary construction 
with the generation of a certificate theorem that shows the equivalence of the 
two versions of a function, with type classes and with dictionaries. This section 
briefly explains our dictionary construction. 

Figure 9 shows a simple example of a dictionary construction. Type variables
may carry class constraints (e.g. a::add). The basic idea is that classes
become dictionaries containing the functions of that class; class instances become 
dictionary definitions. Dictionaries are realized as datatypes. Class constraints 
become additional dictionary parameters for that class. In the example, class 
add becomes dict_add; function f is translated into f’ which takes an additional 
parameter of type dict_add. In reality our tool does not produce the Isabelle 
source code shown in Fig. 9b but performs the constructions internally. The cor- 
rectness lemma f’_eq is proved automatically. Its precondition expresses that the 
dictionary must contain exactly the function(s) of class add. For any monomor- 
phic instance, the precondition can be proved outright based on the certificate 
theorems proved for each class instance as explained next. 
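For readers less familiar with dictionary passing, the idea can be mimicked in Python. This is only an analogy under stated assumptions: Python has no type classes, so `functools.singledispatch` is used here as a stand-in for class-based overloading, and the names `DictAdd`, `f_prime` and `cert_add` are chosen to echo Fig. 9 rather than taken from any tool.

```python
from dataclasses import dataclass
from functools import singledispatch
from typing import Any, Callable


@singledispatch
def plus(x, y):
    """Method of the toy class 'add', resolved by the type of x."""
    raise NotImplementedError(type(x))


@plus.register(int)
def plus_int(x, y):
    return x + y


@plus.register(str)
def plus_str(x, y):
    return x + y


def f(x):
    """Source program: f x = plus x x, with an implicit class constraint."""
    return plus(x, x)


@dataclass(frozen=True)
class DictAdd:
    """Dictionary for class 'add': one field per class method."""
    pls: Callable[[Any, Any], Any]


def f_prime(d, x):
    """Translated program: the constraint became a dictionary argument."""
    return d.pls(x, x)


def cert_add(d, ty):
    """Certificate: the dictionary holds exactly the method of the instance."""
    return d.pls is plus.dispatch(ty)


dict_int = DictAdd(plus_int)
dict_str = DictAdd(plus_str)
```

Whenever `cert_add` holds for a dictionary, `f_prime` with that dictionary agrees with `f` at the corresponding type, which is the content of the lemma f'_eq; a dictionary carrying some other function fails the certificate.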
Not shown in the example is the translation of class instances. The basic
form of a class instance in Isabelle is T :: (c₁, ..., cₙ) c where T is an n-ary type
constructor. It corresponds to Haskell’s (c₁ α₁, ..., cₙ αₙ) ⇒ c (T α₁ ... αₙ)
and is translated into a function inst_c_T :: α₁ dict_c₁ ⇒ ··· ⇒ αₙ dict_cₙ ⇒
(α₁, ..., αₙ) T dict_c and the following certificate theorem is proved:


cert_c₁ dict₁ ⟹ ··· ⟹ cert_cₙ dictₙ ⟹ cert_c (inst_c_T dict₁ ... dictₙ)


For a more detailed explanation of how the dictionary construction works, we 
refer to the corresponding entry in the Archive of Formal Proofs [21]. 


8 Evaluation 


We have tried out our compiler on examples from existing Isabelle formalizations. 
This includes an implementation of Huffman encoding, lists and sorting, string 
functions [39], and various data structures from Okasaki’s book [34], including 
binary search trees, pairing heaps, and leftist heaps. These definitions can be 
processed with slight modifications: functions need to be totalized (see the end 
of Sect. 3). However, parts of the tactics required for deep embedding proofs
(Sect. 3) are too slow on some functions and hence still need to be optimized.


9 Conclusion 


For this paper we have concentrated on the compiler from Isabelle/HOL to 
CakeML abstract syntax trees. Partial correctness is proved w.r.t. the big-step 
semantics of CakeML. In the next step we will link our work with the compiler 
from CakeML to machine code. Tan et al. [40, Sect. 10] prove a correctness the- 
orem that relates their semantics with the execution of the compiled machine 
code. In that paper, they use a newer iteration of the CakeML semantics (func- 
tional big-step [35]) than we do here. Both semantics are still present in the 
CakeML source repository, together with an equivalence proof. Another impor- 
tant step consists of targeting CakeML’s native types, e.g. integer numbers and 
characters. 

Evaluation of our compiled programs is already possible via Isabelle’s pred- 
icate compiler [5], which allows us to turn CakeML’s big-step semantics into 
an executable function. We have used this execution mechanism to establish for 
sample programs that they terminate successfully. We also plan to prove that 
our compiled programs terminate, i.e. total correctness. 

The total size of this formalization, excluding theories extracted from Lem, 
is currently approximately 20000 lines of proof text (90 %) and ML code (10 %). 
The ML code itself produces relatively simple theorems, which means that there 
are fewer opportunities for it to go wrong. This constitutes an improvement over
certifying approaches that prove complicated properties in ML. 
Abstract. A valid compiler optimisation transforms a block in a program
without introducing new observable behaviours to the program as
a whole. Deciding which optimisations are valid can be difficult, and 
depends closely on the semantic model of the programming language. 
Axiomatic relaxed models, such as C++11, present particular challenges 
for determining validity, because such models allow subtle effects of a 
block transformation to be observed by the rest of the program. In this 
paper we present a denotational theory that captures optimisation valid- 
ity on an axiomatic model corresponding to a fragment of C++11. Our 
theory allows verifying an optimisation compositionally, by considering 
only the block it transforms instead of the whole program. Using this 
property, we realise the theory in the first push-button tool that can 
verify real-world optimisations under an axiomatic memory model. 


1 Introduction 


Context and Objectives. Any program defines a collection of observable 
behaviours: a sorting algorithm maps unsorted to sorted sequences, and a paint 
program responds to mouse clicks by updating a rendering. It is often desirable 
to transform a program without introducing new observable behaviours — for 
example, in a compiler optimisation or programmer refactoring. Such transfor- 
mations are called observational refinements, and they ensure that properties of 
the original program will carry over to the transformed version. It is also desir- 
able for transformations to be compositional, meaning that they can be applied 
to a block of code irrespective of the surrounding program context. Compo- 
sitional transformations are particularly useful for automated systems such as 
compilers, where they are known as peephole optimisations. 

The semantics of the language is highly significant in determining which 
transformations are valid, because it determines the ways that a block of code 
being transformed can interact with its context and thereby affect the observable 
behaviour of the whole program. Our work applies to a relaxed memory concur- 
rent setting. Thus, the context of a code-block includes both code sequentially 
before and after the block, and code that runs in parallel. Relaxed memory means
that different threads can observe different, apparently contradictory orders of 
events — such behaviour is permitted by programming languages to reflect CPU- 
level relaxations and to allow compiler optimisations. 

We focus on axiomatic memory models of the type used in C/C++ and
Java. In axiomatic models, program executions are represented by structures of 
memory actions and relations on them, and program semantics is defined by 
a set of axioms constraining these structures. Reasoning about the correctness 
of program transformations on such memory models is very challenging, and 
indeed, compiler optimisations have been repeatedly shown unsound with respect 
to models they were intended to support [23,25]. The fundamental difficulty is 
that axiomatic models are defined in a global, non-compositional way, making 
it very challenging to reason compositionally about the single code-block being 
transformed. 


Approach. Suppose we have a code-block B, embedded into an unknown pro- 
gram context. We define a denotation for the code-block which summarises its 
behaviour in a restricted representative context. The denotation consists of a 
set of histories which track interactions across the boundary between the code- 
block and its context, but abstract from internal structure of the code-block. We 
can then validate a transformation from code-block B to B’ by comparing their 
denotations. This approach is compositional: it requires reasoning only about the 
code-blocks and representative contexts; the validity of the transformation in an 
arbitrary context will follow. It is also fully abstract, meaning that it can verify 
any valid transformation: considering only representative contexts and histories 
does not lose generality. 

We also define a variant of our denotation that is finite at the cost of losing 
full abstraction. We achieve this by further restricting the form of contexts one 
needs to consider in exchange for tracking more information in histories. For 
example, it is unnecessary to consider executions where two context operations 
read from the same write. 

Using this finite denotation, we implement a prototype verification tool, Stel- 
lite. Our tool converts an input transformation into a model in the Alloy lan- 
guage [12], and then checks that the transformation is valid using the Alloy* 
solver [18]. Our tool can prove or disprove a range of introduction, elimination, 
and exchange compiler optimisations. Many of these were verified by hand in 
previous work; our tool verifies them automatically. 


Contributions. Our contribution is twofold. First, we define the first fully 
abstract denotational semantics for an axiomatic relaxed model. Previous pro- 
posals in this space targeted either non-relaxed sequential consistency [6] or 
much more restrictive operational relaxed models [7,13,21]. Second, we show 
it is feasible to automatically verify relaxed-memory program transformations. 
Previous techniques required laborious proofs by hand or in a proof assistant
[23–27]. Our target model is derived from the C/C++ 2011 standard [22]. However,
our aim is not to handle C/C++ per se (especially as the model is in flux in 
several respects; see Sect. 3.7). Rather we target the simplest axiomatic model 
rich enough to demonstrate our approach. 


2 Observation and Transformation


Observational Refinement. The notion of observation is crucial when determining
how different programs are related. For example, observations might be I/O
behaviour or writes to special variables. Given program executions $X_1$ and
$X_2$, we write $X_1 \preccurlyeq_{ex} X_2$ if the observations in $X_1$ are
replicated in $X_2$ (defined formally in the following). Lifting this notion, a
program $P_1$ observationally refines another program $P_2$ if every observable
behaviour of $P_1$ could also occur with $P_2$; we write this
$P_1 \preccurlyeq_{pr} P_2$. More formally, let $[\![-]\!]$ be the map from
programs to sets of executions. Then we define $\preccurlyeq_{pr}$ as:


$$P_1 \preccurlyeq_{pr} P_2 \iff \forall X_1 \in [\![P_1]\!].\ \exists X_2 \in [\![P_2]\!].\ X_1 \preccurlyeq_{ex} X_2 \qquad (1)$$


Compositional Transformation. Many common program transformations are 
compositional: they modify a sequential fragment of the program without exam- 
ining the rest of the program. We call the former the code-block and the latter 
its context. Contexts can include sequential code before and after the block, and 
concurrent code that runs in parallel with it. Code-blocks are sequential, i.e. 
they do not feature internal concurrency. A context C and code-block B can be 
composed to give a whole program C(B). 

A transformation $B_2 \leadsto B_1$ replaces some instance of the code-block
$B_2$ with $B_1$. To validate such a transformation, we must establish whether
every whole program containing $B_1$ observationally refines the same program
with $B_2$ substituted. If this holds, we say that $B_1$ observationally refines
$B_2$, written $B_1 \preccurlyeq_{bl} B_2$, defined by lifting
$\preccurlyeq_{pr}$ as follows:


$$B_1 \preccurlyeq_{bl} B_2 \iff \forall C.\ C(B_1) \preccurlyeq_{pr} C(B_2) \qquad (2)$$


If $B_1 \preccurlyeq_{bl} B_2$ holds, then the compiler can replace block $B_2$
with block $B_1$ irrespective of the whole program, i.e. $B_2 \leadsto B_1$ is a
valid transformation. Thus, deciding $B_1 \preccurlyeq_{bl} B_2$ is the core
problem in validating compositional transformations.

The language semantics is highly significant in determining observational
refinement. For example, the code blocks $B_1$: store(x,5) and
$B_2$: store(x,2); store(x,5) are observationally equivalent in a sequential
setting. However, in a concurrent setting the intermediate state, x = 2, can be
observed in $B_2$ but not $B_1$, meaning the code-blocks are no longer
observationally equivalent. In a relaxed-memory setting there is no global state
seen by all threads, which further complicates the notion of observation.
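
The difference between the two blocks can be illustrated with a small sketch. It assumes a sequentially consistent interleaving semantics and a context made of one concurrent load of x; the helper names `interleavings` and `observed_values` are hypothetical and not part of the paper's formal development.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every interleaving of a and b that preserves each sequence's order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]


def observed_values(block, observer=(("load", "x"),)):
    """Values that one concurrent load of x can see (assumption: SC semantics)."""
    seen = set()
    for run in interleavings(list(block), list(observer)):
        mem = {"x": 0}  # every variable is initialised to 0
        for op in run:
            if op[0] == "store":
                mem[op[1]] = op[2]
            else:  # a context load records what it observes
                seen.add(mem[op[1]])
    return seen


B1 = [("store", "x", 5)]
B2 = [("store", "x", 2), ("store", "x", 5)]
print(sorted(observed_values(B1)))  # [0, 5]
print(sorted(observed_values(B2)))  # [0, 2, 5]
```

The concurrent observer can see the intermediate value 2 only with the two-store block, which is why the blocks stop being equivalent once contexts may run in parallel.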


Compositional Verification. To establish $B_1 \preccurlyeq_{bl} B_2$, it is
difficult to examine all possible syntactic contexts. Our approach is to
construct a denotation for each code-block: a simplified, ideally finite, summary
of possible interactions between the block and its context. We then define a
refinement relation on denotations and use it to establish observational
refinement. We write $B_1 \sqsubseteq B_2$ when the denotation of $B_1$ refines
that of $B_2$.

Refinement on denotations should be adequate, i.e., it should validly
approximate observational refinement:
$B_1 \sqsubseteq B_2 \implies B_1 \preccurlyeq_{bl} B_2$. Hence, if
$B_1 \sqsubseteq B_2$, then $B_2 \leadsto B_1$ is a valid transformation. It is
also desirable for the denotation to be fully abstract:
$B_1 \preccurlyeq_{bl} B_2 \implies B_1 \sqsubseteq B_2$. This means any valid
transformation can be verified by comparing denotations. Below we define several
versions of $\sqsubseteq$ with different properties.


3 Target Language and Core Memory Model 


Our language’s memory model is derived from the C/C++ 2011 standard (henceforth
‘C11’), as formalised by [5,22]. However, we simplify our model in several
ways; see Sect. 3.7 for details. In C11 terms, our model covers release-acquire
and non-atomic operations, and sequentially consistent fences. To simplify the
presentation, at first we omit non-atomics, and extend our approach to cover
them in Sect. 7. Thus, all operations in this section correspond to C11’s
release-acquire.


3.1 Relaxed Memory Primer 


In a sequentially consistent concurrent system, there is a total temporal order on 
loads and stores, and loads take the value of the most recent store; in particular, 
they cannot read overwritten values, or values written in the future. A relaxed 
(or weak) memory model weakens this total order, allowing behaviours forbidden 
under sequential consistency. Two standard examples of relaxed behaviour are 
store buffering (SB) and message passing (MP), shown in Fig. 1. 


SB:
    store(x,0); store(y,0);
    store(x,1);       ||  store(y,1);
    v1 := load(y);    ||  v2 := load(x);

MP:
    store(f,0); store(x,0);
    store(x,1);       ||  b := load(f);
    store(f,1);       ||  if (b == 1)
                      ||      r := load(x);


Fig. 1. Left: store-buffering (SB) example. Right: message-passing (MP) example. 


In most relaxed models v1 = v2 = 0 is a possible post-state for SB. This 
cannot occur on a sequentially consistent system: if v1 = 0, then store(y,1) 
must be ordered after the load of y, which would order store(x,1) before the 
load of x, forcing it to assign v2 = 1. In some relaxed models, $b = 1 \land r = 0$ is
a possible post-state for MP. This is undesirable if, for example, x is a complex 
data-structure and f is a flag indicating it has been safely created. 
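
The sequentially consistent half of this argument can be checked mechanically. The following sketch enumerates every interleaving of the SB program under sequential consistency; the tuple encoding of operations and the function `sc_outcomes` are illustrative assumptions, and the initialising stores of Fig. 1 are modelled as the initial memory.

```python
def sc_outcomes(threads):
    """All final register valuations under sequential consistency (assumption)."""
    results = set()

    def run(pcs, mem, regs):
        if all(pc == len(t) for pc, t in zip(pcs, threads)):
            results.add(tuple(sorted(regs.items())))
            return
        for tid, thread in enumerate(threads):
            pc = pcs[tid]
            if pc == len(thread):
                continue  # this thread has finished
            op = thread[pc]
            mem2, regs2 = dict(mem), dict(regs)
            if op[0] == "store":      # ("store", var, value)
                mem2[op[1]] = op[2]
            else:                     # ("load", register, var)
                regs2[op[1]] = mem2[op[2]]
            run(pcs[:tid] + (pc + 1,) + pcs[tid + 1:], mem2, regs2)

    run((0,) * len(threads), {"x": 0, "y": 0}, {})
    return results


SB = [
    [("store", "x", 1), ("load", "v1", "y")],
    [("store", "y", 1), ("load", "v2", "x")],
]
outcomes = sc_outcomes(SB)
print((("v1", 0), ("v2", 0)) in outcomes)  # False
```

No interleaving produces v1 = v2 = 0, so observing that post-state is evidence of relaxed behaviour.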


3.2 Language Syntax 


Programs in the language we consider manipulate thread-local variables
$l, l_1, l_2, \ldots \in \mathsf{LVar}$ and global variables
$x, y, \ldots \in \mathsf{GVar}$, coming from disjoint sets LVar and GVar. Each
variable stores a value from a finite set Val and is initialised to
$0 \in \mathsf{Val}$. Constants are encoded by special read-only thread-local
variables. We assume that each thread uses the same set of thread-local variable
names LVar. The syntax of the programming language is as follows:


$$\begin{array}{rcl}
C & ::= & l := E \mid \mathsf{store}(x, l) \mid l := \mathsf{load}(x) \mid l := \mathsf{LL}(x) \mid l' := \mathsf{SC}(x, l) \mid \mathsf{fence} \mid {} \\
  &     & C_1 \parallel C_2 \mid C_1; C_2 \mid \mathsf{if}\ (l)\ \{C_1\}\ \mathsf{else}\ \{C_2\} \mid \{-\} \\
E & ::= & l_1 = l_2 \mid l_1 \neq l_2 \mid \ldots
\end{array}$$


Many of the constructs are standard. LL(x) and SC(x,l) are load-link and
store-conditional, which are basic concurrency operations available on many
platforms (e.g., Power and ARM). A load-link LL(x) behaves as a standard load of
global variable x. However, if it is followed by a store-conditional SC(x,l), the
store fails and returns false if there are intervening writes to the same location.
Otherwise the store-conditional writes l and returns true. The fence command
is a sequentially consistent fence: interleaving such fences between all statements
in a program guarantees sequentially consistent behaviour. We do not include a
compare-and-swap (CAS) command in our language because LL-SC is more general [2].
Hardware-level LL-SC is used to implement C11 CAS on Power and ARM. Our language
does not include loops because our model in this paper does not include infinite
computations (see Sect. 3.7 for discussion). As a result, loops can be represented
by their finite unrollings. Our load commands write into a local variable. In
examples, we sometimes use ‘bare’ loads without a variable write.

The construct {—} represents a block-shaped hole in the program. To sim- 
plify our presentation, we assume that at most one hole appears in the pro- 
gram. Transformations that apply to multiple blocks at once can be simulated 
by using the fact our approach is compositional: transformations can be applied 
in sequence using different divisions of the program into code-block and context. 

The set Prog of whole programs consists of programs without holes, while
the set Contx of contexts consists of programs with a hole. The set Block of
code-blocks consists of whole programs without parallel composition. We often
write $P \in \mathsf{Prog}$ for a whole program, $B \in \mathsf{Block}$ for a
code-block, and $C \in \mathsf{Contx}$ for a context. Given a context C and a
code-block B, the composition C(B) is C with its hole syntactically replaced by
B. For example:


C: load(x); {-}; store(y,11),   B: store(x,2)
⟹ C(B): load(x); store(x,2); store(y,11)
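
Composition is purely syntactic, which a few lines of code make concrete. The sketch below assumes a hypothetical tuple encoding of programs (`("seq", C1, C2)`, `("par", C1, C2)`, `("if", l, C1, C2)`, `("cmd", text)`) that is not taken from the paper, and reproduces the example above.

```python
HOLE = ("hole",)  # the block-shaped hole {-}


def compose(context, block):
    """Return C(B): the context with its hole replaced by the block."""
    if context == HOLE:
        return block
    tag = context[0]
    if tag in ("seq", "par"):
        return (tag, compose(context[1], block), compose(context[2], block))
    if tag == "if":
        return ("if", context[1],
                compose(context[2], block), compose(context[3], block))
    return context  # primitive command: nothing to substitute


def flatten(prog):
    """Print a hole-free program built only from 'seq' and 'cmd' nodes."""
    if prog[0] == "seq":
        return flatten(prog[1]) + "; " + flatten(prog[2])
    return prog[1]


C = ("seq", ("cmd", "load(x)"), ("seq", HOLE, ("cmd", "store(y,11)")))
B = ("cmd", "store(x,2)")
print(flatten(compose(C, B)))  # load(x); store(x,2); store(y,11)
```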


We restrict Prog, Contx and Block to ensure LL-SC pairs are matched cor- 
rectly. Each SC must be preceded in program order by a LL to the same location. 
Other types of operations may occur between the LL and SC, but interven- 
ing SC operations are forbidden. For example, the program LL(x); SC(x,v1); 
SC(x,v2); is forbidden. We also forbid LL-SC pairs from spanning parallel com- 
positions, and from spanning the block/context boundary. 


3.3 Memory Model Structure


The semantics of a whole program $P$ is given by a set $[\![P]\!]$ of executions,
which consist of actions, representing memory events on global variables, and
several relations on these. Actions are tuples in the set
$\mathsf{Action} \triangleq \mathsf{ActID} \times \mathsf{Kind} \times \mathsf{Option}(\mathsf{GVar}) \times \mathsf{Val}^*$.
In an action $(a, k, z, b) \in \mathsf{Action}$: $a \in \mathsf{ActID}$ is the
unique action identifier; $k \in \mathsf{Kind}$ is the kind of action (we use
load, store, LL, SC, and the failed variant $\mathsf{SC}_f$ in the semantics, and
will introduce further kinds as needed); $z \in \mathsf{Option}(\mathsf{GVar})$
is an option type consisting of either a single global variable
$\mathsf{Just}(x)$ or None; and $b \in \mathsf{Val}^*$ is the vector of values
(actions with multiple values are used in Sect. 4).

Given an action $v$, we use gvar(v) and val(v) as selectors for the different
fields. We often write actions so as to elide action identifiers and the option
type. For example, load(x,3) stands for
$\exists i.\ (i, \mathsf{load}, \mathsf{Just}(x), [3])$. We also sometimes elide
values. We call load and LL actions reads, and store and successful SC actions
writes. Given a set of actions $A$, we write, e.g., reads(A) to identify read
actions in $A$. Below, we range over all actions by $u, v$; read actions by $r$;
write actions by $w$; and LL, SC actions by $ll$ and $sc$ respectively.


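$$\begin{array}{rcl}
\langle l := \mathsf{load}(x), \sigma \rangle & \triangleq & \{ (\{\mathsf{load}(x, a)\}, \emptyset, \sigma[l \mapsto a]) \mid a \in \mathsf{Val} \} \\
\langle \mathsf{store}(x, l), \sigma \rangle & \triangleq & \{ (\{\mathsf{store}(x, a)\}, \emptyset, \sigma) \mid \sigma(l) = a \} \\
\langle C_1; C_2, \sigma \rangle & \triangleq & \{ (A_1 \cup A_2,\ sb_1 \cup sb_2 \cup (A_1 \times A_2),\ \sigma_2) \mid {} \\
 & & \quad (A_1, sb_1, \sigma_1) \in \langle C_1, \sigma \rangle \land (A_2, sb_2, \sigma_2) \in \langle C_2, \sigma_1 \rangle \} \\
\langle \mathsf{fence}, \sigma \rangle & \triangleq & \{ (\{ll, sc\}, \{(ll, sc)\}, \sigma) \mid ll = \mathsf{LL}(\mathsf{fen}, 0) \land sc = \mathsf{SC}(\mathsf{fen}, 0) \}
\end{array}$$


Fig. 2. Selected clauses of the thread-local semantics. The full semantics is given in
[10, Sect. A]. We write $A_1 \cup A_2$ for a union that is defined only when actions in
$A_1$ and $A_2$ use disjoint sets of identifiers. We omit identifiers from actions to
avoid clutter.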


The semantics of a program $P \in \mathsf{Prog}$ is defined in two stages. First,
a thread-local semantics of $P$ produces a set $\langle P \rangle$ of
pre-executions $(A, sb) \in \mathsf{PreExec}$. A pre-execution contains a finite
set of memory actions $A \subseteq \mathsf{Action}$ that could be produced by the
program. It has a transitive and irreflexive sequence-before relation
$sb \subseteq A \times A$, which defines the sequential order imposed by the
program syntax.

For example, two sequential statements in the same thread produce actions
ordered in sb. The thread-local semantics takes into account control flow in P’s 
threads and operations on local variables. However, it does not constrain the 
behaviour of global variables: the values threads read from them are chosen 
arbitrarily. This is addressed by extending pre-executions with extra relations, 
and filtering the resulting executions using validity axioms. 


3.4 Thread-Local Semantics


The thread-local semantics is defined formally in Fig. 2. The semantics of a
program $P \in \mathsf{Prog}$ is defined using the function
$\langle -, - \rangle : \mathsf{Prog} \times \mathsf{VMap} \to \mathcal{P}(\mathsf{PreExec} \times \mathsf{VMap})$.
The values of local variables are tracked by a map
$\sigma \in \mathsf{VMap} = \mathsf{LVar} \to \mathsf{Val}$. Given a program and
an input local variable map, the function produces a set of pre-executions paired
with an output variable map, representing the values of local variables at the
end of the execution. Let $\sigma_0$ map every local variable to 0. Then
$\langle P \rangle$, the thread-local semantics of a program $P$, is defined as


$$\langle P \rangle \triangleq \{ (A, sb) \mid \exists \sigma'.\ (A, sb, \sigma') \in \langle P, \sigma_0 \rangle \}$$


The significant property of the thread-local semantics is that it does not
restrict the behaviour of global variables. For this reason, note that the clause
for load in Fig. 2 leaves the value $a$ unrestricted. We follow [16] in encoding the
fence command by a successful LL-SC pair to a distinguished variable
$\mathsf{fen} \in \mathsf{GVar}$ that is not otherwise read or written.
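
The load, store and sequencing clauses of Fig. 2 can be sketched as a small generator. This is an illustration under stated assumptions rather than the paper's definition: the value set is fixed to {0, 1}, programs use a hypothetical tuple encoding, and action identifiers are taken to be syntax paths so that the two sides of a sequence never share an identifier.

```python
VAL = (0, 1)  # assumption: a two-element value set, for illustration only


def sem(cmd, sigma, path=()):
    """Yield (actions, sb, sigma') triples for a command and local-variable map.

    An action is (id, kind, var, value); ids are syntax paths (an assumption).
    """
    tag = cmd[0]
    if tag == "load":        # ("load", l, x): the value read is unconstrained
        _, l, x = cmd
        for a in VAL:
            yield frozenset({(path, "load", x, a)}), frozenset(), {**sigma, l: a}
    elif tag == "store":     # ("store", x, l): writes the current value of l
        _, x, l = cmd
        value = sigma.get(l, 0)  # locals are initialised to 0
        yield frozenset({(path, "store", x, value)}), frozenset(), dict(sigma)
    elif tag == "seq":       # ("seq", C1, C2): order all of C1 before all of C2
        _, c1, c2 = cmd
        for a1, sb1, s1 in sem(c1, sigma, path + (0,)):
            for a2, sb2, s2 in sem(c2, s1, path + (1,)):
                cross = frozenset((u, v) for u in a1 for v in a2)
                yield a1 | a2, sb1 | sb2 | cross, s2
    else:
        raise ValueError(f"unsupported command: {tag}")


prog = ("seq", ("load", "l1", "f"), ("store", "x", "l1"))
pre_executions = list(sem(prog, {}))
print(len(pre_executions))  # 2
```

Each pre-execution picks an arbitrary value for the load of f and then stores that same value to x, showing that the thread-local semantics tracks local data flow while leaving global reads unconstrained.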


3.5 Execution Structure and Validity Axioms 


The semantics of a program $P$ is a set $[\![P]\!]$ of executions
$X = (A, sb, at, rf, mo, hb) \in \mathsf{Exec}$, where $(A, sb)$ is a
pre-execution and $at, rf, mo, hb \subseteq A \times A$. Given an execution $X$
we sometimes write $A(X), sb(X), \ldots$ as selectors for the appropriate set or
relation. The relations have the following purposes.


— Reads-from (rf) is an injective map from reads to writes at the same location
of the same value. A write action and a read action are related
$w \xrightarrow{rf} r$ if $r$ takes its value from $w$.

— Modification order (mo) is an irreflexive, total order on write actions to each
distinct variable. This is a per-variable order in which all threads observe
writes to the variable; two threads cannot observe these writes in different
orders.

— Happens-before (hb) is analogous to global temporal order, but unlike the
sequentially consistent notion of time, it is partial. Happens-before is defined
as $(sb \cup rf)^+$: therefore statements ordered in the program syntax are
ordered in time, as are reads with the writes they observe.

— Atomicity ($at \subseteq sb$) is an extension to standard C11 which we use to
support LL-SC (see below). It is an injective function from a successful
load-link action to a successful store-conditional, giving a LL-SC pair.


The semantics $[\![P]\!]$ of a program $P$ is the set of executions
$X \in \mathsf{Exec}$ compatible with the thread-local semantics and the validity
axioms, denoted valid(X):


$$[\![P]\!] \triangleq \{ X \mid (A(X), sb(X)) \in \langle P \rangle \land \mathsf{valid}(X) \} \qquad (3)$$


The validity axioms on an execution (A, sb, at, rf, mo, hb) are: 
— HBDEF: $hb = (sb \cup rf)^+$ and $hb$ is acyclic.
This axiom defines hb and enforces the intuitive property that there are no
cycles in the temporal order. It also prevents an action reading from its
hb-future: as rf is included in hb, this would result in a cycle.

— HBvsMO: $\neg\exists w_1, w_2.\ w_1 \xrightarrow{hb} w_2 \land w_2 \xrightarrow{mo} w_1$
This axiom requires that the order in which writes to a location become visible
to threads cannot contradict the temporal order. But take note that writes
may be ordered in mo but not hb.

— COHERENCE: $\neg\exists w_1, w_2, r.\ w_1 \xrightarrow{mo} w_2 \xrightarrow{hb} r \land w_1 \xrightarrow{rf} r$
This axiom generalises the sequentially consistent prohibition on reading
overwritten values. If two writes are ordered in mo, then intuitively the second
overwrites the first. A read that follows some write in hb or mo cannot read
from writes earlier in mo: these earlier writes have been overwritten. However,
unlike in sequential consistency, hb is partial, so there may be multiple
writes that an action can legally read.

— RFVAL: $\forall r.\ (\neg\exists w'.\ w' \xrightarrow{rf} r) \implies (\mathsf{val}(r) = 0 \land (\neg\exists w.\ w \xrightarrow{hb} r \land \mathsf{gvar}(w) = \mathsf{gvar}(r)))$
Most reads must take their value from a write, represented by an rf edge.
However, the RFVAL axiom allows the rf edge to be omitted if the read
takes the initial value 0 and there is no hb-earlier write to the same location.
Intuitively, an hb-earlier write would supersede the initial value in a similar
way to COHERENCE.

— ATOM: $\neg\exists w_1, w_2, ll, sc.\ w_1 \xrightarrow{rf} ll \xrightarrow{at} sc \land w_1 \xrightarrow{mo} w_2 \xrightarrow{mo} sc$
This axiom is adapted from [16]. For an LL-SC pair $ll$ and $sc$, it ensures that
there is no mo-intervening write $w_2$ that would invalidate the store.


Our model forbids the problematic relaxed behaviour of the message-passing (MP)
program in Fig. 1 that yields $b = 1 \land r = 0$. Figure 3 shows an (invalid)
execution that would exhibit this behaviour. To avoid clutter, here and in the
following we omit hb edges obtained by transitivity and local variable values.
This execution is allowed by the thread-local semantics of the MP program, but it
is ruled out by the COHERENCE validity axiom. As hb is transitively closed, there
is a derived hb edge $\mathsf{store}(x,1) \xrightarrow{hb} \mathsf{load}(x,0)$,
which forms a COHERENCE violation. Thus, this is not an execution of the MP
program. Indeed, any execution ending in load(x,0) is forbidden for the same
reason, meaning that the MP relaxed behaviour cannot occur.


[Figure: execution graph over the actions store(f,0), store(x,0), store(x,1),
store(f,1), load(f,1) and load(x,0), with edges labelled sb, hb and rf.]

Fig. 3. An invalid execution of MP.
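
The argument can be replayed with a small checker for three of the axioms. This is a sketch under assumptions: actions are plain strings, the mo edges of the MP execution are supplied by hand (they are the ones HBvsMO forces), and RFVAL and ATOM are not checked. The function names are hypothetical.

```python
def transitive_closure(edges):
    """Smallest transitive relation containing the given set of pairs."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def violated_axioms(sb, rf, mo):
    """Names of the axioms among HBDEF, HBvsMO, COHERENCE that fail."""
    hb = transitive_closure(sb | rf)
    found = set()
    if any(u == v for u, v in hb):                      # hb must be acyclic
        found.add("HBDEF")
    if any((w2, w1) in mo for w1, w2 in hb):            # mo may not contradict hb
        found.add("HBvsMO")
    if any(mid == w2 and (w1, r) in rf
           for w1, w2 in mo for mid, r in hb):          # no reading overwritten writes
        found.add("COHERENCE")
    return found


sb = {("W f 0", "W x 0"), ("W x 0", "W x 1"), ("W x 1", "W f 1"),
      ("R f 1", "R x 0")}
mo = {("W x 0", "W x 1"), ("W f 0", "W f 1")}
bad_rf = {("W f 1", "R f 1"), ("W x 0", "R x 0")}
print(violated_axioms(sb, bad_rf, mo))  # {'COHERENCE'}
```

If the second load instead reads from the store of 1 (with its action renamed accordingly), the same checker reports no violation, matching the claim that only the stale read is ruled out.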


3.6 Relaxed Observations 


Finally, we define a notion of observational refinement suitable for our relaxed
model. We assume a subset of observable global variables,
$\mathsf{OVar} \subseteq \mathsf{GVar}$, which can only be accessed by the context
and not by the code-block. We consider the actions and the hb relation on these
variables to be the observations. We write $X|_{\mathsf{OVar}}$ for the projection
of $X$’s action set and relations to OVar, and use this to define
$\preccurlyeq_{ex}$ for our model:


$$X \preccurlyeq_{ex} Y \iff A(X|_{\mathsf{OVar}}) = A(Y|_{\mathsf{OVar}}) \land hb(Y|_{\mathsf{OVar}}) \subseteq hb(X|_{\mathsf{OVar}})$$


This is lifted to programs and blocks as in Sect. 2, def. (1) and (2). Note that
in the more abstract execution, actions on observable variables must be the 
same, but hb can be weaker. This is because we interpret hb as a constraint on 
time order: two actions that are unordered in hb could have occurred in either 
order, or in parallel. Thus, weakening hb allows more observable behaviours (see 
Sect. 2). 
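
The definition of refinement on executions translates directly into a check on finite data. In this sketch an execution is reduced to a pair of an action set and an hb relation, with actions written as (id, kind, var, value) tuples; this representation and the name `refines_ex` are assumptions made for illustration.

```python
def refines_ex(x, y, ovar):
    """Check X ≼ex Y: equal observable actions, and hb(Y|OVar) ⊆ hb(X|OVar)."""
    def project(execution):
        actions, hb = execution
        observable = {a for a in actions if a[2] in ovar}
        return observable, {(u, v) for u, v in hb
                            if u in observable and v in observable}

    actions_x, hb_x = project(x)
    actions_y, hb_y = project(y)
    return actions_x == actions_y and hb_y <= hb_x


w = (1, "store", "o", 1)
r = (2, "load", "o", 1)
internal = (3, "store", "x", 7)        # x is not observable
X = ({w, r, internal}, {(w, r), (internal, r)})
Y = ({w, r}, set())                    # same observations, weaker hb
print(refines_ex(X, Y, {"o"}))  # True
print(refines_ex(Y, X, {"o"}))  # False
```

The asymmetry reflects the text: the more abstract execution Y may drop hb edges, but it may not add ordering that X lacks.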


3.7 Differences from C11 


Our language’s memory model is derived from the C11 formalisation in [5], with 
a number of simplifications. We chose C11 because it demonstrates most of the 
important features of axiomatic language models. However, we do not target the 
precise C11 model: rather we target an abstracted model that is rich enough 
to demonstrate our approach. Relaxed language semantics is still a very active 
topic of research, and several C11 features are known to be significantly flawed, 
with multiple competing fixes proposed. Some of our differences from [5] are 
intended to avoid such problematic features so that we can cleanly demonstrate 
our approach. 

In C11 terms, our model covers release-acquire and non-atomic operations 
(the latter addressed in Sect. 7), and sequentially consistent fences. We deviate
from C11 in the following ways: 


— We omit sequentially consistent accesses because their semantics is known 
to be flawed in C11 [17]. We do handle sequentially consistent fences, but 
these are stronger than those of C11: we use the semantics proposed in [16]. 
It has been proved sound under existing compilation strategies to common 
multiprocessors. 

— We omit relaxed (RLX) accesses to avoid well-known problems with thin-air
values [4]. There are multiple recent competing proposals for fixing these
problems, e.g. [14,15,20].

— Our model does not include infinite computations, because their semantics in 
C11-style axiomatic models remains undecided in the literature [4]. However, 
our proofs do not depend on the assumption that execution contexts are finite. 

— Our language is based on shared variables, not dynamically allocated addressable
memory, so for example we cannot write y:=*x; z:=*y. This simplifies
our theory by allowing us to fix the variables accessed by a code-block up- 
front. We believe our results can be extended to support addressable memory, 
because C11-style models grant no special status to pointers; we elaborate on 
this in Sect. 4. 

— We add LL-SC atomic instructions to our language in addition to C11’s stan- 
dard CAS. To do this, we adapt the approach of [16]. This increases the observa- 
tional power of a context and is necessary for full abstraction in the presence of 
non-atomics; see Sect. 8. LL-SC is available as a hardware instruction on many 
platforms supporting C11, such as Power and ARM. However, we do not pro- 
pose adding LL-SC to C11: rather, it supports an interesting result in relaxed 
memory model theory. Our adequacy results do not depend on LL-SC. 


4 Denotations of Code-Blocks 


We construct the denotation for a code-block in two steps: (1) generate the 
block-local executions under a set of special cut-down contexts; (2) from each 
execution, extract a summary of interactions between the code-block and the 
context called a history. 


4.1 Block-Local Executions 


The block-local executions of a block $B \in \mathsf{Block}$ omit context
structure such as syntax and actions on variables not accessed in the block.
Instead the context is represented by special actions call and ret, a set $A_B$,
and relations $R_B$ and $S_B$, each covering an aspect of the interaction of the
block and an arbitrary unrestricted context. Together, each choice of call, ret,
$A_B$, $R_B$, and $S_B$ abstractly represents a set of possible syntactic
contexts. By quantifying over the possible values of these parameters, we cover
the behaviour of all syntactic contexts. The parameters are defined as follows:


— Local variables. A context can include code that precedes and follows the
block on the same thread, with interaction through local variables, but (due to
the syntactic restriction) not through LL/SC atomic regions. We capture this with
a special action $\mathsf{call}(\sigma)$ at the start of the block, and
$\mathsf{ret}(\sigma')$ at the end, where
$\sigma, \sigma' : \mathsf{LVar} \to \mathsf{Val}$ record the values of local
variables at these points. Assume that variables in LVar are ordered:
$l_1, l_2, \ldots, l_n$. Then $\mathsf{call}(\sigma)$ is encoded by the action
$(i, \mathsf{call}, \mathsf{None}, [\sigma(l_1), \ldots, \sigma(l_n)])$, with
fresh identifier $i$. We encode ret in the same way.

— Global variable actions. The context can also interact with the block through
concurrent reads and writes to global variables. These interactions are
represented by a set $A_B$ of context actions added to the ones generated by the
thread-local semantics of the block. This set only contains actions on the
variables $VS_B$ that $B$ can access ($VS_B$ can be constructed syntactically).
Given an execution $X$ constructed using $A_B$ (see below) we write contx(X)
to recover the set $A_B$.

— Context happens-before. The context can generate hb edges between its
actions, which affect the behaviour of the block. We track these effects with
a relation $R_B$ over actions in $A_B$, call and ret:


$$R_B \subseteq (A_B \times A_B) \cup (A_B \times \{\mathsf{call}\}) \cup (\{\mathsf{ret}\} \times A_B) \qquad (4)$$


The context can generate hb edges between actions directly if they are on the
same thread, or indirectly through inter-thread reads. Likewise call/ret may
be related to context actions on the same or different threads.

— Context atomicity. The context can generate at edges between its actions
that we capture in the relation $S_B \subseteq A_B \times A_B$. We require this
relation to be an injective function from LL to SC actions. We consider only
cases where LL/SC pairs do not cross block boundaries, so we need not consider
boundary-crossing at edges.


Together, call, ret, $A_B$, $R_B$, and $S_B$ represent a limited context,
stripped of syntax, relations sb, mo, and rf, and actions on global variables
other than $VS_B$. When constructing block-local executions, we represent all
possible interactions by quantifying over all possible choices of $\sigma$,
$\sigma'$, $A_B$, $R_B$ and $S_B$. The set $[\![B, A_B, R_B, S_B]\!]$ contains
all executions of $B$ under this special limited context. Formally, an execution
$X = (A, sb, at, rf, mo, hb)$ is in this set if:


1. $A_B \subseteq A$ and there exist variable maps $\sigma, \sigma'$ such that
$\{\mathsf{call}(\sigma), \mathsf{ret}(\sigma')\} \subseteq A$. That is, the
call, return, and extra context actions are included in the execution.

2. There exists a set $A_l$ and relation $sb_l$ such that
(i) $(A_l, sb_l, \sigma') \in \langle B, \sigma \rangle$;
(ii) $A_l = A \setminus (A_B \cup \{\mathsf{call}, \mathsf{ret}\})$;
(iii) $sb_l = sb \setminus \{(\mathsf{call}, u), (u, \mathsf{ret}) \mid u \in A\}$.
That is, actions from the code-block satisfy the thread-local semantics,
beginning with map $\sigma$ and deriving map $\sigma'$. All actions arising from
the block are between call and ret in sb.

3. $X$ satisfies the validity axioms, but with modified axioms HBDEF′ and ATOM′.
We define HBDEF′ as: $hb = (sb \cup rf \cup R_B)^+$ and hb is acyclic. That is,
context relation $R_B$ is added to hb. ATOM′ is defined analogously with $S_B$
added to at.


We say that $A_B$, $R_B$ and $S_B$ are consistent with $B$ if they act over
variables in the set $VS_B$. In the rest of the paper we only consider consistent
choices of $A_B$, $R_B$, $S_B$. The block-local executions of $B$ are then all
executions $X \in [\![B, A_B, R_B, S_B]\!]$.


1 This definition relies on the fact that our language supports a fixed set of
global variables, not dynamically allocated addressable memory (see Sect. 3.7).
We believe that in the future our results can be extended to support dynamic
memory. For this, the block-local construction would need to quantify over
actions on all possible memory locations, not just the static variable set
$VS_B$. The rest of our theory would remain the same, because C11-style models
grant no special status to pointer values. Cutting down to a finite denotation,
as in Sect. 5 below, would require some extra abstraction over memory, for
example a separation logic domain such as [9].
[Figure: two panels. Left: a block-local execution over store(x,2), store(x,1),
store(f,1), load(f,1) and load(x,1), with edges labelled hb, R_B, mo, sb and rf.
Right: the corresponding history over the context stores.]


Fig. 4. Left: block-local execution. Right: corresponding history. 


Example Block-Local Execution. The left of Fig. 4 shows a block-local execution
for the code-block

    l1 := load(f); l2 := load(x)    (5)


Here the set $VS_B$ of accessed global variables is {f, x}. As before, we omit
local variables to avoid clutter. The context action set $A_B$ consists of the
three stores, and $R_B$ is denoted by dotted edges.

In this execution, both $A_B$ and $R_B$ affect the behaviour of the code-block.
The following path is generated by $R_B$ and the load of f = 1:


$$\mathsf{store}(x,2) \xrightarrow{R_B} \mathsf{store}(x,1) \xrightarrow{R_B} \mathsf{store}(f,1) \xrightarrow{rf} \mathsf{load}(f,1) \xrightarrow{sb} \mathsf{load}(x,1)$$


Because hb includes sb, rf, and $R_B$, there is a transitive edge
$\mathsf{store}(x,1) \xrightarrow{hb} \mathsf{load}(x,1)$. The edge
$\mathsf{store}(x,2) \xrightarrow{mo} \mathsf{store}(x,1)$ is forced because the
HBvsMO axiom prohibits mo from contradicting hb. Consequently, the COHERENCE
axiom forces the code-block to read x = 1.


4.2 Histories 


From any block-local execution $X$, its history summarises the interactions
between the code-block and the context. Informally, the history records hb over
context actions, call, and ret. More formally, the history, written hist(X), is a
pair $(A, G)$ consisting of an action set $A$ and guarantee relation
$G \subseteq A \times A$. Recall that we use contx(X) to denote the set of context
actions in $X$. Using this, we define the history as follows:


— The action set $A$ is the projection of $X$’s action set to call, ret, and contx(X).
— The guarantee relation $G$ is the projection of hb(X) to


$$(\mathsf{contx}(X) \times \mathsf{contx}(X)) \cup (\mathsf{contx}(X) \times \{\mathsf{ret}\}) \cup (\{\mathsf{call}\} \times \mathsf{contx}(X)) \qquad (6)$$
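
Extracting a history from a finite block-local execution can be sketched as follows. The representation is an assumption made for illustration: actions are strings, hb is given as an explicit, already transitively closed set of pairs, and the function name `hist` simply mirrors the notation in the text. The example data describes a two-load block whose load of f reads from a context store.

```python
def hist(actions, hb, context_actions, call="call", ret="ret"):
    """Return the history (A, G) of a block-local execution, following (6)."""
    a = {u for u in actions if u in context_actions or u in (call, ret)}
    g = {(u, v) for u, v in hb
         if (u in context_actions and v in context_actions)
         or (u in context_actions and v == ret)
         or (u == call and v in context_actions)}
    return a, g


actions = {"call", "load(f,1)", "load(x,0)", "ret", "store(f,1)"}
context = {"store(f,1)"}
hb = {
    # sb within the block, transitively closed
    ("call", "load(f,1)"), ("call", "load(x,0)"), ("call", "ret"),
    ("load(f,1)", "load(x,0)"), ("load(f,1)", "ret"), ("load(x,0)", "ret"),
    # rf edge from the context store, plus the hb edges it induces
    ("store(f,1)", "load(f,1)"), ("store(f,1)", "load(x,0)"),
    ("store(f,1)", "ret"),
}
A, G = hist(actions, hb, context)
print(sorted(A))  # ['call', 'ret', 'store(f,1)']
print(G)          # {('store(f,1)', 'ret')}
```

The block's internal loads disappear from the history, while the ordering they induce between the context store and ret is kept in the guarantee.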


The guarantee summarises the code-block’s effect on its context: it suffices to
only track hb and ignore other relations. Note the guarantee definition is similar
to the context relation $R_B$, definition (4). The difference is that call and ret
are switched: this is because the guarantee represents hb edges generated by the
code-block, while $R_B$ represents the edges generated by the context. The right
of Fig. 4 shows the history corresponding to the block-local execution on the left.


[Figure: four panels, Execution 1, History 1, Execution 2 and History 2, over the
actions store(y,1), store(y,2), store(f,1), call, load(f,1), load(x,0), ret and
load(y), with edges labelled sb and rf?.]

Fig. 5. Executions and histories illustrating the guarantee relation.

To see the interactions captured by the guarantee, compare the block given
in def. (5) with the block l2 := load(x). These blocks have differing effects on
the following syntactic context:


store(y,1); store(y,2); store(f,1) || {-}; l3 := load(y)


For the two-load block embedded into this context, $l_1 = 1 \land l_3 = 1$ is not
a possible post-state. For the single-load block, this post-state is permitted.²

In Fig. 5, we give executions for both blocks embedded into this context. We
draw the context actions that are not included into the history in grey. In these
executions, the code block determines whether the load of y can read value 1
(represented by the edge labelled ‘rf?’). In the first execution, the context load
of y cannot read 1 because there is the path
$\mathsf{store}(y,1) \xrightarrow{mo} \mathsf{store}(y,2) \xrightarrow{hb} \mathsf{load}(y)$
which would contradict the COHERENCE axiom. In the second execution there
is no such path and the load may read 1.

It is desirable for our denotation to hide the precise operations inside the 
block — this lets it relate syntactically distinct blocks. Nonetheless, the history 
must record hb effects such as those above that are visible to the context. In 
Execution 1, the COHERENCE violation is still visible if we only consider context 
operations, call, ret, and the guarantee G — i.e. the history. In Execution 2, the 
fact that the read is permitted is likewise visible from examining the history. 
Thus the guarantee, combined with the local variable post-states, captures the
effect of the block on the context without recording the actions inside the block.


2 We choose these post-states for exposition purposes; in fact these blocks are
also distinguishable through local variable l1 alone.


4.3 Comparing Denotations


The denotation of a code-block $B$ is the set of histories of block-local
executions of $B$ under each possible context, i.e. the set


$$\{ \mathsf{hist}(X) \mid \exists A_B, R_B, S_B.\ X \in [\![B, A_B, R_B, S_B]\!] \}$$


To compare the denotations of two code-blocks, we first define a refinement
relation on histories: $(A_1, G_1) \sqsubseteq_h (A_2, G_2)$ holds iff
$A_1 = A_2 \land G_2 \subseteq G_1$. The history $(A_2, G_2)$ places fewer
restrictions on the context than $(A_1, G_1)$: a weaker guarantee corresponds to
more observable behaviours. For example in Fig. 5, History 1 $\sqsubseteq_h$
History 2 but not vice versa, which reflects the fact that History 1 rules out the
read pattern discussed above.

We write $B_1 \sqsubseteq_q B_2$ to state that the denotation of $B_1$ refines
that of $B_2$. The subscript ‘q’ stands for the fact we quantify over both $A$
and $R$. We define $\sqsubseteq_q$ by lifting $\sqsubseteq_h$:


$$B_1 \sqsubseteq_q B_2 \iff \forall A, R, S.\ \forall X_1 \in [\![B_1, A, R, S]\!].\ \exists X_2 \in [\![B_2, A, R, S]\!].\ \mathsf{hist}(X_1) \sqsubseteq_h \mathsf{hist}(X_2) \qquad (7)$$


In other words, two code-blocks are related $B_1 \sqsubseteq_q B_2$ if for every
block-local execution of $B_1$, there is a corresponding execution of $B_2$ with a
related history. Note that the corresponding history must be constructed under the
same cut-down context $A, R, S$.
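
History refinement and its lifting to blocks can be sketched over finite data. Two assumptions are made here: a history is a pair of Python sets, and a denotation is supplied as a finite table from a context key to the histories obtained under that context. The real definition (7) quantifies over all contexts, so `refines_q` only checks the contexts present in the table, and the two tables below are toy data rather than the actual denotations of any block.

```python
def refines_h(h1, h2):
    """(A1, G1) ⊑h (A2, G2) iff A1 = A2 and G2 ⊆ G1."""
    (a1, g1), (a2, g2) = h1, h2
    return a1 == a2 and g2 <= g1


def refines_q(den1, den2):
    """Every history in den1 is matched by one in den2 under the same context."""
    return all(
        any(refines_h(h1, h2) for h2 in den2.get(ctx, ()))
        for ctx, histories in den1.items()
        for h1 in histories
    )


interface = frozenset({"call", "store(f,1)", "ret"})
history1 = (interface, {("store(f,1)", "ret")})  # guarantee orders the store before ret
history2 = (interface, set())                    # no guarantee

print(refines_h(history1, history2))  # True
print(refines_h(history2, history1))  # False

toy_den_a = {"ctx": [history1]}  # hypothetical table for one block
toy_den_b = {"ctx": [history2]}  # hypothetical table for another block
print(refines_q(toy_den_a, toy_den_b))  # True
print(refines_q(toy_den_b, toy_den_a))  # False
```

Keying the tables by context reflects the requirement that the matching history be constructed under the same cut-down context.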


Theorem 1 (ADEQUACY OF $\sqsubseteq_q$). $B_1 \sqsubseteq_q B_2 \implies B_1 \preccurlyeq_{bl} B_2$.


Theorem 2 (FULL ABSTRACTION OF $\sqsubseteq_q$). $B_1 \preccurlyeq_{bl} B_2 \implies B_1 \sqsubseteq_q B_2$.


As a corollary of the above theorems, a program transformation
$B_2 \leadsto B_1$ is valid if and only if $B_1 \sqsubseteq_q B_2$ holds. We prove
Theorem 1 in [10, Sect. B]. We give a proof sketch of Theorem 2 in Sect. 8 and a
full proof in [10, Sect. F].


[Figure: three panels, Execution X1, Execution X2 and History, showing the block
stores store(x,1), the context action load(x,1), and edges labelled sb and mo.]


Fig. 6. History comparison for an example program transformation. 
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4.4 Example Transformation 
We now consider how our approach applies to a simple program transformation: 


$B_2$: store(x,1); store(x,1)  ⇝  $B_1$: store(x,1)


To verify this transformation, we must show that $B_1 \sqsubseteq_q B_2$. To do
this, we must consider the unboundedly many block-local executions. Here we just
illustrate the reasoning for a single block-local execution; in Sect. 5 below we
define a context reduction which lets us consider a finite set of such executions.

In Fig. 6, we illustrate the necessary reasoning for an execution
$X_1 \in [\![B_1, A, R, S]\!]$, with a context action set A consisting of a
single load x = 1, a context relation R relating ret to the load, and an empty
S relation. This choice of R forces the context load to read from the store in
the block. We can exhibit an execution $X_2 \in [\![B_2, A, R, S]\!]$ with a
matching history by making the context load read from the final store in the
block.


5 A Finite Denotation 


The approach above simplifies contexts by removing syntax and non-hb struc- 
ture, but there are still infinitely many A/R/S contexts for any code-block. To 
solve this, we introduce a type of context reduction which allows us to consider 
only finitely many block-local executions. This means that we can automatically 
check transformations by examining all such executions. However this ‘cut down’ 
approach is no longer fully abstract. We modify our denotation as follows: 


— We remove the quantification over context relation R from definition (7) by
fixing it as $\emptyset$. In exchange, we extend the history with an extra
component called a deny.

— We eliminate redundant block-local executions from the denotation, and only 
consider a reduced set of executions X that satisfy a predicate cut(X). 


These two steps are both necessary to achieve finiteness. Removing the R 
relation reduces the amount of structure in the context. This makes it possible 
to then remove redundant patterns — for example, duplicate reads from the same 
write. 

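Before defining the two steps in detail, we give the structure of our modified
refinement $\sqsubseteq_c$. In the definition, $\mathrm{hist}_E(X)$ stands for
the extended history of an execution X, and $\sqsubseteq_E$ for refinement on
extended histories.


$B_1 \sqsubseteq_c B_2 \iff \forall A, S.\ \forall X_1 \in [\![B_1, A, \emptyset, S]\!].\ \mathrm{cut}(X_1) \implies \exists X_2 \in [\![B_2, A, \emptyset, S]\!].\ \mathrm{hist}_E(X_1) \sqsubseteq_E \mathrm{hist}_E(X_2)$    (8)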


As with $\sqsubseteq_q$ above, the refinement $\sqsubseteq_c$ is adequate.
However, it is not fully abstract (we provide a counterexample in [10, Sect. D]).
We prove the following theorem in [10, Sect. E].


Theorem 3 (ADEQUACY OF $\sqsubseteq_c$). $B_1 \sqsubseteq_c B_2 \implies B_1 \preccurlyeq_{bl} B_2$.
5.1 Cutting Predicate


Removing the context relation R in definition (8) removes a large amount of 
structure from the context. However, there are still unboundedly many block- 
local executions with an empty R — for example, we can have an unbounded 
number of reads and writes that do not interact with the block. The cutting 
predicate identifies these redundant executions. 

We first identify the actions in a block-local execution that are visible, mean- 
ing they directly interact with the block. We write code(X) for the set of actions 
in X generated by the code-block. Visible actions belong to code(X), read from 
code(X), or are read by code(X). In other words, 


$\mathrm{vis}(X) \triangleq \mathrm{code}(X) \cup \{u \mid \exists v \in \mathrm{code}(X).\ u \xrightarrow{rf} v \lor v \xrightarrow{rf} u\}$


Informally, cutting eliminates three redundant patterns: (i) non-visible con- 
text reads, i.e. reads from context writes; (ii) duplicate context reads from the 
same write; and (iii) duplicate non-visible writes that are not separated in mo 
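by a visible write. Formally we define cut'(X), the conjunction of cutR for
reads and cutW for writes.


$\mathrm{cutR}(X) \iff \mathrm{reads}(X) \subseteq \mathrm{vis}(X) \land \forall r_1, r_2 \in \mathrm{contx}(X).\ (r_1 \neq r_2 \implies \neg\exists w.\ w \xrightarrow{rf} r_1 \land w \xrightarrow{rf} r_2)$

$\mathrm{cutW}(X) \iff \forall w_1, w_2 \in (\mathrm{contx}(X) \setminus \mathrm{vis}(X)).\ w_1 \xrightarrow{mo} w_2 \implies \exists w_3 \in \mathrm{vis}(X).\ w_1 \xrightarrow{mo} w_3 \xrightarrow{mo} w_2$

$\mathrm{cut}'(X) \iff \mathrm{cutR}(X) \land \mathrm{cutW}(X)$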


The final predicate cut(X) extends this in order to keep LL-SC pairs together: it 
requires that, if cut’() permits one half of an LL-SC, the other is also permitted 
implicitly (for brevity we omit the formal definition of cut() in terms of cut’). 
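The three informal patterns can be checked mechanically on a small execution. The following is a minimal sketch of vis, cutR and cutW only (the LL-SC extension to cut is omitted), assuming an execution is given as plain sets: `code` and `contx` are sets of action names, `rf` holds (write, read) pairs, and `mo` holds (earlier, later) pairs and is already transitively closed. All function and action names are hypothetical.

```python
from itertools import combinations
from typing import Set, Tuple

Edge = Tuple[str, str]


def vis(code: Set[str], rf: Set[Edge]) -> Set[str]:
    """Block actions, plus actions that read from or are read by the block."""
    out = set(code)
    for w, r in rf:
        if w in code:
            out.add(r)
        if r in code:
            out.add(w)
    return out


def cut_r(reads: Set[str], contx: Set[str], code: Set[str], rf: Set[Edge]) -> bool:
    """All reads are visible, and no two context reads share a write."""
    if not reads <= vis(code, rf):
        return False
    writer = {r: w for (w, r) in rf}
    for r1, r2 in combinations(sorted(contx), 2):
        if r1 in writer and r2 in writer and writer[r1] == writer[r2]:
            return False
    return True


def cut_w(contx: Set[str], code: Set[str], rf: Set[Edge], mo: Set[Edge]) -> bool:
    """mo-ordered non-visible context writes need a visible write in between."""
    visible = vis(code, rf)
    hidden = contx - visible
    for w1 in hidden:
        for w2 in hidden:
            if (w1, w2) in mo and not any(
                (w1, w3) in mo and (w3, w2) in mo for w3 in visible
            ):
                return False
    return True


def cut_prime(reads, contx, code, rf, mo) -> bool:
    return cut_r(reads, contx, code, rf) and cut_w(contx, code, rf, mo)
```

For instance, two context loads that both read from the same block store are rejected by `cut_r`, and two non-visible context stores that are adjacent in `mo` are rejected by `cut_w`.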


[Figure: block-local execution over stores store(x,0), store(x,1), store(x,2),
store(x,3) and loads load(x,0), load(y,0), load(x,2), annotated with the labels
[a] to [d] explained in the key.]

Key:
[a] Forbidden by cutW(). Two non-visible stores without a visible store
    intervening in mo.
[b] Forbidden by cutR(). Load is non-visible as it reads from a context store.
[c] Forbidden by cutR(). Both reads are visible, but are duplicates, reading
    from the same write.
[d] Allowed. Visible load and store.

Fig. 7. Left: block-local execution which includes patterns forbidden by cut(). Right:
key explaining the patterns forbidden or allowed.

It should be intuitively clear why the first two of the above patterns are
redundant. The main surprise is the third pattern, which preserves some non- 
visible writes. This is required by Theorem 3 for technical reasons connected to 
per-location coherence. We illustrate the application of cut() to a block-local 
execution in Fig. 7. 


5.2 Extended History ($\mathrm{hist}_E$)


In our approach, each block-local execution represents a pattern of interaction
between block and context. In our previous definition of $\sqsubseteq_q$,
constraints imposed by the block are captured by the guarantee, while constraints
imposed by the context are captured by the R relation. The definition (8) of
$\sqsubseteq_c$ removes the context relation R, but these constraints must still
be represented. Instead, we replace R with a history component called a deny.
This simplifies the block-local executions, but compensates by recording more in
the denotation.

The deny records the hb edges that cannot be enforced due to the execution
structure. For example, consider the block-local execution³ of Fig. 8. This
pattern could not occur in a context that generates the dashed edge D as a hb,
since doing so would violate the HBvsMO axiom. In our previous definition of
$\sqsubseteq_q$, we explicitly represented the presence or absence of this edge
through the R relation. In our new formulation, we represent such 'forbidden'
edges in the history by a deny edge.

[Figure: block-local execution with actions call, load(x,1), store(x,0) and
store(x,1), and a dashed edge D.]

Fig. 8. A deny edge.

The extended history of an execution X, written $\mathrm{hist}_E(X)$, is a triple
(A, G, D), consisting of the familiar notions of action set A and guarantee
$G \subseteq A \times A$, together with deny $D \subseteq A \times A$ as defined
below:


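$D \triangleq \{(u, v) \mid \text{HBvsMO-d}(u, v) \lor \text{Cohere-d}(u, v) \lor \text{RFval-d}(u, v)\} \cap ((\mathrm{contx}(X) \times \mathrm{contx}(X)) \cup (\mathrm{contx}(X) \times \{\mathrm{call}\}) \cup (\{\mathrm{ret}\} \times \mathrm{contx}(X)))$

Each of the predicates HBvsMO-d, Cohere-d, and RFval-d generates the deny
for one validity axiom. In the definitions below, (u, v) is the deny edge (drawn
as a dashed edge D in the original diagrams), and hb* is the reflexive-transitive
closure of hb:


HBvsMO-d(u,v): $\exists w_1, w_2.\ w_1 \xrightarrow{hb^*} u \land v \xrightarrow{hb^*} w_2 \land w_2 \xrightarrow{mo} w_1$

Cohere-d(u,v): $\exists w_1, w_2, r.\ w_1 \xrightarrow{mo} w_2 \xrightarrow{hb^*} u \land v \xrightarrow{hb^*} r \land w_1 \xrightarrow{rf} r$

RFval-d(u,v): $\exists w, r.\ \mathrm{gvar}(w) = \mathrm{gvar}(r) \land (\neg\exists w'.\ w' \xrightarrow{rf} r) \land w \xrightarrow{hb^*} u \land v \xrightarrow{hb^*} r$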


3 We use this execution for illustration, but in fact the cut() predicate would forbid 
the load. 
One can think of a deny edge as an 'almost' violation of an axiom. For
example, if HBvsMO-d(u, v) holds, then the context cannot generate an extra
hb-edge $u \xrightarrow{hb} v$, since doing so would violate HBvsMO.

Because deny edges represent constraints on the context, weakening the deny 
places fewer constraints, allowing more behaviours, so we compare them with 
relational inclusion: 


$(A_1, G_1, D_1) \sqsubseteq_E (A_2, G_2, D_2) \iff A_1 = A_2 \land G_2 \subseteq G_1 \land D_2 \subseteq D_1$


This refinement on extended histories is used to define our refinement relation
on blocks, $\sqsubseteq_c$, def. (8).
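To make the deny component concrete, the sketch below computes deny edges for one axiom and compares extended histories. It rests on assumptions that should be checked against the original diagrams: HBvsMO-d(u, v) is read as "there are writes w1, w2 with w1 hb* u, v hb* w2 and w2 mo w1", relations are given as sets of pairs, and every name here (`hbvsmo_deny`, `restrict_to_context`, `refines_e`) is hypothetical.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def reflexive_transitive_closure(nodes: Set[str], edges: Set[Edge]) -> Set[Edge]:
    """hb*: every node reaches itself and everything reachable via edges."""
    reach = {n: {n} for n in nodes}
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            for n in nodes:
                if a in reach[n] and not reach[b] <= reach[n]:
                    reach[n] |= reach[b]
                    changed = True
    return {(a, b) for a in nodes for b in reach[a]}


def hbvsmo_deny(nodes: Set[str], hb: Set[Edge], mo: Set[Edge]) -> Set[Edge]:
    """Edges (u, v) that, if added to hb, would make hb contradict mo."""
    hbs = reflexive_transitive_closure(nodes, hb)
    return {
        (u, v)
        for u in nodes
        for v in nodes
        if any((w1, u) in hbs and (v, w2) in hbs for (w2, w1) in mo)
    }


def restrict_to_context(edges, contx, call="call", ret="ret"):
    """Keep context-context, context-to-call and ret-to-context edges only."""
    return {
        (u, v)
        for (u, v) in edges
        if (u in contx and v in contx)
        or (u in contx and v == call)
        or (u == ret and v in contx)
    }


def refines_e(h1, h2) -> bool:
    """(A1, G1, D1) refines (A2, G2, D2): same actions, G2 and D2 included."""
    (a1, g1, d1), (a2, g2, d2) = h1, h2
    return a1 == a2 and g2 <= g1 and d2 <= d1
```

As a small usage example, with writes `w2` before `w1` in `mo` and a read `r` that is hb-after `w1`, both `("w1", "w2")` and `("r", "w2")` come out as deny edges: a context that ordered either pair in hb would force hb to contradict mo.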


5.3 Finiteness 


Theorem 4 (FINITENESS). If for a block B and state $\sigma$ the set of thread-local
executions $\langle B, \sigma \rangle$ is finite, then so is the set of resulting
block-local executions, $\{X \mid \exists A, S.\ X \in [\![B, A, \emptyset, S]\!] \land \mathrm{cut}(X)\}$.


Proof (sketch). It is easy to see for a given thread-local execution there are 
finitely many possible visible reads and writes. Any two non-visible writes must 
be distinguished by at least one visible write, limiting their number. 


Theorem 4 means that any transformation can be checked automatically if
the two blocks have finite sets of thread-local executions. We assume a finite
data domain, meaning actions can only take finitely many distinct values in Val.
Recall also that our language does not include loops. Given these facts, any
transformation written in our language will satisfy finiteness, and can therefore
be automatically checked.


6 Prototype Verification Tool 


Stellite is our prototype tool that verifies transformations using the Alloy* model
checker [12,18]. Our tool takes an input transformation $B_2 \leadsto B_1$ written
in a C-like syntax. It automatically converts the transformation into an Alloy*
model encoding $B_1 \sqsubseteq_c B_2$. If the tool reports success, then the
transformation is verified for unboundedly large syntactic contexts and executions.

An Alloy model consists of a collection of predicates on relations, and an 
instance of the model is a set of relations that satisfy the predicates. As pre- 
viously noted in [28], there is therefore a natural fit between Alloy models and 
axiomatic memory models. 


At a high level, our tool works as follows: 


1. The two sides of an input transformation $B_1$ and $B_2$ are automatically
converted into Alloy predicates expressing their syntactic structure. Intuitively,
these block predicates are built by following the thread-local semantics from
Sect. 3.

2. The block predicates are linked with a pre-defined Alloy model expressing the
memory model and $\sqsubseteq_c$.

3. The Alloy* solver searches (using SAT) for a history of $B_1$ that has no
matching history of $B_2$. We use the higher-order Alloy* solver of [18] because
the standard Alloy solver cannot support the existential quantification on
histories in $\sqsubseteq_c$.


The Alloy* solver is parameterised by the maximum size of the model it will
examine. However, our finiteness theorem for $\sqsubseteq_c$ (Theorem 4) means
there is a bound on the size of cut-down context that needs to be considered to
verify any given transformation. If our tool reports that a transformation is
correct, it is verified in all syntactic contexts of unbounded size.

Given a query $B_1 \sqsubseteq_c B_2$, the required context bound grows in
proportion to the number of internal actions on distinct locations in $B_1$.
This is because our
cutting predicate permits context actions if they interact with internal actions, 
either directly, or by interleaving between internal actions. In our experiments 
we run the tool with a model bound of 10, sufficient to give soundness for all the 
transformations we consider. Note that most of our example transformations do 
not require such a large bound, and execution times improve if it is reduced. 

If a counter-example is discovered, the problematic execution and history can 
be viewed using the Alloy model visualiser, which has a similar appearance to 
the execution diagrams in this paper. The output model generated by our tool
encodes the history of $B_1$ for which no history of $B_2$ could be found. As
$\sqsubseteq_c$ is not fully abstract, this counter-example could, of course, be
spurious.

Stellite currently supports transformations on code-blocks with atomic reads, 
writes, and fences. It does not yet support code-blocks with non-atomic accesses 
(see Sect. 7), LL-SC, or branching control-flow. We believe supporting the above 
features would not present fundamental difficulties, since the structure of the 
Alloy encoding would be similar. Despite the above limitations, our prototype 
demonstrates that our cut-down denotation can be used for automatic verifica- 
tion of important program transformations. 


Experimental Results. We have tested our tool on a range of different transfor- 
mations. A table of experimental results is given in Fig. 9. Many of our examples 
are derived from [23] — we cover all their examples that fit into our tool’s input 
language. Transformations of the sort that we check have led to real-world bugs 
in GCC [19] and LLVM [8]. Note that some transformations are invalid because 
of their effect on local variables, e.g. skip ⇝ l := load(x). The closely related
transformation skip ⇝ load(x) throws away the result of the read, and is
consequently valid.

Our tool takes significant time to verify some of the above examples, and two 
of the transformations cause the tool to time out. This is due to the complexity 
and non-determinism of the C11 model. In particular, our execution times are 
comparable to existing C++ model simulators such as Cppmem when they run 
on a few lines of code [3]. However, our tool is a sound transformation verifier,
rather than a simulator, and thus solves a more difficult problem: transformations
are verified for unboundedly large syntactic contexts and executions, rather than
for a single execution.

Introduction                                      validity   time (s)
skip ⇝ fc                                         ✓          76
skip ⇝ ld(x)                                      ✓          429
skip ⇝ l := ld(x)                                 ✗          18
l := ld(x) ⇝ l := ld(x); st(x,l)                  ✗          72
l := ld(x) ⇝ l := ld(y); l := ld(x)               ?          ∞
l := ld(x) ⇝ l := ld(x); l := ld(x)               ✓          20k
st(x,l) ⇝ st(x,l); st(x,l)                        ✗          136
fc ⇝ fc; fc                                       ✓          248

Elimination                                       validity   time (s)
fc ⇝ skip                                         ✗          15
l := ld(x) ⇝ skip                                 ✗          17
l := ld(x); st(x,l) ⇝ l := ld(x)                  ✗          64
l := ld(x); l := ld(x) ⇝ l := ld(x)               ✓          2k
st(x,l); l := ld(x) ⇝ st(x,l)                     ✓          9k
st(x,m); st(x,l) ⇝ st(x,l)                        ✓          24k
fc; fc ⇝ fc                                       ✓          382

Exchange (validity and time columns not recoverable from the source)
fc; l := ld(x) ⇝ l := ld(x); fc
fc; st(x,l) ⇝ st(x,l); fc
l := ld(x); fc ⇝ fc; l := ld(x)
st(x,l); fc ⇝ fc; st(x,l)
l := ld(x); st(y,m) ⇝ st(y,m); l := ld(x)
m := ld(y); l := ld(x) ⇝ l := ld(x); m := ld(y)
st(y,m); l := ld(x) ⇝ l := ld(x); st(y,m)
st(y,m); st(x,l) ⇝ st(x,l); st(y,m)

Fig. 9. Results from executing Stellite on a 32 core 2.3 GHz AMD Opteron, with 128 GB
RAM, over Linux 3.13.0-88 and Java 1.8.0_91. load/store/fence are abbreviated to
ld/st/fc. ✓ and ✗ denote whether the transformation satisfies $\sqsubseteq_c$.
∞ denotes a timeout after 8 h.


7 Transformations with Non-atomics 


We now extend our approach to non-atomic (i.e. unsynchronised) accesses. C11 
non-atomics are intended to enable sequential compiler optimisations that would 
otherwise be unsound in a concurrent context. To achieve this, any concur- 
rent read-write or write-write pair of non-atomic actions on the same location 
is declared a data race, which causes the whole program to have undefined 
behaviour. Therefore, adding non-atomics impacts not just the model, but also 
our denotation. 


7.1 Memory Model with Non-atomics 


Non-atomic loads and stores are added to the model by introducing new commands
$\mathrm{store}_{NA}(x, l)$ and $l := \mathrm{load}_{NA}(x)$ and the corresponding
kinds of actions: $\mathrm{store}_{NA}, \mathrm{load}_{NA} \in \mathrm{Kind}$. We
let NA be the set of all actions of these kinds. We partition global variables so
that they are either only accessed by non-atomics, or by atomics. We do not permit
non-atomic LL-SC operations. Two new validity axioms ensure that non-atomics read
from writes that happen before them, but not from stale writes:
Left (augmented MP):

    store(y,0); store_NA(x,0);
    left-hand thread        right-hand thread
    store_NA(x,1);          l1 := load_NA(x);
    store(y,1);             l2 := load(y);
                            l3 := load_NA(x);

Right (optimised):

    store(y,0); store_NA(x,0);
    left-hand thread        right-hand thread
    store_NA(x,1);          l1 := load_NA(x);
    store(y,1);             l3 := load_NA(x);
                            l2 := load(y);

[Execution diagrams: in both executions the initialisation store(y,0),
store_NA(x,0) is followed by store_NA(x,1), store(y,1) on the left-hand thread.
On the left, the right-hand thread performs load_NA(x,0), load(y,1),
load_NA(x,1); on the right, it performs load_NA(x,0), load_NA(x,0), load(y,1).
Each execution has a race between store_NA(x,1) and the first load_NA(x,0).]

Fig. 10. Top left: augmented MP, with non-atomic accesses to x, and a new racy load.
Top right: the same code optimised with $B_2 \leadsto B_1$. Below each: a valid execution.


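— RFHBNA: $\forall w, r \in NA.\ w \xrightarrow{rf} r \implies w \xrightarrow{hb} r$

— COHERNA: $\neg\exists w_1, w_2, r \in NA.\ w_1 \xrightarrow{hb} w_2 \xrightarrow{hb} r \land w_1 \xrightarrow{rf} r$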


Modification order (mo) does not cover non-atomic accesses, and we change 
the definition of happens-before (hb), so that non-atomic loads do not add edges 
to it: 


— HBDEF: $hb = (sb \cup (rf \cap \{(w, r) \mid w, r \notin NA\}))^+$


Consider the code on the left in Fig. 10: it is similar to MP from Fig. 1, but 
we have removed the if-statement, made all accesses to x non-atomic, and we 
have added an additional load of x at the start of the right-hand thread. The 
valid execution of this code on the left-hand side demonstrates the additions to 
the model for non-atomics: 


— modification order (mo) relates writes to atomic y, but not non-atomic x; 

— the first load of x is forced to read from the initialisation by RFHBNA; and 

— the second read of x is forced to read 1 because the hb created by the load of 
y obscures the now-stale initialisation write, in accordance with COHERNA. 


The most significant change to the model is the introduction of a safety 
axiom, data-race freedom (DRF). This forbids non-atomic read-write and write- 
write pairs that are unordered in hb: 
DRF: $\forall u, v.\ (\exists x.\ u \neq v \land u = (\mathrm{store}(x, \_)) \land v \in \{(\mathrm{load}(x, \_)), (\mathrm{store}(x, \_))\}) \implies (u \xrightarrow{hb} v \lor v \xrightarrow{hb} u \lor u, v \notin NA)$


We write safe(X) if an execution satisfies this axiom. Returning to the left 
of Fig. 10, we see that there is a violation of DRF — a race on non-atomics — 
between the first load of x and the store of x on the left-hand thread. 
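The race on the left of Fig. 10 can be reproduced with a small checker. This is a sketch under explicit assumptions rather than the paper's formalisation: hb is taken to be the transitive closure of sb together with rf edges between atomic actions only, a race is a store paired with a different same-location access where the pair is unordered in hb and not both atomic, and the initialisation is assumed to be sb-before both threads. The `Action` type, `races` function and action names are all hypothetical.

```python
from typing import Dict, NamedTuple, Set, Tuple

Edge = Tuple[str, str]


class Action(NamedTuple):
    kind: str  # "load" or "store"
    loc: str   # global variable accessed
    na: bool   # True for a non-atomic access


def transitive_closure(edges: Set[Edge]) -> Set[Edge]:
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def races(actions: Dict[str, Action], sb: Set[Edge], rf: Set[Edge]) -> Set[Edge]:
    """Conflicting pairs (store, other access) violating the DRF reading above."""
    sync_rf = {(w, r) for (w, r) in rf if not actions[w].na and not actions[r].na}
    hb = transitive_closure(set(sb) | sync_rf)
    found = set()
    for u, au in actions.items():
        for v, av in actions.items():
            if u == v or au.kind != "store" or au.loc != av.loc:
                continue
            if (u, v) in hb or (v, u) in hb:
                continue
            if not au.na and not av.na:
                continue
            found.add((u, v))
    return found


# Assumed encoding of the left-hand execution of Fig. 10.
ACTIONS = {
    "iy": Action("store", "y", False), "ix": Action("store", "x", True),
    "wx": Action("store", "x", True), "wy": Action("store", "y", False),
    "r1": Action("load", "x", True), "ry": Action("load", "y", False),
    "r3": Action("load", "x", True),
}
SB = {("iy", "ix"), ("ix", "wx"), ("ix", "r1"),
      ("wx", "wy"), ("r1", "ry"), ("ry", "r3")}
RF = {("ix", "r1"), ("wy", "ry"), ("wx", "r3")}
```

On this encoding, `races(ACTIONS, SB, RF)` reports only the pair of the non-atomic store of x and the first load of x; the second load is ordered after the store through the atomic accesses to y.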

Let $[\![P]\!]^{NA}$ be defined in the same way as $[\![P]\!]$ is in Sect. 3,
def. (3), but adding the axioms RFHBNA and COHERNA and substituting the changed
axiom HBDEF. Then the semantics $[\![P]\!]$ of a program with non-atomics is:


$[\![P]\!] \triangleq \text{if } \forall X \in [\![P]\!]^{NA}.\ \mathrm{safe}(X) \text{ then } [\![P]\!]^{NA} \text{ else } \top$


The undefined behaviour $\top$ subsumes all others, so any program
observationally refines a racy program. Hence we modify our notion of
observational refinement on whole programs:

$P_1 \preccurlyeq_{pr}^{NA} P_2 \iff (\mathrm{safe}(P_2) \implies (\mathrm{safe}(P_1) \land P_1 \preccurlyeq_{pr} P_2))$


This always holds when $P_2$ is unsafe; otherwise, it requires $P_1$ to preserve
safety and observations to match. We define observational refinement on blocks,
$\preccurlyeq_{bl}^{NA}$, by lifting $\preccurlyeq_{pr}^{NA}$ as per Sect. 2,
def. (2).


7.2 Denotation with Non-atomics 


We now define our denotation for non-atomics, $\sqsubseteq_q^{NA}$, building on
the 'quantified' denotation $\sqsubseteq_q$ defined in Sect. 4. (We have also
defined a finite variant of this denotation using the cutting strategy described
in Sect. 5; we leave this to [10, Sect. C].)

Non-atomic actions do not participate in happens-before (hb) or coherence 
order (mo). For this reason, we need not change the structure of the history. 
However, non-atomics introduce undefined behaviour $\top$, which is a special kind
of observable behaviour. If a block races with its context in some execution,
the whole program becomes unsafe, for all executions. Therefore, our denotation
must identify how a block may race with its context. In particular, for the
denotation to be adequate, for any context C and two blocks
$B_1 \sqsubseteq_q^{NA} B_2$, we must have that if $C(B_1)$ is racy, then
$C(B_2)$ is also racy.

To motivate the precise definition of $\sqsubseteq_q^{NA}$, we consider the
following (sound) 'anti-roach-motel' transformation⁴, noting that it might be
applied to the right-hand thread of the code in the left of Fig. 10:


$B_2$: l1 := load_NA(x); l2 := load(y); l3 := load_NA(x)
    ⇝  $B_1$: l1 := load_NA(x); l3 := load_NA(x); l2 := load(y)


4 This example was provided to us by Lahav, Giannarakis and Vafeiadis in personal
communication.
In a standard roach-motel transformation [25], operations are moved into a
synchronised block. This is sound because it only introduces new happens-before
ordering between events, thereby restricting the execution of the program and
preserving data-race freedom. In the above transformation, the second NA load
of x is moved past the atomic load of y, effectively out of the synchronised block,
reducing happens-before ordering, and possibly introducing new races. However,
this is sound, because any data-race generated by $B_1$ must have already occurred
with the first NA load of x, matching a racy execution of $B_2$. Verifying this
transformation requires that we reason about races, so $\sqsubseteq_q^{NA}$ must
account for both racy and non-racy behaviour.

The code on the left of Fig. 10 represents a context, composed with $B_2$, and
the execution of Fig. 10 demonstrates that together they are racy. If we were
to apply our transformation to the fragment $B_2$ of the right-hand thread, then
we would produce the code on the right in Fig. 10. On the right in Fig. 10, we
present a similar execution to the one given on the left. The reordering on the
right-hand thread has led to the second load of x taking the value 0 rather than
1, in accordance with RFHBNA. Note that the execution still has a race on the
first load of x, albeit with different following events. As this example illustrates,
when considering racy executions in the definition of $\sqsubseteq_q^{NA}$ we may
need to match executions of the two code-blocks that behave differently after a
race. This is the key subtlety in our definition of $\sqsubseteq_q^{NA}$.

In more detail, for two related blocks $B_1 \sqsubseteq_q^{NA} B_2$, if $B_2$
generates a race in a block-local execution under a given (reduced) context, then
we require $B_1$ and $B_2$ to have corresponding histories only up to the point
the race occurs. Once the race has occurred, the following behaviours of $B_1$
and $B_2$ may differ. This still ensures adequacy: when the blocks $B_1$ and
$B_2$ are embedded into a syntactic context C, this ensures that a race can be
reproduced in $C(B_2)$, and hence, $C(B_1) \preccurlyeq_{pr}^{NA} C(B_2)$.

By default, C11 executions represent a program's complete behaviour to
termination. To allow us to compare executions up to the point a race occurs,
we use prefixes of executions. We therefore introduce the downclosure
$X^{\downarrow}$, the set of $(hb \cup rf)^+$-prefixes of an execution X:


$X^{\downarrow} \triangleq \{X' \mid \exists A.\ X' = X|_A \land \forall (u, v) \in (hb(X) \cup rf(X))^+.\ (v \in A \implies u \in A)\}$


Here $X|_A$ is the projection of the execution X to actions in A. We lift the
downclosure to sets of executions in the standard way.
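The downclosure can be explored by brute force on small executions. The sketch below is illustrative only: it returns the admissible action sets A rather than the projected executions, it enumerates all subsets (so it is exponential), and it relies on the fact that a set is closed under the transitive closure of a relation exactly when it is closed under the relation itself. The name `prefixes` and the example actions are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]


def prefixes(actions: Set[str], order: Set[Edge]) -> List[FrozenSet[str]]:
    """Action sets A such that v in A implies u in A for every (u, v) in
    `order`, where `order` is the union of the hb and rf edges."""
    acts = sorted(actions)
    result = []
    for k in range(len(acts) + 1):
        for subset in combinations(acts, k):
            chosen = set(subset)
            if all(u in chosen for (u, v) in order if v in chosen):
                result.append(frozenset(chosen))
    return result


# Hypothetical chain a -> b -> c: only initial segments are prefixes.
chain = prefixes({"a", "b", "c"}, {("a", "b"), ("b", "c")})
```

For the three-action chain this yields the empty set, {a}, {a, b} and {a, b, c}; a set such as {b} is excluded because it omits a predecessor.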
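Now we define our refinement relation $B_1 \sqsubseteq_q^{NA} B_2$ as follows:


$B_1 \sqsubseteq_q^{NA} B_2 \iff \forall A, R, S.\ \forall X_1 \in [\![B_1, A, R, S]\!]^{NA}.\ \exists X_2 \in [\![B_2, A, R, S]\!]^{NA}.$
    $(\mathrm{safe}(X_2) \implies \mathrm{safe}(X_1) \land \mathrm{hist}(X_1) \sqsubseteq_h \mathrm{hist}(X_2)) \land$
    $(\neg\mathrm{safe}(X_2) \implies \exists X_2' \in (X_2)^{\downarrow}.\ \exists X_1' \in (X_1)^{\downarrow}.\ \neg\mathrm{safe}(X_2') \land \mathrm{hist}(X_1') \sqsubseteq_h \mathrm{hist}(X_2'))$

In this definition, for each execution $X_1$ of block $B_1$, we witness an
execution $X_2$ of block $B_2$ that is related. The relationship depends on
whether $X_2$ is safe or unsafe.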


1050 M. Dodds et al. 


[Figure: three panels, "Execution X1", "Execution X2" and "History", over the
context actions store_NA(x,1) and store(y,1) and the block actions
load_NA(x,0), load(y,1) and load_NA(x,1), between call and ret; a race is marked
between store_NA(x,1) and load_NA(x,0).]

Fig. 11. History comparison for an NA-based program transformation


— If $X_2$ is safe, then the situation corresponds to $\sqsubseteq_q$ (see
Sect. 4, def. (7)). In fact, if $B_2$ is certain to be safe, for example because
it has no non-atomic accesses, then the above definition is equivalent to
$\sqsubseteq_q$.

— If $X_2$ is unsafe then it has a race, and we do not have to relate the whole
executions $X_1$ and $X_2$. We need only show that the race in $X_2$ is feasible
by finding a prefix in $X_1$ that refines the prefix leading to the race in
$X_2$. In other words, $X_2$ will behave consistently with $X_1$ until it becomes
unsafe. This ensures that the race in $X_2$ will in fact occur, and its undefined
behaviour will subsume the behaviour of $B_1$. After $X_2$ becomes unsafe, the
two blocks can behave entirely differently, so we need not show that the complete
histories of $X_1$ and $X_2$ are related.


Recall the transformation $B_2 \leadsto B_1$ given above. To verify it, we must
establish that $B_1 \sqsubseteq_q^{NA} B_2$. As before, we illustrate the
reasoning for a single block-local execution; verifying the transformation would
require a proof for all block-local executions.

In Fig. 11 we give an execution $X_1 \in [\![B_1, A, R, S]\!]^{NA}$, with a
context action set A consisting of a non-atomic store of x = 1 and an atomic
store of y = 1, and a context relation R relating the store of x to the store of
y. Note that this choice of context actions matches the left-hand thread in the
code listings of Fig. 10, and there are data races between the loads and the
store on x.

To prove the refinement for this execution, we exhibit a corresponding unsafe
execution $X_2 \in [\![B_2, A, R, S]\!]^{NA}$. The histories of the complete
executions $X_1$ and $X_2$ differ in their return action. In $X_2$ the load of y
takes the value of the context store, so COHERNA forces the second load of x to
read from the context store of x. This changes the values of local variables
recorded in ret. However, because $X_2$ is unsafe, we can select a prefix $X_2'$
which includes the race (we denote in grey the parts that we do not include).
Similarly, we can select a prefix $X_1'$ of $X_1$. We have that
$\mathrm{hist}(X_1') = \mathrm{hist}(X_2')$ (shown in the figure), even though
the histories $\mathrm{hist}(X_1)$ and $\mathrm{hist}(X_2)$ do not correspond.


Theorem 5 (ADEQUACY OF $\sqsubseteq_q^{NA}$). $B_1 \sqsubseteq_q^{NA} B_2 \implies B_1 \preccurlyeq_{bl}^{NA} B_2$.


Theorem 6 (FULL ABSTRACTION OF $\sqsubseteq_q^{NA}$). $B_1 \preccurlyeq_{bl}^{NA} B_2 \implies B_1 \sqsubseteq_q^{NA} B_2$.

We prove Theorem 5 in [10, Sect. B] and Theorem 6 in [10, Sect. F]. Note that the
prefixing in our definition of $\sqsubseteq_q^{NA}$ is required for full
abstraction, but it would be adequate to always require complete executions with
related histories.


8 Full Abstraction 


The key idea of our proofs of full abstraction (Theorems 2 and 6, given in full
in [10, Sect. F]) is to construct a special syntactic context that is sensitive to
one particular history. Namely, given an execution X produced from a block B
with context happens-before R, this context $C_X$ guarantees: (1) that X is the
block portion of an execution of $C_X(B)$; and (2) for any block B', if $C_X(B')$
has a different block history from X, then this is visible in different observable
behaviour. Therefore for any blocks that are distinguished by different histories,
$C_X$ can produce a program with different observable behaviour, establishing full
abstraction.


Special Context Construction. The precise definition of the special context
construction $C_X$ is given in [10, Sect. F]; here we sketch its behaviour. $C_X$
executes the context operations from X in parallel with the block. It wraps these
operations in auxiliary wrapper code to enforce context happens-before, R, and to
check the history. If wrapper code fails, it writes to an error variable, which
thereby alters the observable behaviour.

The context must generate edges in R. This is enforced by wrappers that use
watchdog variables to create hb-edges: each edge $(u, v) \in R$ is replicated by a
write and read on variable $h_{(u,v)}$. If the read on $h_{(u,v)}$ does not read
the write, then the error variable is written. The shape of a successful read is
given on the left in Fig. 12.


[Figure: left panel with actions u, write(h_(u,v), 1), read(h_(u,v), 1) and v,
linked by sb, hb and rf edges; right panel with actions write(g_(u,v), 1), u, v
and read(g_(u,v), 1), linked by rf and G edges.]

Fig. 12. The execution shapes generated by the special context for, on the left,
generation of R, and on the right, errant history edges.


The context must also prohibit history edges beyond those in the original
guarantee G, and again it uses watchdog variables. For each (u, v) not in G,
the special context writes to watchdog variable $g_{(u,v)}$ before u and reads
$g_{(u,v)}$ after v. If the read of $g_{(u,v)}$ does read the value written before
u, then there is an errant history edge, and the error location is written. An
erroneous execution has the shape given on the right in Fig. 12 (omitting the
write to the error location).
Full Abstraction and LL-SC. Our proof of full abstraction for the language
with C11 non-atomics requires the language to also include LL-SC, not just
C11's standard CAS: the former operation increases the observational power of
the context. However, without non-atomics (Sect. 4) CAS would be sufficient to
prove full abstraction.


9 Related Work 


Our approach builds on our prior work [3], which generalises linearizability [11] to 
the C11 memory model. This work represented interactions between a library and 
its clients by sets of histories consisting of a guarantee and a deny; we do the same 
for code-block and context. However, our previous work assumed information 
hiding, i.e., that the variables used by the library cannot be directly accessed by 
clients; we lift this assumption here. We also establish both adequacy and full 
abstraction, propose a finite denotation, and build an automated verification 
tool. 

Our approach is similar in structure to the seminal concurrency semantics of 
Brookes [6]: i.e. a code block is represented by a denotation capturing possible 
interactions with an abstracted context. In [6], denotations are sets of traces, 
consisting of sequences of global program states; context actions are represented 
by changes in these states. To handle the more complex axiomatic memory 
model, our denotation consists of sets of context actions and relations on them, 
with context actions explicitly represented as such. Also, in order to achieve 
full abstraction, Brookes assumes a powerful atomic await () instruction which 
blocks until the global state satisfies a predicate. Our result does not require this: 
all our instructions operate on single locations, and our strongest instruction is 
LL-SC, which is commonly available on hardware. 

Brookes-like approaches have been applied to several relaxed models: opera- 
tional hardware models [7], TSO [13], and SC-DRF [21]. Also, [7,21] define tools 
for verifying program transformations. All three approaches are based on traces 
rather than partial orders, and are therefore not directly portable to C11-style 
axiomatic memory models. All three also target substantially stronger (i.e. more 
restrictive) models. 

Methods for verifying code transformations, either manually or using proof 
assistants, have been proposed for several relaxed models: TSO [24,26,27], 
Java [25] and C/C++ [23]. These methods are non-compositional in the sense 
that verifying a transformation requires considering the trace set of the entire 
program—there is no abstraction of the context. We abstract both the sequen- 
tial and concurrent context and thereby support automated verification. The 
above methods also model transformations as rewrites on program executions, 
whereas we treat them directly as modifications of program syntax; the latter 
corresponds more closely to actual compilers. Finally, these methods all require 
considerable proof effort; we build an automated verification tool. 

Our tool is a sound verification tool: that is, transformations are verified for
all contexts and all executions of unbounded size. Several tools exist for testing
(not verifying) program transformations on axiomatic memory models by searching
for counter-examples to correctness, e.g., [16] for GCC and [8] for LLVM.
Alloy was used by [28] in a testing tool for comparing memory models — this 
includes comparing language-level constructs with their compiled forms. 


10 Conclusions 


We have proposed the first fully abstract denotational semantics for an axiomatic 
relaxed memory model, and using this, we have built the first tool capable of 
automatically verifying program transformations on such a model. Our theory
lays the groundwork for further research into the properties of axiomatic models. 
In particular, our definition of the denotation as a set of histories and our context 
reduction should be portable to other axiomatic models based on happens-before, 
such as those for hardware [1]. 